How many reads per second per region server are you throwing at the system? Also, is 100ms the average latency?
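As an aside on why these two numbers are linked: with a closed-loop load generator like YCSB, throughput and latency are tied by Little's law (throughput ≈ concurrent clients / average latency). A minimal sketch, plugging in the 40 clients and 100 ms figures reported elsewhere in this thread:

```python
# Little's law for a closed-loop benchmark: each client issues the next
# get only after the previous one returns, so
#   aggregate throughput = concurrency / average latency.
# The 40 clients, 100 ms, and 8 region servers are figures from this thread.
clients = 40
avg_latency_s = 0.100
region_servers = 8

total_ops = clients / avg_latency_s        # aggregate reads per second
ops_per_rs = total_ops / region_servers    # reads per second per region server

print(total_ops)    # 400.0 aggregate OPS
print(ops_per_rs)   # 50.0 OPS per region server
```

At 50 reads/s per region server the cluster is nowhere near saturated, which is why the per-request latency (not aggregate capacity) is the suspicious number here.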
On Mon, Oct 7, 2013 at 2:04 PM, lars hofhansl <[email protected]> wrote:

> He still should not see 100ms latency. 20ms, sure. 100ms seems large;
> there are still 8 machines serving the requests.
>
> I agree this spec is far from optimal, but there is still something odd
> here.
>
> Ramu, this does not look like a GC issue. You'd see much larger (worst
> case) latencies if that were the case (dozens of seconds).
> Are you using 40 clients from 40 different machines? Or from 40 different
> processes on the same machine? Or 40 threads in the same process?
>
> Thanks.
>
> -- Lars
>
> ________________________________
> From: Vladimir Rodionov <[email protected]>
> To: "[email protected]" <[email protected]>
> Sent: Monday, October 7, 2013 11:02 AM
> Subject: RE: HBase Random Read latency > 100ms
>
> Ramu, your HBase configuration (128GB of heap) is far from optimal.
> Nobody runs HBase with that amount of heap, to my best knowledge.
> 32GB of RAM is the usual upper limit. We run 8-12GB in production.
>
> What's more, your IO capacity is VERY low: 2 SATA drives in RAID 1 for a
> mostly random read load?
> You should have 8, better 12-16 drives per server. Forget about RAID. You
> have HDFS.
>
> Block cache in your case does not help much, since your read
> amplification is at least x20 (16KB block for a 724 B read) - it just
> wastes RAM (heap). In your case you do not need a LARGE heap and LARGE
> block cache.
>
> I advise reconsidering your hardware spec, applying all optimizations
> mentioned already in this thread, and lowering your expectations.
>
> With the right hardware you will be able to get 500-1000 truly random
> reads per server.
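Vladimir's "x20" amplification and "500-1000 reads per server" figures can be sanity-checked with simple arithmetic. A sketch using the sizes from this thread; the 75-100 random IOPS per 7200 RPM spindle is a rule-of-thumb assumption, not a number from the thread:

```python
# Read amplification: every block-cache miss pulls in a whole HFile block
# just to return one 724-byte row (sizes taken from this thread).
block_size = 16 * 1024   # 16 KB block size suggested in the thread
row_size = 724           # 1 key + 7 fields x 100 bytes

amplification = block_size / row_size
print(round(amplification, 1))   # 22.6 -> "at least x20"

# Disk ceiling (rule of thumb: a 7200 RPM SATA drive serves very roughly
# 75-100 random IOPS; RAID 1 mirrors can both serve reads, so count both
# spindles). Compare the current 2-spindle setup with an 8-drive server.
ceiling = {spindles: (75 * spindles, 100 * spindles) for spindles in (2, 8)}
print(ceiling[2])   # (150, 200) random block reads/s -> the current setup
print(ceiling[8])   # (600, 800) -> roughly Vladimir's 500-1000 estimate
```

If most gets miss the cache, 150-200 disk-bound reads/s per server is consistent with the low throughput and long queueing delays being reported.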
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: [email protected]
>
> ________________________________________
>
> From: Ramu M S [[email protected]]
> Sent: Monday, October 07, 2013 5:23 AM
> To: [email protected]
> Subject: Re: HBase Random Read latency > 100ms
>
> Hi Bharath,
>
> I am a little confused about the metrics displayed by Cloudera. Even when
> there are no operations, the gc_time metric shows a constant 2s in the
> graph. Is this the CMS gc_time (in that case no JVM pause) or the GC pause?
>
> The GC timings reported earlier are the average of the gc_time metric for
> all region servers.
>
> Regards,
> Ramu
>
> On Mon, Oct 7, 2013 at 9:10 PM, Ramu M S <[email protected]> wrote:
>
> > Jean,
> >
> > Yes. It is 2 drives.
> >
> > - Ramu
> >
> > On Mon, Oct 7, 2013 at 8:45 PM, Jean-Marc Spaggiari <
> > [email protected]> wrote:
> >
> >> Quick question on the disk side.
> >>
> >> When you say:
> >> 800 GB SATA (7200 RPM) Disk
> >> Is it 1x800GB? It's RAID 1, so it might be 2 drives? What's the
> >> configuration?
> >>
> >> JM
> >>
> >> 2013/10/7 Ramu M S <[email protected]>
> >>
> >> > Lars, Bharath,
> >> >
> >> > Compression is disabled for the table. This was not intended for the
> >> > evaluation; I forgot to mention that during table creation. I will
> >> > enable Snappy and do major compaction again.
> >> >
> >> > Please suggest other options to try out, and also suggestions for
> >> > the previous questions.
> >> >
> >> > Thanks,
> >> > Ramu
> >> >
> >> > On Mon, Oct 7, 2013 at 6:35 PM, Ramu M S <[email protected]> wrote:
> >> >
> >> > > Bharath,
> >> > >
> >> > > I was about to report this. Yes, indeed there is too much GC time.
> >> > > I just verified the GC time using Cloudera Manager statistics
> >> > > (updated every minute).
> >> > >
> >> > > For each Region Server,
> >> > > - During Read: Graph shows 2s constant.
> >> > > - During Compaction: Graph starts at 7s and goes as high as 20s
> >> > > towards the end.
> >> > >
> >> > > A few more questions,
> >> > > 1. For the current evaluation, since the reads are completely
> >> > > random and I don't expect to read the same data again, can I set
> >> > > the heap to the default 1 GB?
> >> > >
> >> > > 2. Can I completely turn off BLOCK CACHE for this table?
> >> > > http://hbase.apache.org/book/regionserver.arch.html recommends
> >> > > that for random reads.
> >> > >
> >> > > 3. In the next phase of evaluation, we are interested in using
> >> > > HBase as an in-memory KV DB by keeping the latest data in RAM (to
> >> > > the tune of around 128 GB in each RS; we are setting up a 50-100
> >> > > node cluster). I am very curious to hear any suggestions in this
> >> > > regard.
> >> > >
> >> > > Regards,
> >> > > Ramu
> >> > >
> >> > > On Mon, Oct 7, 2013 at 5:50 PM, Bharath Vissapragada <
> >> > > [email protected]> wrote:
> >> > >
> >> > >> Hi Ramu,
> >> > >>
> >> > >> Thanks for reporting the results back. Just curious if you are
> >> > >> hitting any big GC pauses due to block cache churn on such a
> >> > >> large heap. Do you see it?
> >> > >>
> >> > >> - Bharath
> >> > >>
> >> > >> On Mon, Oct 7, 2013 at 1:42 PM, Ramu M S <[email protected]> wrote:
> >> > >>
> >> > >> > Lars,
> >> > >> >
> >> > >> > After changing the BLOCKSIZE to 16KB, the latency has reduced a
> >> > >> > little. Now the average is around 75ms.
> >> > >> > Overall throughput (I am using 40 clients to fetch records) is
> >> > >> > around 1K OPS.
> >> > >> >
> >> > >> > After compaction, hdfsBlocksLocalityIndex is
> >> > >> > 91,88,78,90,99,82,94,97 in my 8 RS respectively.
> >> > >> >
> >> > >> > Thanks,
> >> > >> > Ramu
> >> > >> >
> >> > >> > On Mon, Oct 7, 2013 at 3:51 PM, Ramu M S <[email protected]> wrote:
> >> > >> >
> >> > >> > > Thanks Lars.
> >> > >> > >
> >> > >> > > I have changed the BLOCKSIZE to 16KB and triggered a major
> >> > >> > > compaction. I will report my results once it is done.
> >> > >> > >
> >> > >> > > - Ramu
> >> > >> > >
> >> > >> > > On Mon, Oct 7, 2013 at 3:21 PM, lars hofhansl <[email protected]> wrote:
> >> > >> > >
> >> > >> > >> First off: 128GB heap per RegionServer. Wow. I'd be
> >> > >> > >> interested to hear your experience with such a large heap
> >> > >> > >> for your RS. It's definitely big enough.
> >> > >> > >>
> >> > >> > >> It's interesting that 100GB do fit into the aggregate cache
> >> > >> > >> (of 8x32GB), while 1.8TB do not.
> >> > >> > >> Looks like ~70% of the read requests would need to bring in
> >> > >> > >> a 64KB block in order to read 724 bytes.
> >> > >> > >>
> >> > >> > >> Should that take 100ms? No. Something's still amiss.
> >> > >> > >>
> >> > >> > >> Smaller blocks might help (you'd need to bring in 4, 8, or
> >> > >> > >> maybe 16K to read the small row). You would need to issue a
> >> > >> > >> major compaction for that to take effect.
> >> > >> > >> Maybe try 16K blocks. If that speeds up your random gets we
> >> > >> > >> know where to look next... at the disk IO.
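Lars's ~70% miss estimate can be approximated from the sizes in the thread. A rough sketch: it ignores heap overhead and assumes uniformly random keys, so it gives a worst case; YCSB's Zipfian key skew concentrates reads on hot blocks and pulls the effective miss rate down toward Lars's figure:

```python
# Fraction of random gets that must fetch a block from disk, assuming the
# aggregate block cache holds a uniform sample of the data. Sizes are from
# this thread; the uniform-access assumption is a simplification (YCSB's
# Zipfian distribution would lower the miss rate).
aggregate_cache_gb = 8 * 32   # 8 region servers x 32 GB block cache
dataset_gb = 1.8 * 1024       # ~1.8 TB of data, excluding replicas

hit = aggregate_cache_gb / dataset_gb
miss = 1 - hit
print(round(miss, 2))   # 0.86 worst case; skewed keys pull it toward ~0.70
```

Either way, the large majority of gets are disk-bound, so per-miss cost (block size, locality, short-circuit reads) dominates the observed latency.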
> >> > >> > >>
> >> > >> > >> -- Lars
> >> > >> > >>
> >> > >> > >> ________________________________
> >> > >> > >> From: Ramu M S <[email protected]>
> >> > >> > >> To: [email protected]; lars hofhansl <[email protected]>
> >> > >> > >> Sent: Sunday, October 6, 2013 11:05 PM
> >> > >> > >> Subject: Re: HBase Random Read latency > 100ms
> >> > >> > >>
> >> > >> > >> Lars,
> >> > >> > >>
> >> > >> > >> In one of your old posts, you had mentioned that lowering
> >> > >> > >> the BLOCKSIZE is good for random reads (of course with
> >> > >> > >> increased size for block indexes).
> >> > >> > >>
> >> > >> > >> The post is at
> >> > >> > >> http://grokbase.com/t/hbase/user/11bat80x7m/row-get-very-slow
> >> > >> > >>
> >> > >> > >> Will that help in my tests? Should I give it a try? If I
> >> > >> > >> alter my table, should I trigger a major compaction again
> >> > >> > >> for this to take effect?
> >> > >> > >>
> >> > >> > >> Thanks,
> >> > >> > >> Ramu
> >> > >> > >>
> >> > >> > >> On Mon, Oct 7, 2013 at 2:44 PM, Ramu M S <[email protected]> wrote:
> >> > >> > >>
> >> > >> > >> > Sorry, BLOCKSIZE was wrong in my earlier post; it is the
> >> > >> > >> > default 64 KB.
> >> > >> > >> >
> >> > >> > >> > {NAME => 'usertable', FAMILIES => [{NAME => 'cf',
> >> > >> > >> > DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROWCOL',
> >> > >> > >> > REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION =>
> >> > >> > >> > 'NONE', MIN_VERSIONS => '0', TTL => '2147483647',
> >> > >> > >> > KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536',
> >> > >> > >> > IN_MEMORY => 'false', ENCODE_ON_DISK => 'true',
> >> > >> > >> > BLOCKCACHE => 'true'}]}
> >> > >> > >> >
> >> > >> > >> > Thanks,
> >> > >> > >> > Ramu
> >> > >> > >> >
> >> > >> > >> > On Mon, Oct 7, 2013 at 2:42 PM, Ramu M S <[email protected]> wrote:
> >> > >> > >> >
> >> > >> > >> >> Lars,
> >> > >> > >> >>
> >> > >> > >> >> - Yes, short circuit reading is enabled on both HDFS and HBase.
> >> > >> > >> >> - I had issued a major compaction after the table was loaded.
> >> > >> > >> >> - Region Servers have max heap set to 128 GB. Block cache
> >> > >> > >> >> size is 0.25 of heap (so 32 GB for each Region Server).
> >> > >> > >> >> Do we need even more?
> >> > >> > >> >> - Decreasing HFile size (default is 1GB)? Should I leave
> >> > >> > >> >> it at the default?
> >> > >> > >> >> - Keys are Zipfian distributed (by YCSB)
> >> > >> > >> >>
> >> > >> > >> >> Bharath,
> >> > >> > >> >>
> >> > >> > >> >> Bloom filters are enabled. Here are my table details,
> >> > >> > >> >> {NAME => 'usertable', FAMILIES => [{NAME => 'cf',
> >> > >> > >> >> DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROWCOL',
> >> > >> > >> >> REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION =>
> >> > >> > >> >> 'NONE', MIN_VERSIONS => '0', TTL => '2147483647',
> >> > >> > >> >> KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '16384',
> >> > >> > >> >> IN_MEMORY => 'false', ENCODE_ON_DISK => 'true',
> >> > >> > >> >> BLOCKCACHE => 'true'}]}
> >> > >> > >> >>
> >> > >> > >> >> When the data size is around 100GB (100 million records),
> >> > >> > >> >> the latency is very good. I am getting a throughput of
> >> > >> > >> >> around 300K OPS.
> >> > >> > >> >> In both cases (100 GB and 1.8 TB), Ganglia stats show
> >> > >> > >> >> that disk reads are around 50-60 MB/s throughout the
> >> > >> > >> >> read cycle.
> >> > >> > >> >>
> >> > >> > >> >> Thanks,
> >> > >> > >> >> Ramu
> >> > >> > >> >>
> >> > >> > >> >> On Mon, Oct 7, 2013 at 2:21 PM, lars hofhansl <[email protected]> wrote:
> >> > >> > >> >>
> >> > >> > >> >>> Have you enabled short circuit reading? See here:
> >> > >> > >> >>> http://hbase.apache.org/book/perf.hdfs.html
> >> > >> > >> >>>
> >> > >> > >> >>> How's your data locality (shown on the RegionServer UI page)?
> >> > >> > >> >>>
> >> > >> > >> >>> How much memory are you giving your RegionServers?
> >> > >> > >> >>> If your reads are truly random and the data set does not
> >> > >> > >> >>> fit into the aggregate cache, you'll be dominated by the
> >> > >> > >> >>> disk and network.
> >> > >> > >> >>> Each read would need to bring in a 64K (default) HFile
> >> > >> > >> >>> block. If short circuit reading is not enabled you'll
> >> > >> > >> >>> get two or three context switches.
> >> > >> > >> >>>
> >> > >> > >> >>> So I would try:
> >> > >> > >> >>> 1. Enable short circuit reading
> >> > >> > >> >>> 2. Increase the block cache size per RegionServer
> >> > >> > >> >>> 3. Decrease the HFile block size
> >> > >> > >> >>> 4. Make sure your data is local (if it is not, issue a
> >> > >> > >> >>> major compaction).
> >> > >> > >> >>>
> >> > >> > >> >>> -- Lars
> >> > >> > >> >>>
> >> > >> > >> >>> ________________________________
> >> > >> > >> >>> From: Ramu M S <[email protected]>
> >> > >> > >> >>> To: [email protected]
> >> > >> > >> >>> Sent: Sunday, October 6, 2013 10:01 PM
> >> > >> > >> >>> Subject: HBase Random Read latency > 100ms
> >> > >> > >> >>>
> >> > >> > >> >>> Hi All,
> >> > >> > >> >>>
> >> > >> > >> >>> My HBase cluster has 8 Region Servers (CDH 4.4.0, HBase 0.94.6).
> >> > >> > >> >>>
> >> > >> > >> >>> Each Region Server has the following configuration:
> >> > >> > >> >>> 16 Core CPU, 192 GB RAM, 800 GB SATA (7200 RPM) Disk
> >> > >> > >> >>> (Unfortunately configured with RAID 1; can't change this
> >> > >> > >> >>> as the machines are leased temporarily for a month).
> >> > >> > >> >>>
> >> > >> > >> >>> I am running YCSB benchmark tests on HBase and currently
> >> > >> > >> >>> inserting around 1.8 billion records.
> >> > >> > >> >>> (1 key + 7 fields of 100 bytes = 724 bytes per record)
> >> > >> > >> >>>
> >> > >> > >> >>> Currently I am getting a write throughput of around 100K
> >> > >> > >> >>> OPS, but random reads are very, very slow; all gets have
> >> > >> > >> >>> a latency of 100ms or more.
> >> > >> > >> >>>
> >> > >> > >> >>> I have changed the following default configuration,
> >> > >> > >> >>> 1. HFile Size: 16GB
> >> > >> > >> >>> 2. HDFS Block Size: 512 MB
> >> > >> > >> >>>
> >> > >> > >> >>> Total data size is around 1.8 TB (excluding the replicas).
> >> > >> > >> >>> My table is split into 128 regions (no pre-splitting
> >> > >> > >> >>> used; it started with 1 and grew to 128 over the
> >> > >> > >> >>> insertion time).
> >> > >> > >> >>>
> >> > >> > >> >>> Taking some inputs from earlier discussions, I have made
> >> > >> > >> >>> the following changes to disable Nagle (in both client
> >> > >> > >> >>> and server hbase-site.xml, hdfs-site.xml):
> >> > >> > >> >>>
> >> > >> > >> >>> <property>
> >> > >> > >> >>>   <name>hbase.ipc.client.tcpnodelay</name>
> >> > >> > >> >>>   <value>true</value>
> >> > >> > >> >>> </property>
> >> > >> > >> >>>
> >> > >> > >> >>> <property>
> >> > >> > >> >>>   <name>ipc.server.tcpnodelay</name>
> >> > >> > >> >>>   <value>true</value>
> >> > >> > >> >>> </property>
> >> > >> > >> >>>
> >> > >> > >> >>> Ganglia stats show large CPU IO wait (>30% during reads).
> >> > >> > >> >>>
> >> > >> > >> >>> I agree that the disk configuration is not ideal for a
> >> > >> > >> >>> Hadoop cluster, but as mentioned earlier it can't be
> >> > >> > >> >>> changed for now.
> >> > >> > >> >>> I feel the latency is way beyond any reported results so far.
> >> > >> > >> >>>
> >> > >> > >> >>> Any pointers on what can be wrong?
> >> > >> > >> >>>
> >> > >> > >> >>> Thanks,
> >> > >> > >> >>> Ramu
> >> > >>
> >> > >> --
> >> > >> Bharath Vissapragada
> >> > >> <http://www.cloudera.com>
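For reference, step 1 of Lars's checklist (short-circuit reads) is typically enabled on an HDFS/HBase release of this vintage with hdfs-site.xml properties along these lines. This is a sketch only: the socket path is an example, and the exact property names and defaults should be checked against the CDH 4.4 documentation for the deployed version:

```xml
<!-- hdfs-site.xml, on the DataNodes and the HBase RegionServer nodes -->
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>

<property>
  <!-- Example path; it must be a local path the hdfs user can create. -->
  <name>dfs.domain.socket.path</name>
  <value>/var/run/hadoop-hdfs/dn._PORT</value>
</property>
```

With short-circuit reads enabled, a local cache miss is served by reading the block file directly instead of streaming it through the DataNode, which removes the extra context switches Lars mentions.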
