Hi Bharath,

I am a little confused about the metrics displayed by Cloudera. Even when there are no operations, the gc_time metric shows a constant 2s in the graph. Is this the CMS gc_time (in which case there is no JVM pause), or is it the actual GC pause?
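(One way to tell the two apart is to enable GC logging on a RegionServer and read the log directly. A minimal sketch for hbase-env.sh, assuming a HotSpot JVM of that era; the log path is only an example:)

    # hbase-env.sh -- enable GC logging so CMS concurrent phases can be
    # distinguished from stop-the-world pauses (log path is just an example)
    export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS \
      -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
      -XX:+PrintGCApplicationStoppedTime \
      -Xloggc:/var/log/hbase/gc-regionserver.log"
    # 'CMS-concurrent-*' lines run alongside the application, while
    # 'Total time for which application threads were stopped' entries are the
    # pauses the clients actually see.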
The GC timings reported earlier are the average of the gc_time metric taken across all region servers.

Regards,
Ramu

On Mon, Oct 7, 2013 at 9:10 PM, Ramu M S <[email protected]> wrote:

> Jean,
>
> Yes. It is 2 drives.
>
> - Ramu

On Mon, Oct 7, 2013 at 8:45 PM, Jean-Marc Spaggiari <[email protected]> wrote:

> Quick question on the disk side.
>
> When you say:
> 800 GB SATA (7200 RPM) Disk
> Is it 1x800GB? It's RAID 1, so might be 2 drives? What's the configuration?
>
> JM

2013/10/7 Ramu M S <[email protected]>

> Lars, Bharath,
>
> Compression is disabled for the table. This was not intended for the evaluation; I forgot to mention that during table creation. I will enable Snappy and do a major compaction again.
>
> Please suggest other options to try out, and also suggestions for the previous questions.
>
> Thanks,
> Ramu

On Mon, Oct 7, 2013 at 6:35 PM, Ramu M S <[email protected]> wrote:

> Bharath,
>
> I was about to report this. Yes, indeed there is too much GC time.
> I just verified the GC time using the Cloudera Manager statistics (updated every minute).
>
> For each Region Server,
> - During Read: the graph shows a constant 2s.
> - During Compaction: the graph starts at 7s and goes as high as 20s towards the end.
>
> A few more questions:
> 1. For the current evaluation, since the reads are completely random and I don't expect to read the same data again, can I set the heap to the default 1 GB?
> 2. Can I completely turn off BLOCK CACHE for this table?
>    http://hbase.apache.org/book/regionserver.arch.html recommends that for random reads.
> 3. In the next phase of evaluation, we are interested in using HBase as an in-memory KV DB by keeping the latest data in RAM (to the tune of around 128 GB in each RS; we are setting up a 50-100 node cluster). I am very curious to hear any suggestions in this regard.
>
> Regards,
> Ramu

On Mon, Oct 7, 2013 at 5:50 PM, Bharath Vissapragada <[email protected]> wrote:

> Hi Ramu,
>
> Thanks for reporting the results back. Just curious whether you are hitting any big GC pauses due to block cache churn on such a large heap. Do you see it?
>
> - Bharath
>
> --
> Bharath Vissapragada
> <http://www.cloudera.com>

On Mon, Oct 7, 2013 at 1:42 PM, Ramu M S <[email protected]> wrote:

> Lars,
>
> After changing the BLOCKSIZE to 16KB, the latency has reduced a little. Now the average is around 75ms.
> Overall throughput (I am using 40 clients to fetch records) is around 1K OPS.
>
> After compaction, hdfsBlocksLocalityIndex is 91, 88, 78, 90, 99, 82, 94, 97 in my 8 RS respectively.
>
> Thanks,
> Ramu

On Mon, Oct 7, 2013 at 3:51 PM, Ramu M S <[email protected]> wrote:

> Thanks Lars.
>
> I have changed the BLOCKSIZE to 16KB and triggered a major compaction. I will report my results once it is done.
>
> - Ramu
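(For reference, the block size change described above is a per-family attribute. A minimal HBase shell sketch of that step, assuming online schema update is not enabled on this 0.94 cluster, so the table is disabled first:)

    hbase> disable 'usertable'
    hbase> alter 'usertable', {NAME => 'cf', BLOCKSIZE => '16384'}
    # compression could be enabled in the same step, e.g. COMPRESSION => 'SNAPPY'
    hbase> enable 'usertable'
    hbase> major_compact 'usertable'   # rewrites existing HFiles with the new settings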
On Mon, Oct 7, 2013 at 3:21 PM, lars hofhansl <[email protected]> wrote:

> First off: a 128 GB heap per RegionServer. Wow. I'd be interested to hear your experience with such a large heap for your RS. It's definitely big enough.
>
> It's interesting that 100 GB does fit into the aggregate cache (of 8 x 32 GB), while 1.8 TB does not.
> Looks like ~70% of the read requests would need to bring in a 64 KB block in order to read 724 bytes.
>
> Should that take 100ms? No. Something's still amiss.
>
> Smaller blocks might help (you'd need to bring in 4, 8, or maybe 16 KB to read the small row). You would need to issue a major compaction for that to take effect.
> Maybe try 16 KB blocks. If that speeds up your random gets, we know where to look next... at the disk IO.
>
> -- Lars

________________________________
From: Ramu M S <[email protected]>
To: [email protected]; lars hofhansl <[email protected]>
Sent: Sunday, October 6, 2013 11:05 PM
Subject: Re: HBase Random Read latency > 100ms

> Lars,
>
> In one of your old posts, you had mentioned that lowering the BLOCKSIZE is good for random reads (of course with increased size for block indexes).
>
> The post is at http://grokbase.com/t/hbase/user/11bat80x7m/row-get-very-slow
>
> Will that help in my tests? Should I give it a try? If I alter my table, should I trigger a major compaction again for this to take effect?
>
> Thanks,
> Ramu

On Mon, Oct 7, 2013 at 2:44 PM, Ramu M S <[email protected]> wrote:

> Sorry, BLOCKSIZE was wrong in my earlier post; it is the default 64 KB.
>
> {NAME => 'usertable', FAMILIES => [{NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROWCOL', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}]}
>
> Thanks,
> Ramu

On Mon, Oct 7, 2013 at 2:42 PM, Ramu M S <[email protected]> wrote:

> Lars,
>
> - Yes, short circuit reading is enabled on both HDFS and HBase.
> - I had issued a major compaction after the table was loaded.
> - Region Servers have max heap set to 128 GB. Block cache size is 0.25 of the heap (so 32 GB for each Region Server). Do we need even more?
> - Decreasing HFile size (default is 1 GB)? Should I leave it at the default?
> - Keys are Zipfian distributed (by YCSB)
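(The 0.25 block cache fraction in the list above is the cluster-wide hfile.block.cache.size setting. A minimal hbase-site.xml sketch of what that value corresponds to:)

    <property>
      <!-- fraction of the RegionServer heap used for the block cache;
           0.25 of a 128 GB heap is the ~32 GB per Region Server noted above -->
      <name>hfile.block.cache.size</name>
      <value>0.25</value>
    </property>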
> Bharath,
>
> Bloom filters are enabled. Here are my table details:
> {NAME => 'usertable', FAMILIES => [{NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROWCOL', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '16384', IN_MEMORY => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}]}
>
> When the data size is around 100 GB (100 million records), the latency is very good. I am getting a throughput of around 300K OPS.
> In both cases (100 GB and 1.8 TB), Ganglia stats show that disk reads are around 50-60 MB/s throughout the read cycle.
>
> Thanks,
> Ramu

On Mon, Oct 7, 2013 at 2:21 PM, lars hofhansl <[email protected]> wrote:

> Have you enabled short circuit reading? See here:
> http://hbase.apache.org/book/perf.hdfs.html
>
> How's your data locality (shown on the RegionServer UI page)?
>
> How much memory are you giving your RegionServers?
> If your reads are truly random and the data set does not fit into the aggregate cache, you'll be dominated by the disk and network.
> Each read would need to bring in a 64k (default) HFile block. If short circuit reading is not enabled, you'll get two or three context switches.
>
> So I would try:
> 1. Enable short circuit reading
> 2. Increase the block cache size per RegionServer
> 3. Decrease the HFile block size
> 4. Make sure your data is local (if it is not, issue a major compaction).
>
> -- Lars
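(For reference, item 1 in the list above usually comes down to the properties below on CDH4 / HBase 0.94. A sketch only: the property names are the commonly documented ones, and the socket path is the usual documented example, not necessarily what this cluster uses:)

    <!-- hdfs-site.xml on each DataNode, and mirrored into hbase-site.xml so the
         RegionServer (the DFS client) picks it up -->
    <property>
      <name>dfs.client.read.shortcircuit</name>
      <value>true</value>
    </property>
    <property>
      <name>dfs.domain.socket.path</name>
      <value>/var/run/hadoop-hdfs/dn._PORT</value>
    </property>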
>> > >> > >> >>> >> > >> > >> >>> I have changed the following default configuration, >> > >> > >> >>> 1. HFile Size: 16GB >> > >> > >> >>> 2. HDFS Block Size: 512 MB >> > >> > >> >>> >> > >> > >> >>> Total Data size is around 1.8 TB (Excluding the replicas). >> > >> > >> >>> My Table is split into 128 Regions (No pre-splitting used, >> > >> started >> > >> > >> with 1 >> > >> > >> >>> and grew to 128 over the insertion time) >> > >> > >> >>> >> > >> > >> >>> Taking some inputs from earlier discussions I have done the >> > >> > following >> > >> > >> >>> changes to disable Nagle (In both Client and Server >> > >> hbase-site.xml, >> > >> > >> >>> hdfs-site.xml) >> > >> > >> >>> >> > >> > >> >>> <property> >> > >> > >> >>> <name>hbase.ipc.client.tcpnodelay</name> >> > >> > >> >>> <value>true</value> >> > >> > >> >>> </property> >> > >> > >> >>> >> > >> > >> >>> <property> >> > >> > >> >>> <name>ipc.server.tcpnodelay</name> >> > >> > >> >>> <value>true</value> >> > >> > >> >>> </property> >> > >> > >> >>> >> > >> > >> >>> Ganglia stats shows large CPU IO wait (>30% during reads). >> > >> > >> >>> >> > >> > >> >>> I agree that disk configuration is not ideal for Hadoop >> > cluster, >> > >> but >> > >> > >> as >> > >> > >> >>> told earlier it can't change for now. >> > >> > >> >>> I feel the latency is way beyond any reported results so >> far. >> > >> > >> >>> >> > >> > >> >>> Any pointers on what can be wrong? >> > >> > >> >>> >> > >> > >> >>> Thanks, >> > >> > >> >>> Ramu >> > >> > >> >>> >> > >> > >> >> >> > >> > >> >> >> > >> > >> > >> > >> > >> >> > >> > > >> > >> > > >> > >> > >> > >> >> > >> >> > >> >> > >> -- >> > >> Bharath Vissapragada >> > >> <http://www.cloudera.com> >> > >> >> > > >> > > >> > >> > >
