Re: HBase Random Read latency > 100ms

Ramu M S Mon, 07 Oct 2013 05:24:37 -0700

Hi Bharath,

I am little confused about the metrics displayed by Cloudera. Even when
there are no oeprations, the gc_time metric is showing 2s constant in the
graph. Is this the CMS gc_time (in that case no JVm pause) or the GC pause.


GC timings reported earlier is the average taken for gc_time metric for all
region servers.

Regards,
Ramu


On Mon, Oct 7, 2013 at 9:10 PM, Ramu M S <[email protected]> wrote:

> Jean,
>
> Yes. It is 2 drives.
>
> - Ramu
>
>
> On Mon, Oct 7, 2013 at 8:45 PM, Jean-Marc Spaggiari <
> [email protected]> wrote:
>
>> Quick questionon the disk side.
>>
>> When you say:
>> 800 GB SATA (7200 RPM) Disk
>> Is it 1x800GB? It's raid 1, so might be 2 drives? What's the
>> configuration?
>>
>> JM
>>
>>
>> 2013/10/7 Ramu M S <[email protected]>
>>
>> > Lars, Bharath,
>> >
>> > Compression is disabled for the table. This was not intended from the
>> > evaluation.
>> > I forgot to mention that during table creation. I will enable snappy
>> and do
>> > major compaction again.
>> >
>> > Please suggest other options to try out and also suggestions for the
>> > previous questions.
>> >
>> > Thanks,
>> > Ramu
>> >
>> >
>> > On Mon, Oct 7, 2013 at 6:35 PM, Ramu M S <[email protected]> wrote:
>> >
>> > > Bharath,
>> > >
>> > > I was about to report this. Yes indeed there is too much of GC time.
>> > > Just verified the GC time using Cloudera Manager statistics(Every
>> minute
>> > > update).
>> > >
>> > > For each Region Server,
>> > >  - During Read: Graph shows 2s constant.
>> > >  - During Compaction: Graph starts with 7s and goes as high as 20s
>> during
>> > > end.
>> > >
>> > > Few more questions,
>> > > 1. For the current evaluation, since the reads are completely random
>> and
>> > I
>> > > don't expect to read same data again can I set the Heap to the
>> default 1
>> > GB
>> > > ?
>> > >
>> > > 2. Can I completely turn off BLOCK CACHE for this table?
>> > >     http://hbase.apache.org/book/regionserver.arch.html recommends
>> that
>> > > for Randm reads.
>> > >
>> > > 3. But in the next phase of evaluation, We are interested to use
>> HBase as
>> > > In-memory KV DB by having the latest data in RAM (To the tune of
>> around
>> > 128
>> > > GB in each RS, we are setting up 50-100 Node Cluster). I am very
>> curious
>> > to
>> > > hear any suggestions in this regard.
>> > >
>> > > Regards,
>> > > Ramu
>> > >
>> > >
>> > > On Mon, Oct 7, 2013 at 5:50 PM, Bharath Vissapragada <
>> > > [email protected]> wrote:
>> > >
>> > >> Hi Ramu,
>> > >>
>> > >> Thanks for reporting the results back. Just curious if you are
>> hitting
>> > any
>> > >> big GC pauses due to block cache churn on such large heap. Do you see
>> > it ?
>> > >>
>> > >> - Bharath
>> > >>
>> > >>
>> > >> On Mon, Oct 7, 2013 at 1:42 PM, Ramu M S <[email protected]>
>> wrote:
>> > >>
>> > >> > Lars,
>> > >> >
>> > >> > After changing the BLOCKSIZE to 16KB, the latency has reduced a
>> > little.
>> > >> Now
>> > >> > the average is around 75ms.
>> > >> > Overall throughput (I am using 40 Clients to fetch records) is
>> around
>> > 1K
>> > >> > OPS.
>> > >> >
>> > >> > After compaction hdfsBlocksLocalityIndex is
>> 91,88,78,90,99,82,94,97 in
>> > >> my 8
>> > >> > RS respectively.
>> > >> >
>> > >> > Thanks,
>> > >> > Ramu
>> > >> >
>> > >> >
>> > >> > On Mon, Oct 7, 2013 at 3:51 PM, Ramu M S <[email protected]>
>> > wrote:
>> > >> >
>> > >> > > Thanks Lars.
>> > >> > >
>> > >> > > I have changed the BLOCKSIZE to 16KB and triggered a major
>> > >> compaction. I
>> > >> > > will report my results once it is done.
>> > >> > >
>> > >> > > - Ramu
>> > >> > >
>> > >> > >
>> > >> > > On Mon, Oct 7, 2013 at 3:21 PM, lars hofhansl <[email protected]>
>> > >> wrote:
>> > >> > >
>> > >> > >> First of: 128gb heap per RegionServer. Wow.I'd be interested to
>> > hear
>> > >> > your
>> > >> > >> experience with such a large heap for your RS. It's definitely
>> big
>> > >> > enough.
>> > >> > >>
>> > >> > >>
>> > >> > >> It's interesting hat 100gb do fit into the aggregate cache (of
>> > >> 8x32gb),
>> > >> > >> while 1.8tb do not.
>> > >> > >> Looks like ~70% of the read request would need to bring in a
>> 64kb
>> > >> block
>> > >> > >> in order to read 724 bytes.
>> > >> > >>
>> > >> > >> Should that take 100ms? No. Something's still amiss.
>> > >> > >>
>> > >> > >> Smaller blocks might help (you'd need to bring in 4, 8, or maybe
>> > 16k
>> > >> to
>> > >> > >> read the small row). You would need to issue a major compaction
>> for
>> > >> > that to
>> > >> > >> take effect.
>> > >> > >> Maybe try 16k blocks. If that speeds up your random gets we know
>> > >> where
>> > >> > to
>> > >> > >> look next... At the disk IO.
>> > >> > >>
>> > >> > >>
>> > >> > >> -- Lars
>> > >> > >>
>> > >> > >>
>> > >> > >>
>> > >> > >> ________________________________
>> > >> > >>  From: Ramu M S <[email protected]>
>> > >> > >> To: [email protected]; lars hofhansl <[email protected]>
>> > >> > >> Sent: Sunday, October 6, 2013 11:05 PM
>> > >> > >> Subject: Re: HBase Random Read latency > 100ms
>> > >> > >>
>> > >> > >>
>> > >> > >> Lars,
>> > >> > >>
>> > >> > >> In one of your old posts, you had mentioned that lowering the
>> > >> BLOCKSIZE
>> > >> > is
>> > >> > >> good for random reads (of course with increased size for Block
>> > >> Indexes).
>> > >> > >>
>> > >> > >> Post is at
>> > >> > http://grokbase.com/t/hbase/user/11bat80x7m/row-get-very-slow
>> > >> > >>
>> > >> > >> Will that help in my tests? Should I give it a try? If I alter
>> my
>> > >> table,
>> > >> > >> should I trigger a major compaction again for this to take
>> effect?
>> > >> > >>
>> > >> > >> Thanks,
>> > >> > >> Ramu
>> > >> > >>
>> > >> > >>
>> > >> > >>
>> > >> > >> On Mon, Oct 7, 2013 at 2:44 PM, Ramu M S <[email protected]>
>> > >> wrote:
>> > >> > >>
>> > >> > >> > Sorry BLOCKSIZE was wrong in my earlier post, it is the
>> default
>> > 64
>> > >> KB.
>> > >> > >> >
>> > >> > >> > {NAME => 'usertable', FAMILIES => [{NAME => 'cf',
>> > >> DATA_BLOCK_ENCODING
>> > >> > =>
>> > >> > >> > 'NONE', BLOOMFILTER => 'ROWCOL', REPLICATION_SCOPE => '0',
>> > >> VERSIONS =>
>> > >> > >> '1',
>> > >> > >> > COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL =>
>> '2147483647',
>> > >> > >> > KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536',
>> IN_MEMORY =>
>> > >> > >> 'false',
>> > >> > >> > ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}]}
>> > >> > >> >
>> > >> > >> > Thanks,
>> > >> > >> > Ramu
>> > >> > >> >
>> > >> > >> >
>> > >> > >> > On Mon, Oct 7, 2013 at 2:42 PM, Ramu M S <
>> [email protected]>
>> > >> > wrote:
>> > >> > >> >
>> > >> > >> >> Lars,
>> > >> > >> >>
>> > >> > >> >> - Yes Short Circuit reading is enabled on both HDFS and
>> HBase.
>> > >> > >> >> - I had issued Major compaction after table is loaded.
>> > >> > >> >> - Region Servers have max heap set as 128 GB. Block Cache
>> Size
>> > is
>> > >> > 0.25
>> > >> > >> of
>> > >> > >> >> heap (So 32 GB for each Region Server) Do we need even more?
>> > >> > >> >> - Decreasing HFile Size (Default is 1GB )? Should I leave it
>> to
>> > >> > >> default?
>> > >> > >> >> - Keys are Zipfian distributed (By YCSB)
>> > >> > >> >>
>> > >> > >> >> Bharath,
>> > >> > >> >>
>> > >> > >> >> Bloom Filters are enabled. Here is my table details,
>> > >> > >> >> {NAME => 'usertable', FAMILIES => [{NAME => 'cf',
>> > >> DATA_BLOCK_ENCODING
>> > >> > >> =>
>> > >> > >> >> 'NONE', BLOOMFILTER => 'ROWCOL', REPLICATION_SCOPE => '0',
>> > >> VERSIONS
>> > >> > =>
>> > >> > >> '1',
>> > >> > >> >> COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL =>
>> '2147483647
>> > ',
>> > >> > >> >> KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '16384',
>> IN_MEMORY
>> > =>
>> > >> > >> 'false',
>> > >> > >> >> ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}]}
>> > >> > >> >>
>> > >> > >> >> When the data size is around 100GB (100 Million records),
>> then
>> > the
>> > >> > >> >> latency is very good. I am getting a throughput of around
>> 300K
>> > >> OPS.
>> > >> > >> >> In both cases (100 GB and 1.8 TB) Ganglia stats show that
>> Disk
>> > >> reads
>> > >> > >> are
>> > >> > >> >> around 50-60 MB/s throughout the read cycle.
>> > >> > >> >>
>> > >> > >> >> Thanks,
>> > >> > >> >> Ramu
>> > >> > >> >>
>> > >> > >> >>
>> > >> > >> >> On Mon, Oct 7, 2013 at 2:21 PM, lars hofhansl <
>> [email protected]
>> > >
>> > >> > >> wrote:
>> > >> > >> >>
>> > >> > >> >>> Have you enabled short circuit reading? See here:
>> > >> > >> >>> http://hbase.apache.org/book/perf.hdfs.html
>> > >> > >> >>>
>> > >> > >> >>> How's your data locality (shown on the RegionServer UI
>> page).
>> > >> > >> >>>
>> > >> > >> >>>
>> > >> > >> >>> How much memory are you giving your RegionServers?
>> > >> > >> >>> If you reads are truly random and the data set does not fit
>> > into
>> > >> the
>> > >> > >> >>> aggregate cache, you'll be dominated by the disk and
>> network.
>> > >> > >> >>> Each read would need to bring in a 64k (default) HFile
>> block.
>> > If
>> > >> > short
>> > >> > >> >>> circuit reading is not enabled you'll get two or three
>> context
>> > >> > >> switches.
>> > >> > >> >>>
>> > >> > >> >>> So I would try:
>> > >> > >> >>> 1. Enable short circuit reading
>> > >> > >> >>> 2. Increase the block cache size per RegionServer
>> > >> > >> >>> 3. Decrease the HFile block size
>> > >> > >> >>> 4. Make sure your data is local (if it is not, issue a major
>> > >> > >> compaction).
>> > >> > >> >>>
>> > >> > >> >>>
>> > >> > >> >>> -- Lars
>> > >> > >> >>>
>> > >> > >> >>>
>> > >> > >> >>>
>> > >> > >> >>> ________________________________
>> > >> > >> >>>  From: Ramu M S <[email protected]>
>> > >> > >> >>> To: [email protected]
>> > >> > >> >>> Sent: Sunday, October 6, 2013 10:01 PM
>> > >> > >> >>> Subject: HBase Random Read latency > 100ms
>> > >> > >> >>>
>> > >> > >> >>>
>> > >> > >> >>> Hi All,
>> > >> > >> >>>
>> > >> > >> >>> My HBase cluster has 8 Region Servers (CDH 4.4.0, HBase
>> > 0.94.6).
>> > >> > >> >>>
>> > >> > >> >>> Each Region Server is with the following configuration,
>> > >> > >> >>> 16 Core CPU, 192 GB RAM, 800 GB SATA (7200 RPM) Disk
>> > >> > >> >>> (Unfortunately configured with RAID 1, can't change this as
>> the
>> > >> > >> Machines
>> > >> > >> >>> are leased temporarily for a month).
>> > >> > >> >>>
>> > >> > >> >>> I am running YCSB benchmark tests on HBase and currently
>> > >> inserting
>> > >> > >> around
>> > >> > >> >>> 1.8 Billion records.
>> > >> > >> >>> (1 Key + 7 Fields of 100 Bytes = 724 Bytes per record)
>> > >> > >> >>>
>> > >> > >> >>> Currently I am getting a write throughput of around 100K
>> OPS,
>> > but
>> > >> > >> random
>> > >> > >> >>> reads are very very slow, all gets have more than 100ms or
>> more
>> > >> > >> latency.
>> > >> > >> >>>
>> > >> > >> >>> I have changed the following default configuration,
>> > >> > >> >>> 1. HFile Size: 16GB
>> > >> > >> >>> 2. HDFS Block Size: 512 MB
>> > >> > >> >>>
>> > >> > >> >>> Total Data size is around 1.8 TB (Excluding the replicas).
>> > >> > >> >>> My Table is split into 128 Regions (No pre-splitting used,
>> > >> started
>> > >> > >> with 1
>> > >> > >> >>> and grew to 128 over the insertion time)
>> > >> > >> >>>
>> > >> > >> >>> Taking some inputs from earlier discussions I have done the
>> > >> > following
>> > >> > >> >>> changes to disable Nagle (In both Client and Server
>> > >> hbase-site.xml,
>> > >> > >> >>> hdfs-site.xml)
>> > >> > >> >>>
>> > >> > >> >>> <property>
>> > >> > >> >>>   <name>hbase.ipc.client.tcpnodelay</name>
>> > >> > >> >>>   <value>true</value>
>> > >> > >> >>> </property>
>> > >> > >> >>>
>> > >> > >> >>> <property>
>> > >> > >> >>>   <name>ipc.server.tcpnodelay</name>
>> > >> > >> >>>   <value>true</value>
>> > >> > >> >>> </property>
>> > >> > >> >>>
>> > >> > >> >>> Ganglia stats shows large CPU IO wait (>30% during reads).
>> > >> > >> >>>
>> > >> > >> >>> I agree that disk configuration is not ideal for Hadoop
>> > cluster,
>> > >> but
>> > >> > >> as
>> > >> > >> >>> told earlier it can't change for now.
>> > >> > >> >>> I feel the latency is way beyond any reported results so
>> far.
>> > >> > >> >>>
>> > >> > >> >>> Any pointers on what can be wrong?
>> > >> > >> >>>
>> > >> > >> >>> Thanks,
>> > >> > >> >>> Ramu
>> > >> > >> >>>
>> > >> > >> >>
>> > >> > >> >>
>> > >> > >> >
>> > >> > >>
>> > >> > >
>> > >> > >
>> > >> >
>> > >>
>> > >>
>> > >>
>> > >> --
>> > >> Bharath Vissapragada
>> > >> <http://www.cloudera.com>
>> > >>
>> > >
>> > >
>> >
>>
>
>

Re: HBase Random Read latency > 100ms

Reply via email to