Re: HBase Random Read latency > 100ms

Bharath Vissapragada Mon, 07 Oct 2013 01:52:17 -0700

Hi Ramu,

Thanks for reporting the results back. Just curious whether you are hitting
any big GC pauses due to block cache churn on such a large heap. Do you see any?


- Bharath


On Mon, Oct 7, 2013 at 1:42 PM, Ramu M S <ramu.ma...@gmail.com> wrote:

> Lars,
>
> After changing the BLOCKSIZE to 16KB, the latency has reduced a little. Now
> the average is around 75ms.
> Overall throughput (I am using 40 Clients to fetch records) is around 1K
> OPS.
>
> After compaction hdfsBlocksLocalityIndex is 91,88,78,90,99,82,94,97 in my 8
> RS respectively.
>
> Thanks,
> Ramu
>
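The figures Ramu reports above (about 75 ms average latency, about 1K OPS, 40 clients) can be cross-checked with Little's law, L = throughput x latency. The sketch below is only a sanity check on the reported numbers. It assumes the averages are steady-state values and makes no claim about how the YCSB clients were actually threaded.

```python
# Sanity-check the reported numbers with Little's law: L = throughput * latency.
avg_latency_s = 0.075   # ~75 ms average get latency (reported)
throughput_ops = 1000   # ~1K OPS overall (reported)
clients = 40            # number of clients (reported)

# Average number of requests that must be in flight to sustain that throughput.
in_flight = throughput_ops * avg_latency_s
per_client = in_flight / clients

# hdfsBlocksLocalityIndex per RegionServer after compaction (reported).
locality = [91, 88, 78, 90, 99, 82, 94, 97]
mean_locality = sum(locality) / len(locality)

print(f"implied in-flight requests: {in_flight:.0f}")
print(f"implied requests per client: {per_client:.2f}")
print(f"mean locality: {mean_locality:.1f}%, worst server: {min(locality)}%")
```

If the implied per-client concurrency is close to 2, each client is presumably running more than one request at a time, which is worth confirming against the actual YCSB thread settings.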
>
> On Mon, Oct 7, 2013 at 3:51 PM, Ramu M S <ramu.ma...@gmail.com> wrote:
>
> > Thanks Lars.
> >
> > I have changed the BLOCKSIZE to 16KB and triggered a major compaction. I
> > will report my results once it is done.
> >
> > - Ramu
> >
> >
> > On Mon, Oct 7, 2013 at 3:21 PM, lars hofhansl <la...@apache.org> wrote:
> >
> >> First off: 128gb heap per RegionServer. Wow. I'd be interested to hear
> >> your experience with such a large heap for your RS. It's definitely big
> >> enough.
> >>
> >>
> >> It's interesting that 100gb do fit into the aggregate cache (of 8x32gb),
> >> while 1.8tb do not.
> >> Looks like ~70% of the read requests would need to bring in a 64kb block
> >> in order to read 724 bytes.
> >>
> >> Should that take 100ms? No. Something's still amiss.
> >>
> >> Smaller blocks might help (you'd need to bring in 4, 8, or maybe 16k to
> >> read the small row). You would need to issue a major compaction for that
> >> to take effect.
> >> Maybe try 16k blocks. If that speeds up your random gets we know where to
> >> look next... At the disk IO.
> >>
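Lars's block-size argument can be put in numbers. The sketch below computes how many bytes an uncached get pulls from disk relative to the 724-byte row, for the block sizes mentioned in the thread. It assumes a get touches exactly one data block and ignores index and bloom filter blocks, so treat it as a lower bound on the real I/O.

```python
ROW_BYTES = 724  # record size reported in the original post

def read_amplification(block_bytes, row_bytes=ROW_BYTES):
    """Bytes read from disk per byte of row returned, for one uncached block."""
    return block_bytes / row_bytes

# Block sizes discussed in the thread: the 64 KB default and 16/8/4 KB.
for kb in (64, 16, 8, 4):
    block = kb * 1024
    print(f"{kb:>2} KB block: ~{read_amplification(block):.0f}x the row size")
```

Going from 64 KB to 16 KB blocks cuts the bytes read per miss by a factor of four, but each miss still costs one seek, which would fit the modest improvement reported after the change.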
> >>
> >> -- Lars
> >>
> >>
> >>
> >> ________________________________
> >>  From: Ramu M S <ramu.ma...@gmail.com>
> >> To: user@hbase.apache.org; lars hofhansl <la...@apache.org>
> >> Sent: Sunday, October 6, 2013 11:05 PM
> >> Subject: Re: HBase Random Read latency > 100ms
> >>
> >>
> >> Lars,
> >>
> >> In one of your old posts, you mentioned that lowering the BLOCKSIZE is
> >> good for random reads (at the cost of larger block indexes, of course).
> >>
> >> The post is at
> >> http://grokbase.com/t/hbase/user/11bat80x7m/row-get-very-slow
> >>
> >> Will that help in my tests? Should I give it a try? If I alter my table,
> >> should I trigger a major compaction again for this to take effect?
> >>
> >> Thanks,
> >> Ramu
> >>
> >>
> >>
> >> On Mon, Oct 7, 2013 at 2:44 PM, Ramu M S <ramu.ma...@gmail.com> wrote:
> >>
> >> > Sorry, the BLOCKSIZE was wrong in my earlier post; it is the default 64 KB.
> >> >
> >> > {NAME => 'usertable', FAMILIES => [{NAME => 'cf', DATA_BLOCK_ENCODING =>
> >> > 'NONE', BLOOMFILTER => 'ROWCOL', REPLICATION_SCOPE => '0', VERSIONS => '1',
> >> > COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647',
> >> > KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false',
> >> > ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}]}
> >> >
> >> > Thanks,
> >> > Ramu
> >> >
> >> >
> >> > On Mon, Oct 7, 2013 at 2:42 PM, Ramu M S <ramu.ma...@gmail.com> wrote:
> >> >
> >> >> Lars,
> >> >>
> >> >> - Yes, short circuit reading is enabled on both HDFS and HBase.
> >> >> - I issued a major compaction after the table was loaded.
> >> >> - Region Servers have max heap set to 128 GB. Block Cache Size is 0.25 of
> >> >>   heap (so 32 GB for each Region Server). Do we need even more?
> >> >> - Decreasing HFile Size (default is 1GB)? Should I leave it at the default?
> >> >> - Keys are Zipfian distributed (by YCSB).
> >> >>
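The cache sizing in the list above can be worked out directly. The sketch below derives the per-server and aggregate block cache from the reported heap settings and compares it with the two data set sizes mentioned in the thread. It assumes decimal units (1.8 TB = 1800 GB) and treats the whole block cache as available for data blocks, so the coverage figures are upper bounds.

```python
region_servers = 8            # reported cluster size
heap_gb = 128                 # reported max heap per RegionServer
block_cache_fraction = 0.25   # reported block cache share of the heap

cache_per_rs_gb = heap_gb * block_cache_fraction
aggregate_cache_gb = region_servers * cache_per_rs_gb

print(f"block cache per RegionServer: {cache_per_rs_gb:.0f} GB")
print(f"aggregate block cache: {aggregate_cache_gb:.0f} GB")

# Upper bound on the share of each data set the aggregate cache could hold.
for label, data_gb in (("100 GB data set", 100), ("1.8 TB data set", 1800)):
    coverage = min(1.0, aggregate_cache_gb / data_gb)
    print(f"{label}: cache can hold at most {coverage:.0%} of the data")
```

Because the keys are Zipfian distributed, the actual hit rate can be well above the raw coverage figure, since the hottest rows account for a disproportionate share of the gets.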
> >> >> Bharath,
> >> >>
> >> >> Bloom Filters are enabled. Here are my table details:
> >> >> {NAME => 'usertable', FAMILIES => [{NAME => 'cf', DATA_BLOCK_ENCODING =>
> >> >> 'NONE', BLOOMFILTER => 'ROWCOL', REPLICATION_SCOPE => '0', VERSIONS => '1',
> >> >> COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647',
> >> >> KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '16384', IN_MEMORY => 'false',
> >> >> ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}]}
> >> >>
> >> >> When the data size is around 100GB (100 Million records), the latency
> >> >> is very good. I am getting a throughput of around 300K OPS.
> >> >> In both cases (100 GB and 1.8 TB) Ganglia stats show that disk reads are
> >> >> around 50-60 MB/s throughout the read cycle.
> >> >>
> >> >> Thanks,
> >> >> Ramu
> >> >>
> >> >>
> >> >> On Mon, Oct 7, 2013 at 2:21 PM, lars hofhansl <la...@apache.org> wrote:
> >> >>
> >> >>> Have you enabled short circuit reading? See here:
> >> >>> http://hbase.apache.org/book/perf.hdfs.html
> >> >>>
> >> >>> How's your data locality (shown on the RegionServer UI page)?
> >> >>>
> >> >>>
> >> >>> How much memory are you giving your RegionServers?
> >> >>> If your reads are truly random and the data set does not fit into the
> >> >>> aggregate cache, you'll be dominated by the disk and network.
> >> >>> Each read would need to bring in a 64k (default) HFile block. If short
> >> >>> circuit reading is not enabled you'll get two or three context switches.
> >> >>>
> >> >>> So I would try:
> >> >>> 1. Enable short circuit reading
> >> >>> 2. Increase the block cache size per RegionServer
> >> >>> 3. Decrease the HFile block size
> >> >>> 4. Make sure your data is local (if it is not, issue a major compaction).
> >> >>>
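Lars's point that uncached random reads are dominated by the disk can be given a rough scale. The sketch below estimates random reads per second for one 7200 RPM spindle. The rotational latency follows from the RPM figure in the original post, but the average seek time is an assumed typical value, not something measured on this cluster, and the estimate ignores OS caching, queueing, and any read spreading across the RAID 1 mirror.

```python
RPM = 7200                 # drive speed from the original post
ASSUMED_AVG_SEEK_MS = 8.5  # assumption: typical 7200 RPM SATA seek time

# On average the platter turns half a revolution before the sector arrives.
avg_rotational_ms = (60_000 / RPM) / 2

# Service time for one small random read, ignoring transfer time and queueing.
service_ms = ASSUMED_AVG_SEEK_MS + avg_rotational_ms
iops_per_spindle = 1000 / service_ms

print(f"avg rotational latency: {avg_rotational_ms:.2f} ms")
print(f"approx random reads/s per spindle: {iops_per_spindle:.0f}")
```

Under these assumptions a single spindle serves on the order of tens of random reads per second, so once gets start missing the block cache, the per-server read rate is bounded by the disk rather than by HBase.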
> >> >>>
> >> >>> -- Lars
> >> >>>
> >> >>>
> >> >>>
> >> >>> ________________________________
> >> >>>  From: Ramu M S <ramu.ma...@gmail.com>
> >> >>> To: user@hbase.apache.org
> >> >>> Sent: Sunday, October 6, 2013 10:01 PM
> >> >>> Subject: HBase Random Read latency > 100ms
> >> >>>
> >> >>>
> >> >>> Hi All,
> >> >>>
> >> >>> My HBase cluster has 8 Region Servers (CDH 4.4.0, HBase 0.94.6).
> >> >>>
> >> >>> Each Region Server has the following configuration:
> >> >>> 16 Core CPU, 192 GB RAM, 800 GB SATA (7200 RPM) Disk
> >> >>> (Unfortunately configured with RAID 1; I can't change this as the machines
> >> >>> are leased temporarily for a month).
> >> >>>
> >> >>> I am running YCSB benchmark tests on HBase and currently inserting around
> >> >>> 1.8 Billion records.
> >> >>> (1 Key + 7 Fields of 100 Bytes = 724 Bytes per record)
> >> >>>
> >> >>> Currently I am getting a write throughput of around 100K OPS, but random
> >> >>> reads are very slow; all gets have a latency of 100ms or more.
> >> >>>
> >> >>> I have changed the following default configuration,
> >> >>> 1. HFile Size: 16GB
> >> >>> 2. HDFS Block Size: 512 MB
> >> >>>
> >> >>> Total Data size is around 1.8 TB (Excluding the replicas).
> >> >>> My table is split into 128 Regions (no pre-splitting used; it started with 1
> >> >>> and grew to 128 over the insertion time).
> >> >>>
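The record and data set sizes quoted above can be reconciled with a little arithmetic. The sketch below recomputes the raw payload from the reported record layout and compares it with the reported 1.8 TB. The 24-byte key length is inferred from "724 = key + 7 x 100" rather than stated in the post, and decimal terabytes are assumed.

```python
records = 1_800_000_000     # ~1.8 billion records (reported)
fields, field_bytes = 7, 100
key_bytes = 724 - fields * field_bytes   # inferred key length: 24 bytes

record_bytes = key_bytes + fields * field_bytes
raw_tb = records * record_bytes / 1e12   # payload only, decimal TB

reported_tb = 1.8                        # reported size, excluding replicas
on_disk_bytes_per_record = reported_tb * 1e12 / records

print(f"raw payload: {raw_tb:.2f} TB")
print(f"on-disk bytes per record: {on_disk_bytes_per_record:.0f}")
print(f"storage overhead per record: {on_disk_bytes_per_record - record_bytes:.0f} bytes")
```

A gap of this size between payload and on-disk size is plausible without compression or data block encoding, since HBase stores the row key, column family, qualifier, and timestamp with every cell, and each record here has seven cells.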
> >> >>> Taking some inputs from earlier discussions, I have made the following
> >> >>> changes to disable Nagle (in both client and server hbase-site.xml and
> >> >>> hdfs-site.xml):
> >> >>>
> >> >>> <property>
> >> >>>   <name>hbase.ipc.client.tcpnodelay</name>
> >> >>>   <value>true</value>
> >> >>> </property>
> >> >>>
> >> >>> <property>
> >> >>>   <name>ipc.server.tcpnodelay</name>
> >> >>>   <value>true</value>
> >> >>> </property>
> >> >>>
> >> >>> Ganglia stats show large CPU IO wait (>30% during reads).
> >> >>>
> >> >>> I agree that the disk configuration is not ideal for a Hadoop cluster, but as
> >> >>> mentioned earlier it can't be changed for now.
> >> >>> I feel the latency is way beyond any reported results so far.
> >> >>>
> >> >>> Any pointers on what can be wrong?
> >> >>>
> >> >>> Thanks,
> >> >>> Ramu
> >> >>>
> >> >>
> >> >>
> >> >
> >>
> >
> >
>



-- 
Bharath Vissapragada
<http://www.cloudera.com>
