Re: HBase Random Read latency > 100ms

2013-11-01 Thread Asaf Mesika
How many Parallel GC threads were you using?

Regarding block cache - just to check I understood this right: if you are
doing a massive read in HBase, is it better to turn off block caching through
the Scan attribute?
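
(For reference, a minimal sketch of that with the Java client - assuming the
0.94-era client API; the table name below is made up:)

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class NoCacheScanSketch {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "test_table");   // hypothetical table name
    try {
      Scan scan = new Scan();
      scan.setCacheBlocks(false);  // don't fill the block cache from this one-off scan
      scan.setCaching(1000);       // rows fetched per RPC; tune to your row size
      ResultScanner scanner = table.getScanner(scan);
      try {
        for (Result row : scanner) {
          // process row
        }
      } finally {
        scanner.close();
      }
    } finally {
      table.close();
    }
  }
}

The idea is simply that a one-off bulk scan should not evict the blocks the
random-read workload is keeping hot.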

On Thursday, October 10, 2013, Otis Gospodnetic wrote:

 Hi Ramu,

 I think I saw mentions of this possibly being a GC issue though
 now it seems it may be a disk IO issue?

 3 things:
 1) http://blog.sematext.com/2013/06/24/g1-cms-java-garbage-collector/
 - our G1 experience, with HBase specifically
 2) If you can share some of your performance graphs (GC, disk IO, JVM
 memory pools, HBase specific ones, etc.) people will likely be able to
 provide better help
 3) You can do 2) with SPM (see sig), and actually you can send email
 to this ML with your graphs directly from SPM. :)

 Otis
 --
 Solr & ElasticSearch Support -- http://sematext.com/
 Performance Monitoring -- http://sematext.com/spm



 On Wed, Oct 9, 2013 at 3:11 AM, Ramu M S ramu.ma...@gmail.com wrote:
  Hi All,
 
  Sorry. There was some mistake in the tests (Clients were not reduced,
  forgot to change the parameter before running tests).
 
  With 8 Clients and,
 
  SCR Enabled : Average Latency is 25 ms, IO Wait % is around 8
  SCR Disabled: Average Latency is 10 ms, IO Wait % is around 2
 
  Still, SCR disabled gives better results, which confuse me. Can anyone
  clarify?
 
  Also, I tried setting the parameter (hbase.regionserver.checksum.verify
 as
  true) Lars suggested with SCR disabled.
  Average Latency is around 9.8 ms, a fraction lesser.
 
  Thanks
  Ramu
 
 
  On Wed, Oct 9, 2013 at 3:32 PM, Ramu M S ramu.ma...@gmail.com wrote:
 
  Hi All,
 
  I just ran only 8 parallel clients,
 
  With SCR Enabled : Average Latency is 80 ms, IO Wait % is around 8
  With SCR Disabled: Average Latency is 40 ms, IO Wait % is around 2
 
  I always thought SCR enabled, allows a client co-located with the
 DataNode
  to read HDFS file blocks directly. This gives a performance boost to
  distributed clients that are aware of locality.
 
  Is my understanding wrong OR it doesn't apply to my scenario?
 
  Meanwhile I will try setting the parameter suggested by Lars and post
 you
  the results.
 
  Thanks,
  Ramu
 
 
  On Wed, Oct 9, 2013 at 2:29 PM, lars hofhansl la...@apache.org wrote:
 
  Good call.
  Could try to enable hbase.regionserver.checksum.verify, which will
 cause
  HBase to do its own checksums rather than relying on HDFS (and which
 saves
  1 IO per block get).
 
  I do think you can expect the index blocks to be cached at all times.
 
  -- Lars
  
  From: Vladimir Rodionov vrodio...@carrieriq.com
  To: user@hbase.apache.org user@hbase.apache.org
  Sent: Tuesday, October 8, 2013 8:44 PM
  Subject: RE: HBase Random Read latency  100ms
 
 
  Upd.
 
  Each HBase Get = 2 HDFS read IO (index block + data block)= 4 File IO
  (data + .crc) in a worst case. I think if Bloom Filter is enabled than
  it is going to be 6 File IO in a worst case (large data set), therefore
  you will have not 5 IO requests in queue but up to 20-30 IO requests
 in a
  queue
  This definitely explains  100ms avg latency.
 
 
 
  Best regards,
  Vladimir Rodionov
  Principal Platform Engineer
  Carrier IQ, www.carrieriq.com
  e-mail: vrodio...@carrieriq.com
 
  
 
  From: Vladimir Rodionov
  Sent: Tuesday, October 08, 2013 7:24 PM
  To: user@hbase.apache.org
  Subject: RE: HBase Random Read latency  100ms
 
  Ramu,
 
  You have 8 server boxes and 10 client. You have 40 requests in
 parallel -
  5 per RS/DN?
 
  You have 5 requests on random reads in a IO queue of your single RAID1.
  With avg read latency of 10 ms, 5 requests in queue will give us 30ms.
 Add
  some overhead
  of HDFS + HBase and you will probably have your issue explained ?
 
  Your bottleneck is your disk system, I think. When you serve most of



Re: HBase Random Read latency > 100ms

2013-10-09 Thread Ramu M S
Hi All,

I just ran only 8 parallel clients,

With SCR Enabled : Average Latency is 80 ms, IO Wait % is around 8
With SCR Disabled: Average Latency is 40 ms, IO Wait % is around 2

I always thought that SCR, when enabled, allows a client co-located with the
DataNode to read HDFS file blocks directly. This gives a performance boost to
distributed clients that are aware of locality.

Is my understanding wrong, or does it not apply to my scenario?
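
(For reference, a sketch of the client-side switches that turn on short-circuit
reads - assuming the HDFS-347 style keys of Hadoop 2.x / CDH 4.2+; the socket
path is only an example value, and in practice these live in hdfs-site.xml on
the DataNodes and RegionServers rather than in code:)

import org.apache.hadoop.conf.Configuration;

public class ShortCircuitReadSketch {
  public static void main(String[] args) {
    // Normally set in hdfs-site.xml; shown via the Configuration API for illustration.
    Configuration conf = new Configuration();
    conf.setBoolean("dfs.client.read.shortcircuit", true);
    // Unix domain socket shared by the DataNode and local readers (example path).
    conf.set("dfs.domain.socket.path", "/var/run/hdfs-sockets/dn");
    System.out.println("short-circuit reads enabled: "
        + conf.getBoolean("dfs.client.read.shortcircuit", false));
  }
}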

Meanwhile I will try setting the parameter suggested by Lars and post
the results.

Thanks,
Ramu


On Wed, Oct 9, 2013 at 2:29 PM, lars hofhansl la...@apache.org wrote:

 Good call.
 Could try to enable hbase.regionserver.checksum.verify, which will cause
 HBase to do its own checksums rather than relying on HDFS (and which saves
 1 IO per block get).

 I do think you can expect the index blocks to be cached at all times.

 -- Lars
 
 From: Vladimir Rodionov vrodio...@carrieriq.com
 To: user@hbase.apache.org user@hbase.apache.org
 Sent: Tuesday, October 8, 2013 8:44 PM
 Subject: RE: HBase Random Read latency  100ms


 Upd.

 Each HBase Get = 2 HDFS read IO (index block + data block)= 4 File IO
 (data + .crc) in a worst case. I think if Bloom Filter is enabled than
 it is going to be 6 File IO in a worst case (large data set), therefore
 you will have not 5 IO requests in queue but up to 20-30 IO requests in a
 queue
 This definitely explains  100ms avg latency.



 Best regards,
 Vladimir Rodionov
 Principal Platform Engineer
 Carrier IQ, www.carrieriq.com
 e-mail: vrodio...@carrieriq.com

 

 From: Vladimir Rodionov
 Sent: Tuesday, October 08, 2013 7:24 PM
 To: user@hbase.apache.org
 Subject: RE: HBase Random Read latency  100ms

 Ramu,

 You have 8 server boxes and 10 client. You have 40 requests in parallel -
 5 per RS/DN?

 You have 5 requests on random reads in a IO queue of your single RAID1.
 With avg read latency of 10 ms, 5 requests in queue will give us 30ms. Add
 some overhead
 of HDFS + HBase and you will probably have your issue explained ?

 Your bottleneck is your disk system, I think. When you serve most of
 requests from disks as in your large data set scenario, make sure you have
 adequate disk sub-system and
 that it is configured properly. Block Cache and OS page can not help you
 in this case as working data set is larger than both caches.

 Good performance numbers in small data set scenario are explained by the
 fact that data fits into OS page cache and Block Cache - you do not read
 data from disk even if
 you disable block cache.


 Best regards,
 Vladimir Rodionov
 Principal Platform Engineer
 Carrier IQ, www.carrieriq.com
 e-mail: vrodio...@carrieriq.com

 
 From: Ramu M S [ramu.ma...@gmail.com]
 Sent: Tuesday, October 08, 2013 6:00 PM
 To: user@hbase.apache.org
 Subject: Re: HBase Random Read latency  100ms

 Hi All,

 After few suggestions from the mails earlier I changed the following,

 1. Heap Size to 16 GB
 2. Block Size to 16KB
 3. HFile size to 8 GB (Table now has 256 regions, 32 per server)
 4. Data Locality Index is 100 in all RS

 I have clients running in 10 machines, each with 4 threads. So total 40.
 This is same in all tests.

 Result:
1. Average latency is still 100ms.
2. Heap occupancy is around 2-2.5 GB in all RS

 Few more tests carried out yesterday,

 TEST 1: Small data set (100 Million records, each with 724 bytes).
 ===
 Configurations:
 1. Heap Size to 1 GB
 2. Block Size to 16KB
 3. HFile size to 1 GB (Table now has 128 regions, 16 per server)
 4. Data Locality Index is 100 in all RS

 I disabled Block Cache on the table, to make sure I read everything from
 disk, most of the time.

 Result:
1. Average Latency is 8ms and throughput went up to 6K/Sec per RS.
2. With Block Cache enabled again, I got average latency around 2ms
 and throughput of 10K/Sec per RS.
Heap occupancy around 650 MB
3. Increased the Heap to 16GB, with Block Cache still enabled, I got
 average latency around 1 ms and throughput 20K/Sec per RS
Heap Occupancy around 2-2.5 GB in all RS

 TEST 2: Large Data set (1.8 Billion records, each with 724 bytes)
 ==
 Configurations:
 1. Heap Size to 1 GB
 2. Block Size to 16KB
 3. HFile size to 1 GB (Table now has 2048 regions, 256 per server)
 4. Data Locality Index is 100 in all RS

 Result:
   1. Average Latency is  500ms to start with and gradually decreases, but
 even after around 100 Million reads it is still 100 ms
   2. Block Cache = TRUE/FALSE does not make any difference here. Even Heap
 Size (1GB / 16GB) does not make any difference.
   3. Heap occupancy is around 2-2.5 GB under 16GB Heap and around 650 MB
 under 1GB Heap.

 GC Time in all of the scenarios is around 2ms/Second, as shown in the
 Cloudera Manager.

 Reading most of the items from Disk

Re: HBase Random Read latency > 100ms

2013-10-09 Thread Ramu M S
Hi All,

Sorry. There was some mistake in the tests (Clients were not reduced,
forgot to change the parameter before running tests).

With 8 Clients and,

SCR Enabled : Average Latency is 25 ms, IO Wait % is around 8
SCR Disabled: Average Latency is 10 ms, IO Wait % is around 2

Still, SCR disabled gives better results, which confuses me. Can anyone
clarify?

Also, I tried setting the parameter (hbase.regionserver.checksum.verify set to
true) that Lars suggested, with SCR disabled.
Average Latency is around 9.8 ms, a fraction less.

Thanks
Ramu


On Wed, Oct 9, 2013 at 3:32 PM, Ramu M S ramu.ma...@gmail.com wrote:

 Hi All,

 I just ran only 8 parallel clients,

 With SCR Enabled : Average Latency is 80 ms, IO Wait % is around 8
 With SCR Disabled: Average Latency is 40 ms, IO Wait % is around 2

 I always thought SCR enabled, allows a client co-located with the DataNode
 to read HDFS file blocks directly. This gives a performance boost to
 distributed clients that are aware of locality.

 Is my understanding wrong OR it doesn't apply to my scenario?

 Meanwhile I will try setting the parameter suggested by Lars and post you
 the results.

 Thanks,
 Ramu


 On Wed, Oct 9, 2013 at 2:29 PM, lars hofhansl la...@apache.org wrote:

 Good call.
 Could try to enable hbase.regionserver.checksum.verify, which will cause
 HBase to do its own checksums rather than relying on HDFS (and which saves
 1 IO per block get).

 I do think you can expect the index blocks to be cached at all times.

 -- Lars
 
 From: Vladimir Rodionov vrodio...@carrieriq.com
 To: user@hbase.apache.org user@hbase.apache.org
 Sent: Tuesday, October 8, 2013 8:44 PM
 Subject: RE: HBase Random Read latency  100ms


 Upd.

 Each HBase Get = 2 HDFS read IO (index block + data block)= 4 File IO
 (data + .crc) in a worst case. I think if Bloom Filter is enabled than
 it is going to be 6 File IO in a worst case (large data set), therefore
 you will have not 5 IO requests in queue but up to 20-30 IO requests in a
 queue
 This definitely explains  100ms avg latency.



 Best regards,
 Vladimir Rodionov
 Principal Platform Engineer
 Carrier IQ, www.carrieriq.com
 e-mail: vrodio...@carrieriq.com

 

 From: Vladimir Rodionov
 Sent: Tuesday, October 08, 2013 7:24 PM
 To: user@hbase.apache.org
 Subject: RE: HBase Random Read latency  100ms

 Ramu,

 You have 8 server boxes and 10 client. You have 40 requests in parallel -
 5 per RS/DN?

 You have 5 requests on random reads in a IO queue of your single RAID1.
 With avg read latency of 10 ms, 5 requests in queue will give us 30ms. Add
 some overhead
 of HDFS + HBase and you will probably have your issue explained ?

 Your bottleneck is your disk system, I think. When you serve most of
 requests from disks as in your large data set scenario, make sure you have
 adequate disk sub-system and
 that it is configured properly. Block Cache and OS page can not help you
 in this case as working data set is larger than both caches.

 Good performance numbers in small data set scenario are explained by the
 fact that data fits into OS page cache and Block Cache - you do not read
 data from disk even if
 you disable block cache.


 Best regards,
 Vladimir Rodionov
 Principal Platform Engineer
 Carrier IQ, www.carrieriq.com
 e-mail: vrodio...@carrieriq.com

 
 From: Ramu M S [ramu.ma...@gmail.com]
 Sent: Tuesday, October 08, 2013 6:00 PM
 To: user@hbase.apache.org
 Subject: Re: HBase Random Read latency  100ms

 Hi All,

 After few suggestions from the mails earlier I changed the following,

 1. Heap Size to 16 GB
 2. Block Size to 16KB
 3. HFile size to 8 GB (Table now has 256 regions, 32 per server)
 4. Data Locality Index is 100 in all RS

 I have clients running in 10 machines, each with 4 threads. So total 40.
 This is same in all tests.

 Result:
1. Average latency is still 100ms.
2. Heap occupancy is around 2-2.5 GB in all RS

 Few more tests carried out yesterday,

 TEST 1: Small data set (100 Million records, each with 724 bytes).
 ===
 Configurations:
 1. Heap Size to 1 GB
 2. Block Size to 16KB
 3. HFile size to 1 GB (Table now has 128 regions, 16 per server)
 4. Data Locality Index is 100 in all RS

 I disabled Block Cache on the table, to make sure I read everything from
 disk, most of the time.

 Result:
1. Average Latency is 8ms and throughput went up to 6K/Sec per RS.
2. With Block Cache enabled again, I got average latency around 2ms
 and throughput of 10K/Sec per RS.
Heap occupancy around 650 MB
3. Increased the Heap to 16GB, with Block Cache still enabled, I got
 average latency around 1 ms and throughput 20K/Sec per RS
Heap Occupancy around 2-2.5 GB in all RS

 TEST 2: Large Data set (1.8 Billion records, each with 724 bytes)
 ==
 Configurations:
 1. Heap Size to 1

RE: HBase Random Read latency > 100ms

2013-10-09 Thread Vladimir Rodionov
I can't say for SCR. There is a possibility that the feature is broken, of
course.
But the fact that hbase.regionserver.checksum.verify does not affect
performance means that the OS is effectively caching the HDFS checksum files.
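
(For anyone following along, a sketch of enabling that property - it normally
goes into hbase-site.xml on every RegionServer and takes effect after a
restart; it is shown here through the plain Configuration API just for
illustration:)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class HBaseChecksumSketch {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // With this on, HBase verifies its own checksums stored inside the HFile
    // blocks and skips the separate HDFS .crc read (saves one IO per block get).
    conf.setBoolean("hbase.regionserver.checksum.verify", true);
    System.out.println("hbase.regionserver.checksum.verify = "
        + conf.getBoolean("hbase.regionserver.checksum.verify", false));
  }
}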


Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodio...@carrieriq.com


From: Ramu M S [ramu.ma...@gmail.com]
Sent: Wednesday, October 09, 2013 12:11 AM
To: user@hbase.apache.org; lars hofhansl
Subject: Re: HBase Random Read latency  100ms

Hi All,

Sorry. There was some mistake in the tests (Clients were not reduced,
forgot to change the parameter before running tests).

With 8 Clients and,

SCR Enabled : Average Latency is 25 ms, IO Wait % is around 8
SCR Disabled: Average Latency is 10 ms, IO Wait % is around 2

Still, SCR disabled gives better results, which confuse me. Can anyone
clarify?

Also, I tried setting the parameter (hbase.regionserver.checksum.verify as
true) Lars suggested with SCR disabled.
Average Latency is around 9.8 ms, a fraction lesser.

Thanks
Ramu


On Wed, Oct 9, 2013 at 3:32 PM, Ramu M S ramu.ma...@gmail.com wrote:

 Hi All,

 I just ran only 8 parallel clients,

 With SCR Enabled : Average Latency is 80 ms, IO Wait % is around 8
 With SCR Disabled: Average Latency is 40 ms, IO Wait % is around 2

 I always thought SCR enabled, allows a client co-located with the DataNode
 to read HDFS file blocks directly. This gives a performance boost to
 distributed clients that are aware of locality.

 Is my understanding wrong OR it doesn't apply to my scenario?

 Meanwhile I will try setting the parameter suggested by Lars and post you
 the results.

 Thanks,
 Ramu


 On Wed, Oct 9, 2013 at 2:29 PM, lars hofhansl la...@apache.org wrote:

 Good call.
 Could try to enable hbase.regionserver.checksum.verify, which will cause
 HBase to do its own checksums rather than relying on HDFS (and which saves
 1 IO per block get).

 I do think you can expect the index blocks to be cached at all times.

 -- Lars
 
 From: Vladimir Rodionov vrodio...@carrieriq.com
 To: user@hbase.apache.org user@hbase.apache.org
 Sent: Tuesday, October 8, 2013 8:44 PM
 Subject: RE: HBase Random Read latency  100ms


 Upd.

 Each HBase Get = 2 HDFS read IO (index block + data block)= 4 File IO
 (data + .crc) in a worst case. I think if Bloom Filter is enabled than
 it is going to be 6 File IO in a worst case (large data set), therefore
 you will have not 5 IO requests in queue but up to 20-30 IO requests in a
 queue
 This definitely explains  100ms avg latency.



 Best regards,
 Vladimir Rodionov
 Principal Platform Engineer
 Carrier IQ, www.carrieriq.com
 e-mail: vrodio...@carrieriq.com

 

 From: Vladimir Rodionov
 Sent: Tuesday, October 08, 2013 7:24 PM
 To: user@hbase.apache.org
 Subject: RE: HBase Random Read latency  100ms

 Ramu,

 You have 8 server boxes and 10 client. You have 40 requests in parallel -
 5 per RS/DN?

 You have 5 requests on random reads in a IO queue of your single RAID1.
 With avg read latency of 10 ms, 5 requests in queue will give us 30ms. Add
 some overhead
 of HDFS + HBase and you will probably have your issue explained ?

 Your bottleneck is your disk system, I think. When you serve most of
 requests from disks as in your large data set scenario, make sure you have
 adequate disk sub-system and
 that it is configured properly. Block Cache and OS page can not help you
 in this case as working data set is larger than both caches.

 Good performance numbers in small data set scenario are explained by the
 fact that data fits into OS page cache and Block Cache - you do not read
 data from disk even if
 you disable block cache.


 Best regards,
 Vladimir Rodionov
 Principal Platform Engineer
 Carrier IQ, www.carrieriq.com
 e-mail: vrodio...@carrieriq.com

 
 From: Ramu M S [ramu.ma...@gmail.com]
 Sent: Tuesday, October 08, 2013 6:00 PM
 To: user@hbase.apache.org
 Subject: Re: HBase Random Read latency  100ms

 Hi All,

 After few suggestions from the mails earlier I changed the following,

 1. Heap Size to 16 GB
 2. Block Size to 16KB
 3. HFile size to 8 GB (Table now has 256 regions, 32 per server)
 4. Data Locality Index is 100 in all RS

 I have clients running in 10 machines, each with 4 threads. So total 40.
 This is same in all tests.

 Result:
1. Average latency is still 100ms.
2. Heap occupancy is around 2-2.5 GB in all RS

 Few more tests carried out yesterday,

 TEST 1: Small data set (100 Million records, each with 724 bytes).
 ===
 Configurations:
 1. Heap Size to 1 GB
 2. Block Size to 16KB
 3. HFile size to 1 GB (Table now has 128 regions, 16 per server)
 4. Data Locality Index is 100 in all RS

 I disabled Block Cache on the table, to make sure I read everything from
 disk, most of the time

Re: HBase Random Read latency > 100ms

2013-10-09 Thread Jean-Daniel Cryans
On Wed, Oct 9, 2013 at 10:59 AM, Vladimir Rodionov
vrodio...@carrieriq.comwrote:

 I can't say for SCR. There is a possibility that the feature is broken, of
 course.
 But the fact that hbase.regionserver.checksum.verify does not affect
 performance means that OS caches
 effectively HDFS checksum files.


See "OS cache + SCR" vs. "HBase CRC over OS cache + SCR" in this document I
shared some time ago:
https://docs.google.com/spreadsheet/pub?key=0Ao87IrzZJSaydENaem5USWg4TlRKcHl0dEtTS2NBOUE&output=html

In an all-in-memory test it shows a pretty big difference.

J-D



 Best regards,
 Vladimir Rodionov
 Principal Platform Engineer
 Carrier IQ, www.carrieriq.com
 e-mail: vrodio...@carrieriq.com

 
 From: Ramu M S [ramu.ma...@gmail.com]
 Sent: Wednesday, October 09, 2013 12:11 AM
 To: user@hbase.apache.org; lars hofhansl
 Subject: Re: HBase Random Read latency  100ms

 Hi All,

 Sorry. There was some mistake in the tests (Clients were not reduced,
 forgot to change the parameter before running tests).

 With 8 Clients and,

 SCR Enabled : Average Latency is 25 ms, IO Wait % is around 8
 SCR Disabled: Average Latency is 10 ms, IO Wait % is around 2

 Still, SCR disabled gives better results, which confuse me. Can anyone
 clarify?

 Also, I tried setting the parameter (hbase.regionserver.checksum.verify as
 true) Lars suggested with SCR disabled.
 Average Latency is around 9.8 ms, a fraction lesser.

 Thanks
 Ramu


 On Wed, Oct 9, 2013 at 3:32 PM, Ramu M S ramu.ma...@gmail.com wrote:

  Hi All,
 
  I just ran only 8 parallel clients,
 
  With SCR Enabled : Average Latency is 80 ms, IO Wait % is around 8
  With SCR Disabled: Average Latency is 40 ms, IO Wait % is around 2
 
  I always thought SCR enabled, allows a client co-located with the
 DataNode
  to read HDFS file blocks directly. This gives a performance boost to
  distributed clients that are aware of locality.
 
  Is my understanding wrong OR it doesn't apply to my scenario?
 
  Meanwhile I will try setting the parameter suggested by Lars and post you
  the results.
 
  Thanks,
  Ramu
 
 
  On Wed, Oct 9, 2013 at 2:29 PM, lars hofhansl la...@apache.org wrote:
 
  Good call.
  Could try to enable hbase.regionserver.checksum.verify, which will cause
  HBase to do its own checksums rather than relying on HDFS (and which
 saves
  1 IO per block get).
 
  I do think you can expect the index blocks to be cached at all times.
 
  -- Lars
  
  From: Vladimir Rodionov vrodio...@carrieriq.com
  To: user@hbase.apache.org user@hbase.apache.org
  Sent: Tuesday, October 8, 2013 8:44 PM
  Subject: RE: HBase Random Read latency  100ms
 
 
  Upd.
 
  Each HBase Get = 2 HDFS read IO (index block + data block)= 4 File IO
  (data + .crc) in a worst case. I think if Bloom Filter is enabled than
  it is going to be 6 File IO in a worst case (large data set), therefore
  you will have not 5 IO requests in queue but up to 20-30 IO requests in
 a
  queue
  This definitely explains  100ms avg latency.
 
 
 
  Best regards,
  Vladimir Rodionov
  Principal Platform Engineer
  Carrier IQ, www.carrieriq.com
  e-mail: vrodio...@carrieriq.com
 
  
 
  From: Vladimir Rodionov
  Sent: Tuesday, October 08, 2013 7:24 PM
  To: user@hbase.apache.org
  Subject: RE: HBase Random Read latency  100ms
 
  Ramu,
 
  You have 8 server boxes and 10 client. You have 40 requests in parallel
 -
  5 per RS/DN?
 
  You have 5 requests on random reads in a IO queue of your single RAID1.
  With avg read latency of 10 ms, 5 requests in queue will give us 30ms.
 Add
  some overhead
  of HDFS + HBase and you will probably have your issue explained ?
 
  Your bottleneck is your disk system, I think. When you serve most of
  requests from disks as in your large data set scenario, make sure you
 have
  adequate disk sub-system and
  that it is configured properly. Block Cache and OS page can not help you
  in this case as working data set is larger than both caches.
 
  Good performance numbers in small data set scenario are explained by the
  fact that data fits into OS page cache and Block Cache - you do not read
  data from disk even if
  you disable block cache.
 
 
  Best regards,
  Vladimir Rodionov
  Principal Platform Engineer
  Carrier IQ, www.carrieriq.com
  e-mail: vrodio...@carrieriq.com
 
  
  From: Ramu M S [ramu.ma...@gmail.com]
  Sent: Tuesday, October 08, 2013 6:00 PM
  To: user@hbase.apache.org
  Subject: Re: HBase Random Read latency  100ms
 
  Hi All,
 
  After few suggestions from the mails earlier I changed the following,
 
  1. Heap Size to 16 GB
  2. Block Size to 16KB
  3. HFile size to 8 GB (Table now has 256 regions, 32 per server)
  4. Data Locality Index is 100 in all RS
 
  I have clients running in 10 machines, each with 4 threads. So total 40.
  This is same in all tests.
 
  Result:
 1. Average latency is still 100ms

Re: HBase Random Read latency > 100ms

2013-10-09 Thread Otis Gospodnetic
Hi Ramu,

I think I saw mentions of this possibly being a GC issue though
now it seems it may be a disk IO issue?

3 things:
1) http://blog.sematext.com/2013/06/24/g1-cms-java-garbage-collector/
- our G1 experience, with HBase specifically
2) If you can share some of your performance graphs (GC, disk IO, JVM
memory pools, HBase specific ones, etc.) people will likely be able to
provide better help
3) You can do 2) with SPM (see sig), and actually you can send email
to this ML with your graphs directly from SPM. :)

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm



On Wed, Oct 9, 2013 at 3:11 AM, Ramu M S ramu.ma...@gmail.com wrote:
 Hi All,

 Sorry. There was some mistake in the tests (Clients were not reduced,
 forgot to change the parameter before running tests).

 With 8 Clients and,

 SCR Enabled : Average Latency is 25 ms, IO Wait % is around 8
 SCR Disabled: Average Latency is 10 ms, IO Wait % is around 2

 Still, SCR disabled gives better results, which confuse me. Can anyone
 clarify?

 Also, I tried setting the parameter (hbase.regionserver.checksum.verify as
 true) Lars suggested with SCR disabled.
 Average Latency is around 9.8 ms, a fraction lesser.

 Thanks
 Ramu


 On Wed, Oct 9, 2013 at 3:32 PM, Ramu M S ramu.ma...@gmail.com wrote:

 Hi All,

 I just ran only 8 parallel clients,

 With SCR Enabled : Average Latency is 80 ms, IO Wait % is around 8
 With SCR Disabled: Average Latency is 40 ms, IO Wait % is around 2

 I always thought SCR enabled, allows a client co-located with the DataNode
 to read HDFS file blocks directly. This gives a performance boost to
 distributed clients that are aware of locality.

 Is my understanding wrong OR it doesn't apply to my scenario?

 Meanwhile I will try setting the parameter suggested by Lars and post you
 the results.

 Thanks,
 Ramu


 On Wed, Oct 9, 2013 at 2:29 PM, lars hofhansl la...@apache.org wrote:

 Good call.
 Could try to enable hbase.regionserver.checksum.verify, which will cause
 HBase to do its own checksums rather than relying on HDFS (and which saves
 1 IO per block get).

 I do think you can expect the index blocks to be cached at all times.

 -- Lars
 
 From: Vladimir Rodionov vrodio...@carrieriq.com
 To: user@hbase.apache.org user@hbase.apache.org
 Sent: Tuesday, October 8, 2013 8:44 PM
 Subject: RE: HBase Random Read latency  100ms


 Upd.

 Each HBase Get = 2 HDFS read IO (index block + data block)= 4 File IO
 (data + .crc) in a worst case. I think if Bloom Filter is enabled than
 it is going to be 6 File IO in a worst case (large data set), therefore
 you will have not 5 IO requests in queue but up to 20-30 IO requests in a
 queue
 This definitely explains  100ms avg latency.



 Best regards,
 Vladimir Rodionov
 Principal Platform Engineer
 Carrier IQ, www.carrieriq.com
 e-mail: vrodio...@carrieriq.com

 

 From: Vladimir Rodionov
 Sent: Tuesday, October 08, 2013 7:24 PM
 To: user@hbase.apache.org
 Subject: RE: HBase Random Read latency  100ms

 Ramu,

 You have 8 server boxes and 10 client. You have 40 requests in parallel -
 5 per RS/DN?

 You have 5 requests on random reads in a IO queue of your single RAID1.
 With avg read latency of 10 ms, 5 requests in queue will give us 30ms. Add
 some overhead
 of HDFS + HBase and you will probably have your issue explained ?

 Your bottleneck is your disk system, I think. When you serve most of
 requests from disks as in your large data set scenario, make sure you have
 adequate disk sub-system and
 that it is configured properly. Block Cache and OS page can not help you
 in this case as working data set is larger than both caches.

 Good performance numbers in small data set scenario are explained by the
 fact that data fits into OS page cache and Block Cache - you do not read
 data from disk even if
 you disable block cache.


 Best regards,
 Vladimir Rodionov
 Principal Platform Engineer
 Carrier IQ, www.carrieriq.com
 e-mail: vrodio...@carrieriq.com

 
 From: Ramu M S [ramu.ma...@gmail.com]
 Sent: Tuesday, October 08, 2013 6:00 PM
 To: user@hbase.apache.org
 Subject: Re: HBase Random Read latency  100ms

 Hi All,

 After few suggestions from the mails earlier I changed the following,

 1. Heap Size to 16 GB
 2. Block Size to 16KB
 3. HFile size to 8 GB (Table now has 256 regions, 32 per server)
 4. Data Locality Index is 100 in all RS

 I have clients running in 10 machines, each with 4 threads. So total 40.
 This is same in all tests.

 Result:
1. Average latency is still 100ms.
2. Heap occupancy is around 2-2.5 GB in all RS

 Few more tests carried out yesterday,

 TEST 1: Small data set (100 Million records, each with 724 bytes).
 ===
 Configurations:
 1. Heap Size to 1 GB
 2. Block Size to 16KB
 3. HFile size to 1 GB (Table now has 128

Re: HBase Random Read latency > 100ms

2013-10-08 Thread lars hofhansl
He still should not see 100ms latency. 20ms, sure. 100ms seems large; there are 
still 8 machines serving the requests.

I agree this spec is far from optimal, but there is still something odd here.


Ramu, this does not look like a GC issue. You'd see much larger (worst case) 
latencies if that were the case (dozens of seconds).
Are you using 40 clients from 40 different machines? Or from 40 different
processes on the same machine? Or 40 threads in the same process?

Thanks.

-- Lars




 From: Vladimir Rodionov vrodio...@carrieriq.com
To: user@hbase.apache.org user@hbase.apache.org 
Sent: Monday, October 7, 2013 11:02 AM
Subject: RE: HBase Random Read latency  100ms
 

Ramu, your HBase configuration (128GB of heap) is far from optimal.
Nobody runs HBase with that amount of heap to my best knowledge.
32GB of RAM is the usual upper limit. We run 8-12GB in production.

What else, your IO capacity is VERY low. 2 SATA drives in RAID 1 for mostly 
random reads load?
You should have 8, better 12-16 drives per server. Forget about RAID. You have 
HDFS.

Block cache in your case does not help much , as since your read amplification 
is at least x20 (16KB block and 724 B read) - its just waste
RAM (heap). In your case you do not need LARGE heap and LARGE block cache.
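
(A quick back-of-the-envelope check of the x20 figure, using the 16 KB block
size and 724-byte records quoted in this thread:)

public class ReadAmplificationSketch {
  public static void main(String[] args) {
    int blockSize = 16 * 1024; // HFile BLOCKSIZE in bytes
    int rowSize = 724;         // bytes actually wanted per Get
    // Each Get reads at least one whole block from disk to return one row.
    System.out.printf("read amplification ~ %.1fx%n",
        (double) blockSize / rowSize); // ~22.6x, i.e. "at least x20"
  }
}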

I advise you reconsidering your hardware spec, applying all optimizations 
mentioned already in this thread and lowering your expectations.

With a right hardware you will be able to get 500-1000 truly random reads per 
server.

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodio...@carrieriq.com



From: Ramu M S [ramu.ma...@gmail.com]
Sent: Monday, October 07, 2013 5:23 AM
To: user@hbase.apache.org
Subject: Re: HBase Random Read latency  100ms

Hi Bharath,

I am little confused about the metrics displayed by Cloudera. Even when
there are no oeprations, the gc_time metric is showing 2s constant in the
graph. Is this the CMS gc_time (in that case no JVm pause) or the GC pause.

GC timings reported earlier is the average taken for gc_time metric for all
region servers.

Regards,
Ramu


On Mon, Oct 7, 2013 at 9:10 PM, Ramu M S ramu.ma...@gmail.com wrote:

 Jean,

 Yes. It is 2 drives.

 - Ramu


 On Mon, Oct 7, 2013 at 8:45 PM, Jean-Marc Spaggiari 
 jean-m...@spaggiari.org wrote:

 Quick questionon the disk side.

 When you say:
 800 GB SATA (7200 RPM) Disk
 Is it 1x800GB? It's raid 1, so might be 2 drives? What's the
 configuration?

 JM


 2013/10/7 Ramu M S ramu.ma...@gmail.com

  Lars, Bharath,
 
  Compression is disabled for the table. This was not intended from the
  evaluation.
  I forgot to mention that during table creation. I will enable snappy
 and do
  major compaction again.
 
  Please suggest other options to try out and also suggestions for the
  previous questions.
 
  Thanks,
  Ramu
 
 
  On Mon, Oct 7, 2013 at 6:35 PM, Ramu M S ramu.ma...@gmail.com wrote:
 
   Bharath,
  
   I was about to report this. Yes indeed there is too much of GC time.
   Just verified the GC time using Cloudera Manager statistics(Every
 minute
   update).
  
   For each Region Server,
    - During Read: Graph shows 2s constant.
    - During Compaction: Graph starts with 7s and goes as high as 20s
 during
   end.
  
   Few more questions,
   1. For the current evaluation, since the reads are completely random
 and
  I
   don't expect to read same data again can I set the Heap to the
 default 1
  GB
   ?
  
   2. Can I completely turn off BLOCK CACHE for this table?
      http://hbase.apache.org/book/regionserver.arch.html recommends
 that
   for Randm reads.
  
   3. But in the next phase of evaluation, We are interested to use
 HBase as
   In-memory KV DB by having the latest data in RAM (To the tune of
 around
  128
   GB in each RS, we are setting up 50-100 Node Cluster). I am very
 curious
  to
   hear any suggestions in this regard.
  
   Regards,
   Ramu
  
  
   On Mon, Oct 7, 2013 at 5:50 PM, Bharath Vissapragada 
   bhara...@cloudera.com wrote:
  
   Hi Ramu,
  
   Thanks for reporting the results back. Just curious if you are
 hitting
  any
   big GC pauses due to block cache churn on such large heap. Do you see
  it ?
  
   - Bharath
  
  
   On Mon, Oct 7, 2013 at 1:42 PM, Ramu M S ramu.ma...@gmail.com
 wrote:
  
Lars,
   
After changing the BLOCKSIZE to 16KB, the latency has reduced a
  little.
   Now
the average is around 75ms.
Overall throughput (I am using 40 Clients to fetch records) is
 around
  1K
OPS.
   
After compaction hdfsBlocksLocalityIndex is
 91,88,78,90,99,82,94,97 in
   my 8
RS respectively.
   
Thanks,
Ramu
   
   
On Mon, Oct 7, 2013 at 3:51 PM, Ramu M S ramu.ma...@gmail.com
  wrote:
   
 Thanks Lars.

 I have changed the BLOCKSIZE to 16KB and triggered a major
   compaction. I
 will report my results once it is done.

 - Ramu

Re: HBase Random Read latency > 100ms

2013-10-08 Thread Varun Sharma
How many reads per second per region server are you throwing at the system
- also, is 100ms the average latency?


On Mon, Oct 7, 2013 at 2:04 PM, lars hofhansl la...@apache.org wrote:

 He still should not see 100ms latency. 20ms, sure. 100ms seems large;
 there are still 8 machines serving the requests.

 I agree this spec is far from optimal, but there is still something odd
 here.


 Ramu, this does not look like a GC issue. You'd see much larger (worst
 case) latencies if that were the case (dozens of seconds).
 Are you using 40 client from 40 different machines? Or from 40 different
 processes on the same machine? Or 40 threads in the same process?

 Thanks.

 -- Lars



 
  From: Vladimir Rodionov vrodio...@carrieriq.com
 To: user@hbase.apache.org user@hbase.apache.org
 Sent: Monday, October 7, 2013 11:02 AM
 Subject: RE: HBase Random Read latency  100ms


 Ramu, your HBase configuration (128GB of heap) is far from optimal.
 Nobody runs HBase with that amount of heap to my best knowledge.
 32GB of RAM is the usual upper limit. We run 8-12GB in production.

 What else, your IO capacity is VERY low. 2 SATA drives in RAID 1 for
 mostly random reads load?
 You should have 8, better 12-16 drives per server. Forget about RAID. You
 have HDFS.

 Block cache in your case does not help much , as since your read
 amplification is at least x20 (16KB block and 724 B read) - its just waste
 RAM (heap). In your case you do not need LARGE heap and LARGE block cache.

 I advise you reconsidering your hardware spec, applying all optimizations
 mentioned already in this thread and lowering your expectations.

 With a right hardware you will be able to get 500-1000 truly random reads
 per server.

 Best regards,
 Vladimir Rodionov
 Principal Platform Engineer
 Carrier IQ, www.carrieriq.com
 e-mail: vrodio...@carrieriq.com

 

 From: Ramu M S [ramu.ma...@gmail.com]
 Sent: Monday, October 07, 2013 5:23 AM
 To: user@hbase.apache.org
 Subject: Re: HBase Random Read latency  100ms

 Hi Bharath,

 I am little confused about the metrics displayed by Cloudera. Even when
 there are no oeprations, the gc_time metric is showing 2s constant in the
 graph. Is this the CMS gc_time (in that case no JVm pause) or the GC pause.

 GC timings reported earlier is the average taken for gc_time metric for all
 region servers.

 Regards,
 Ramu


 On Mon, Oct 7, 2013 at 9:10 PM, Ramu M S ramu.ma...@gmail.com wrote:

  Jean,
 
  Yes. It is 2 drives.
 
  - Ramu
 
 
  On Mon, Oct 7, 2013 at 8:45 PM, Jean-Marc Spaggiari 
  jean-m...@spaggiari.org wrote:
 
  Quick questionon the disk side.
 
  When you say:
  800 GB SATA (7200 RPM) Disk
  Is it 1x800GB? It's raid 1, so might be 2 drives? What's the
  configuration?
 
  JM
 
 
  2013/10/7 Ramu M S ramu.ma...@gmail.com
 
   Lars, Bharath,
  
   Compression is disabled for the table. This was not intended from the
   evaluation.
   I forgot to mention that during table creation. I will enable snappy
  and do
   major compaction again.
  
   Please suggest other options to try out and also suggestions for the
   previous questions.
  
   Thanks,
   Ramu
  
  
   On Mon, Oct 7, 2013 at 6:35 PM, Ramu M S ramu.ma...@gmail.com
 wrote:
  
Bharath,
   
I was about to report this. Yes indeed there is too much of GC time.
Just verified the GC time using Cloudera Manager statistics(Every
  minute
update).
   
For each Region Server,
 - During Read: Graph shows 2s constant.
 - During Compaction: Graph starts with 7s and goes as high as 20s
  during
end.
   
Few more questions,
1. For the current evaluation, since the reads are completely random
  and
   I
don't expect to read same data again can I set the Heap to the
  default 1
   GB
?
   
2. Can I completely turn off BLOCK CACHE for this table?
   http://hbase.apache.org/book/regionserver.arch.html recommends
  that
for Randm reads.
   
3. But in the next phase of evaluation, We are interested to use
  HBase as
In-memory KV DB by having the latest data in RAM (To the tune of
  around
   128
GB in each RS, we are setting up 50-100 Node Cluster). I am very
  curious
   to
hear any suggestions in this regard.
   
Regards,
Ramu
   
   
On Mon, Oct 7, 2013 at 5:50 PM, Bharath Vissapragada 
bhara...@cloudera.com wrote:
   
Hi Ramu,
   
Thanks for reporting the results back. Just curious if you are
  hitting
   any
big GC pauses due to block cache churn on such large heap. Do you
 see
   it ?
   
- Bharath
   
   
On Mon, Oct 7, 2013 at 1:42 PM, Ramu M S ramu.ma...@gmail.com
  wrote:
   
 Lars,

 After changing the BLOCKSIZE to 16KB, the latency has reduced a
   little.
Now
 the average is around 75ms.
 Overall throughput (I am using 40 Clients to fetch records) is
  around
   1K
 OPS.

 After compaction hdfsBlocksLocalityIndex

RE: HBase Random Read latency > 100ms

2013-10-08 Thread Vladimir Rodionov
What are your current heap and block cache sizes?

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodio...@carrieriq.com


From: Ramu M S [ramu.ma...@gmail.com]
Sent: Monday, October 07, 2013 10:55 PM
To: user@hbase.apache.org
Subject: Re: HBase Random Read latency  100ms

Hi All,

Average Latency is still around 80ms.
I have done the following,

1. Enabled Snappy Compression
2. Reduce the HFile size to 8 GB

Should I attribute these results to bad Disk Configuration OR anything else
to investigate?

- Ramu


On Tue, Oct 8, 2013 at 10:56 AM, Ramu M S ramu.ma...@gmail.com wrote:

 Vladimir,

 Thanks for the Insights into Future Caching features. Looks very
 interesting.

 - Ramu


 On Tue, Oct 8, 2013 at 10:45 AM, Vladimir Rodionov 
 vrodio...@carrieriq.com wrote:

 Ramu,

 If your working set of data fits into 192GB you may get additional boost
 by utilizing OS page cache, or wait until
 0.98 release which introduces new bucket cache implementation (port of
 Facebook L2 cache). You can try vanilla bucket cache in 0.96 (not released
 yet
 but is due soon). Both caches stores data off-heap, but Facebook version
 can store encoded and compressed data and vanilla bucket cache does not.
 There are some options how to utilize efficiently available RAM (at least
 in upcoming HBase releases)
 . If your data set does not fit RAM then your only hope is your 24 SAS
 drives. Depending on your RAID settings, disk IO perf, HDFS configuration
 (I think the latest Hadoop is preferable here).

 OS page cache is most vulnerable and volatile, it can not be controlled
 and can be easily polluted by either some other processes or by HBase
 itself (long scan).
 With Block cache you have more control but the first truly usable
 *official* implementation is going to be a part of 0.98 release.

 As far as I understand, your use case would definitely covered by
 something similar to BigTable ScanCache (RowCache) , but there is no such
 cache in HBase yet.
 One major advantage of RowCache vs BlockCache (apart from being much more
 efficient in RAM usage) is resilience to Region compactions. Each minor
 Region compaction invalidates partially
 Region's data in BlockCache and major compaction invalidates this
 Region's data completely. This is not the case with RowCache (would it be
 implemented).

 Best regards,
 Vladimir Rodionov
 Principal Platform Engineer
 Carrier IQ, www.carrieriq.com
 e-mail: vrodio...@carrieriq.com

 
 From: Ramu M S [ramu.ma...@gmail.com]
 Sent: Monday, October 07, 2013 5:25 PM
 To: user@hbase.apache.org
 Subject: Re: HBase Random Read latency  100ms

 Vladimir,

 Yes. I am fully aware of the HDD limitation and wrong configurations wrt
 RAID.
 Unfortunately, the hardware is leased from others for this work and I
 wasn't consulted to decide the h/w specification for the tests that I am
 doing now. Even the RAID cannot be turned off or set to RAID-0

 Production system is according to the Hadoop needs (100 Nodes with 16 Core
 CPU, 192 GB RAM, 24 X 600GB SAS Drives, RAID cannot be completely turned
 off, so we are creating 1 Virtual Disk containing only 1 Physical Disk and
 the VD RAID level set to RAID-0). These systems are still not
 available. If
 you have any suggestion on the production setup, I will be glad to hear.

 Also, as pointed out earlier, we are planning to use HBase also as an in
 memory KV store to access the latest data.
 That's why RAM was considered huge in this configuration. But looks like
 we
 would run into more problems than any gains from this.

 Keeping that aside, I was trying to get the maximum out of the current
 cluster or as you said Is 500-1000 OPS the max I could get out of this
 setup?

 Regards,
 Ramu




RE: HBase Random Read latency > 100ms

2013-10-08 Thread Vladimir Rodionov
This can be GC related.

128GB heap size,
51.2GB - BlockCache size (on heap)

Zipfian distribution of small objects (712B)

Results: extreme cache pollution and a high eviction rate. High eviction - high GC.
As far as I remember, LruBlockCache does not do eviction in real time but in
batches, and these batches will add further latency spikes.

Decrease heap, reduce block cache size (to minimum) and repeat test.

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodio...@carrieriq.com


From: lars hofhansl [la...@apache.org]
Sent: Monday, October 07, 2013 2:04 PM
To: user@hbase.apache.org
Subject: Re: HBase Random Read latency  100ms

He still should not see 100ms latency. 20ms, sure. 100ms seems large; there are 
still 8 machines serving the requests.

I agree this spec is far from optimal, but there is still something odd here.


Ramu, this does not look like a GC issue. You'd see much larger (worst case) 
latencies if that were the case (dozens of seconds).
Are you using 40 client from 40 different machines? Or from 40 different 
processes on the same machine? Or 40 threads in the same process?

Thanks.

-- Lars




 From: Vladimir Rodionov vrodio...@carrieriq.com
To: user@hbase.apache.org user@hbase.apache.org
Sent: Monday, October 7, 2013 11:02 AM
Subject: RE: HBase Random Read latency  100ms


Ramu, your HBase configuration (128GB of heap) is far from optimal.
Nobody runs HBase with that amount of heap to my best knowledge.
32GB of RAM is the usual upper limit. We run 8-12GB in production.

What else, your IO capacity is VERY low. 2 SATA drives in RAID 1 for mostly 
random reads load?
You should have 8, better 12-16 drives per server. Forget about RAID. You have 
HDFS.

Block cache in your case does not help much , as since your read amplification 
is at least x20 (16KB block and 724 B read) - its just waste
RAM (heap). In your case you do not need LARGE heap and LARGE block cache.

I advise you reconsidering your hardware spec, applying all optimizations 
mentioned already in this thread and lowering your expectations.

With a right hardware you will be able to get 500-1000 truly random reads per 
server.

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodio...@carrieriq.com



From: Ramu M S [ramu.ma...@gmail.com]
Sent: Monday, October 07, 2013 5:23 AM
To: user@hbase.apache.org
Subject: Re: HBase Random Read latency  100ms

Hi Bharath,

I am little confused about the metrics displayed by Cloudera. Even when
there are no oeprations, the gc_time metric is showing 2s constant in the
graph. Is this the CMS gc_time (in that case no JVm pause) or the GC pause.

GC timings reported earlier is the average taken for gc_time metric for all
region servers.

Regards,
Ramu


On Mon, Oct 7, 2013 at 9:10 PM, Ramu M S ramu.ma...@gmail.com wrote:

 Jean,

 Yes. It is 2 drives.

 - Ramu


 On Mon, Oct 7, 2013 at 8:45 PM, Jean-Marc Spaggiari 
 jean-m...@spaggiari.org wrote:

 Quick questionon the disk side.

 When you say:
 800 GB SATA (7200 RPM) Disk
 Is it 1x800GB? It's raid 1, so might be 2 drives? What's the
 configuration?

 JM


 2013/10/7 Ramu M S ramu.ma...@gmail.com

  Lars, Bharath,
 
  Compression is disabled for the table. This was not intended from the
  evaluation.
  I forgot to mention that during table creation. I will enable snappy
 and do
  major compaction again.
 
  Please suggest other options to try out and also suggestions for the
  previous questions.
 
  Thanks,
  Ramu
 
 
  On Mon, Oct 7, 2013 at 6:35 PM, Ramu M S ramu.ma...@gmail.com wrote:
 
   Bharath,
  
   I was about to report this. Yes indeed there is too much of GC time.
   Just verified the GC time using Cloudera Manager statistics(Every
 minute
   update).
  
   For each Region Server,
- During Read: Graph shows 2s constant.
- During Compaction: Graph starts with 7s and goes as high as 20s
 during
   end.
  
   Few more questions,
   1. For the current evaluation, since the reads are completely random
 and
  I
   don't expect to read same data again can I set the Heap to the
 default 1
  GB
   ?
  
   2. Can I completely turn off BLOCK CACHE for this table?
  http://hbase.apache.org/book/regionserver.arch.html recommends
 that
   for Randm reads.
  
   3. But in the next phase of evaluation, We are interested to use
 HBase as
   In-memory KV DB by having the latest data in RAM (To the tune of
 around
  128
   GB in each RS, we are setting up 50-100 Node Cluster). I am very
 curious
  to
   hear any suggestions in this regard.
  
   Regards,
   Ramu
  
  
   On Mon, Oct 7, 2013 at 5:50 PM, Bharath Vissapragada 
   bhara...@cloudera.com wrote:
  
   Hi Ramu,
  
   Thanks for reporting the results back. Just curious if you are
 hitting
  any
   big GC pauses due to block cache churn on such large heap. Do you see

RE: HBase Random Read latency > 100ms

2013-10-08 Thread Vladimir Rodionov
Ramu,

You have 8 server boxes and 10 clients. You have 40 requests in parallel - 5 per
RS/DN?

You have 5 requests on random reads in an IO queue on your single RAID1. With an
avg read latency of 10 ms, 5 requests in the queue will give us ~30ms. Add some
overhead of HDFS + HBase and you will probably have your issue explained?
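
(A rough sketch of that arithmetic, under the simplifying assumption that the 5
outstanding reads per disk are served strictly one at a time at ~10 ms each:)

public class QueueLatencySketch {
  public static void main(String[] args) {
    double diskReadMs = 10.0; // avg random read latency of one SATA spindle
    int queueDepth = 5;       // ~5 concurrent Gets per RS/DN in this test
    // A request arriving at a random position waits for the reads ahead of it,
    // on average (queueDepth - 1) / 2 of them, and is then served itself.
    double avgLatencyMs = ((queueDepth - 1) / 2.0) * diskReadMs + diskReadMs;
    System.out.printf("expected latency ~ %.0f ms before HDFS/HBase overhead%n",
        avgLatencyMs); // ~30 ms
  }
}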

Your bottleneck is your disk system, I think. When you serve most requests from
disks, as in your large data set scenario, make sure you have an adequate disk
sub-system and that it is configured properly. The Block Cache and the OS page
cache cannot help you in this case, as the working data set is larger than both
caches.

Good performance numbers in the small data set scenario are explained by the fact
that the data fits into the OS page cache and the Block Cache - you do not read
data from disk even if you disable the block cache.


Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodio...@carrieriq.com


From: Ramu M S [ramu.ma...@gmail.com]
Sent: Tuesday, October 08, 2013 6:00 PM
To: user@hbase.apache.org
Subject: Re: HBase Random Read latency  100ms

Hi All,

After few suggestions from the mails earlier I changed the following,

1. Heap Size to 16 GB
2. Block Size to 16KB
3. HFile size to 8 GB (Table now has 256 regions, 32 per server)
4. Data Locality Index is 100 in all RS

I have clients running in 10 machines, each with 4 threads. So total 40.
This is same in all tests.

Result:
   1. Average latency is still 100ms.
   2. Heap occupancy is around 2-2.5 GB in all RS

Few more tests carried out yesterday,

TEST 1: Small data set (100 Million records, each with 724 bytes).
===
Configurations:
1. Heap Size to 1 GB
2. Block Size to 16KB
3. HFile size to 1 GB (Table now has 128 regions, 16 per server)
4. Data Locality Index is 100 in all RS

I disabled Block Cache on the table, to make sure I read everything from
disk, most of the time.
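
(For reference, a sketch of how that per-family change can be made with the
Java admin API - assuming the 0.94-era classes; the table and column family
names below are made up, and the HBase shell's alter command does the same
thing:)

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class DisableBlockCacheSketch {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    try {
      // Hypothetical table and column family names.
      HColumnDescriptor cf = new HColumnDescriptor("cf");
      cf.setBlockCacheEnabled(false);   // read data blocks straight from HDFS
      cf.setBlocksize(16 * 1024);       // 16 KB blocks, as used in these tests
      admin.disableTable("test_table");
      admin.modifyColumn("test_table", cf);
      admin.enableTable("test_table");
    } finally {
      admin.close();
    }
  }
}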

Result:
   1. Average Latency is 8ms and throughput went up to 6K/Sec per RS.
   2. With Block Cache enabled again, I got average latency around 2ms
and throughput of 10K/Sec per RS.
   Heap occupancy around 650 MB
   3. Increased the Heap to 16GB, with Block Cache still enabled, I got
average latency around 1 ms and throughput 20K/Sec per RS
   Heap Occupancy around 2-2.5 GB in all RS

TEST 2: Large Data set (1.8 Billion records, each with 724 bytes)
==
Configurations:
1. Heap Size to 1 GB
2. Block Size to 16KB
3. HFile size to 1 GB (Table now has 2048 regions, 256 per server)
4. Data Locality Index is 100 in all RS

Result:
  1. Average Latency is > 500ms to start with and gradually decreases, but
even after around 100 Million reads it is still > 100 ms
  2. Block Cache = TRUE/FALSE does not make any difference here. Even Heap
Size (1GB / 16GB) does not make any difference.
  3. Heap occupancy is around 2-2.5 GB under 16GB Heap and around 650 MB
under 1GB Heap.

GC Time in all of the scenarios is around 2ms/Second, as shown in the
Cloudera Manager.

Reading most of the items from Disk in less data scenario gives better
results and very low latencies.

Number of regions per RS and HFile size does make a huge difference in my
Cluster.
Keeping 100 Regions per RS as max(Most of the discussions suggest this),
should I restrict the HFile size to 1GB? and thus reducing the storage
capacity (From 700 GB to 100GB per RS)?

Please advice.

Thanks,
Ramu


On Wed, Oct 9, 2013 at 4:58 AM, Vladimir Rodionov
vrodio...@carrieriq.comwrote:

 What are your current heap and block cache sizes?

 Best regards,
 Vladimir Rodionov
 Principal Platform Engineer
 Carrier IQ, www.carrieriq.com
 e-mail: vrodio...@carrieriq.com

 
 From: Ramu M S [ramu.ma...@gmail.com]
 Sent: Monday, October 07, 2013 10:55 PM
 To: user@hbase.apache.org
 Subject: Re: HBase Random Read latency  100ms

 Hi All,

 Average Latency is still around 80ms.
 I have done the following,

 1. Enabled Snappy Compression
 2. Reduce the HFile size to 8 GB

 Should I attribute these results to bad Disk Configuration OR anything else
 to investigate?

 - Ramu


 On Tue, Oct 8, 2013 at 10:56 AM, Ramu M S ramu.ma...@gmail.com wrote:

  Vladimir,
 
  Thanks for the Insights into Future Caching features. Looks very
  interesting.
 
  - Ramu
 
 
  On Tue, Oct 8, 2013 at 10:45 AM, Vladimir Rodionov 
  vrodio...@carrieriq.com wrote:
 
  Ramu,
 
  If your working set of data fits into 192GB you may get additional boost
  by utilizing OS page cache, or wait until
  0.98 release which introduces new bucket cache implementation (port of
  Facebook L2 cache). You can try vanilla bucket cache in 0.96 (not
 released
  yet
  but is due soon). Both caches stores data off-heap, but Facebook version
  can store encoded and compressed data and vanilla bucket cache does not.
  There are some options how to utilize efficiently

RE: HBase Random Read latency > 100ms

2013-10-08 Thread Vladimir Rodionov
I suggest two additional tests on the large dataset:

Run one client thread per server (8 max) with:
1. SCR enabled
2. SCR disabled


Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodio...@carrieriq.com


From: Vladimir Rodionov
Sent: Tuesday, October 08, 2013 7:24 PM
To: user@hbase.apache.org
Subject: RE: HBase Random Read latency  100ms

Ramu,

You have 8 server boxes and 10 client. You have 40 requests in parallel - 5 per 
RS/DN?

You have 5 requests on random reads in a IO queue of your single RAID1. With 
avg read latency of 10 ms, 5 requests in queue will give us 30ms. Add some 
overhead
of HDFS + HBase and you will probably have your issue explained ?

Your bottleneck is your disk system, I think. When you serve most of requests 
from disks as in your large data set scenario, make sure you have adequate disk 
sub-system and
that it is configured properly. Block Cache and OS page can not help you in 
this case as working data set is larger than both caches.

Good performance numbers in small data set scenario are explained by the fact 
that data fits into OS page cache and Block Cache - you do not read data from 
disk even if
you disable block cache.


Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodio...@carrieriq.com


From: Ramu M S [ramu.ma...@gmail.com]
Sent: Tuesday, October 08, 2013 6:00 PM
To: user@hbase.apache.org
Subject: Re: HBase Random Read latency  100ms

Hi All,

After few suggestions from the mails earlier I changed the following,

1. Heap Size to 16 GB
2. Block Size to 16KB
3. HFile size to 8 GB (Table now has 256 regions, 32 per server)
4. Data Locality Index is 100 in all RS

I have clients running in 10 machines, each with 4 threads. So total 40.
This is same in all tests.

Result:
   1. Average latency is still 100ms.
   2. Heap occupancy is around 2-2.5 GB in all RS

Few more tests carried out yesterday,

TEST 1: Small data set (100 Million records, each with 724 bytes).
===
Configurations:
1. Heap Size to 1 GB
2. Block Size to 16KB
3. HFile size to 1 GB (Table now has 128 regions, 16 per server)
4. Data Locality Index is 100 in all RS

I disabled Block Cache on the table, to make sure I read everything from
disk, most of the time.

Result:
   1. Average Latency is 8ms and throughput went up to 6K/Sec per RS.
   2. With Block Cache enabled again, I got average latency around 2ms
and throughput of 10K/Sec per RS.
   Heap occupancy around 650 MB
   3. Increased the Heap to 16GB, with Block Cache still enabled, I got
average latency around 1 ms and throughput 20K/Sec per RS
   Heap Occupancy around 2-2.5 GB in all RS

TEST 2: Large Data set (1.8 Billion records, each with 724 bytes)
==
Configurations:
1. Heap Size to 1 GB
2. Block Size to 16KB
3. HFile size to 1 GB (Table now has 2048 regions, 256 per server)
4. Data Locality Index is 100 in all RS

Result:
  1. Average Latency is > 500ms to start with and gradually decreases, but
even after around 100 Million reads it is still > 100 ms
  2. Block Cache = TRUE/FALSE does not make any difference here. Even Heap
Size (1GB / 16GB) does not make any difference.
  3. Heap occupancy is around 2-2.5 GB under 16GB Heap and around 650 MB
under 1GB Heap.

GC Time in all of the scenarios is around 2ms/Second, as shown in the
Cloudera Manager.

Reading most of the items from Disk in less data scenario gives better
results and very low latencies.

Number of regions per RS and HFile size does make a huge difference in my
Cluster.
Keeping 100 Regions per RS as max(Most of the discussions suggest this),
should I restrict the HFile size to 1GB? and thus reducing the storage
capacity (From 700 GB to 100GB per RS)?

Please advice.

Thanks,
Ramu


On Wed, Oct 9, 2013 at 4:58 AM, Vladimir Rodionov
vrodio...@carrieriq.comwrote:

 What are your current heap and block cache sizes?

 Best regards,
 Vladimir Rodionov
 Principal Platform Engineer
 Carrier IQ, www.carrieriq.com
 e-mail: vrodio...@carrieriq.com

 
 From: Ramu M S [ramu.ma...@gmail.com]
 Sent: Monday, October 07, 2013 10:55 PM
 To: user@hbase.apache.org
 Subject: Re: HBase Random Read latency > 100ms

 Hi All,

 Average Latency is still around 80ms.
 I have done the following,

 1. Enabled Snappy Compression
 2. Reduce the HFile size to 8 GB

 Should I attribute these results to bad disk configuration, or is there anything else
 to investigate?

 - Ramu


 On Tue, Oct 8, 2013 at 10:56 AM, Ramu M S ramu.ma...@gmail.com wrote:

  Vladimir,
 
  Thanks for the Insights into Future Caching features. Looks very
  interesting.
 
  - Ramu
 
 
Re: HBase Random Read latency > 100ms

2013-10-08 Thread Kiru Pakkirisamy
What is the iowait in both cases?
 
Regards,
- kiru


Kiru Pakkirisamy | webcloudtech.wordpress.com



  On Tue, Oct 8, 2013 at 10:45 AM, Vladimir Rodionov 
  vrodio...@carrieriq.com wrote:
 
  Ramu,
 
  If your working set of data fits into 192GB you may get an additional boost
  by utilizing the OS page cache, or wait until the 0.98 release, which
  introduces a new bucket cache implementation (a port of the Facebook L2
  cache). You can try the vanilla bucket cache in 0.96 (not released yet, but
  due soon). Both caches store data off-heap, but the Facebook version can
  store encoded and compressed data and the vanilla bucket cache cannot. There
  are some options for utilizing the available RAM efficiently (at least in
  upcoming HBase releases). If your data set does not fit in RAM then your only
  hope is your 24 SAS drives, and performance will depend on your RAID
  settings, disk IO perf, and HDFS configuration (I think the latest Hadoop is
  preferable here).

  The OS page cache is the most vulnerable and volatile; it cannot be
  controlled and can be easily polluted either by some other processes or by
  HBase itself (a long scan).
  With the Block cache you have more control, but the first truly usable
  *official* implementation is going to be part of the 0.98 release.

  As far as I understand, your use case would definitely be covered by
  something similar to BigTable's ScanCache (RowCache), but there is no such
  cache in HBase yet.
  One major advantage of RowCache vs BlockCache (apart from being much more
  efficient in RAM usage) is resilience to Region

RE: HBase Random Read latency > 100ms

2013-10-08 Thread Vladimir Rodionov
Upd.

Each HBase Get = 2 HDFS read IOs (index block + data block) = 4 file IOs (data +
.crc) in the worst case. I think if the Bloom Filter is enabled then it is going
to be 6 file IOs in the worst case (large data set); therefore you will have not
5 IO requests in the queue but up to 20-30 IO requests in the queue.
This definitely explains > 100ms avg latency.
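The same accounting as a small worked sketch (worst-case counts as described
above, not measured values):

public class GetIoAccounting {
  public static void main(String[] args) {
    int hdfsReadsPerGet = 2;         // index block + data block, nothing cached
    int filesPerHdfsRead = 2;        // block file + .crc checksum file
    int bloomBlockReads = 1;         // bloom block, if it is not already cached
    int worstCaseFileIo = (hdfsReadsPerGet + bloomBlockReads) * filesPerHdfsRead;  // = 6
    int concurrentGetsPerRs = 5;     // 40 client threads spread over 8 RS
    System.out.println("file IOs per Get (worst case): " + worstCaseFileIo);
    System.out.println("outstanding disk requests per RS: up to "
        + concurrentGetsPerRs * worstCaseFileIo);
    // 5 concurrent Gets x 6 file IOs each -> up to ~30 requests queued on one RAID1
    // volume, which at ~10 ms per random read is consistent with > 100 ms averages.
  }
}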



Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodio...@carrieriq.com



Re: HBase Random Read latency > 100ms

2013-10-08 Thread lars hofhansl
Good call.
Could try to enable hbase.regionserver.checksum.verify, which will cause HBase 
to do its own checksums rather than relying on HDFS (and which saves 1 IO per 
block get).

I do think you can expect the index blocks to be cached at all times.

-- Lars


Re: HBase Random Read latency > 100ms

2013-10-07 Thread Ramu M S
Lars,

In one of your old posts, you had mentioned that lowering the BLOCKSIZE is
good for random reads (of course with increased size for Block Indexes).

Post is at http://grokbase.com/t/hbase/user/11bat80x7m/row-get-very-slow

Will that help in my tests? Should I give it a try? If I alter my table,
should I trigger a major compaction again for this to take effect?
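For reference, a minimal sketch of that alter-and-compact sequence with the
0.94 admin API (it assumes the table can be disabled briefly; only newly
written or compacted HFiles pick up the new block size, which is why the major
compaction is needed):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class LowerBlockSize {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    HTableDescriptor td = admin.getTableDescriptor(Bytes.toBytes("usertable"));
    HColumnDescriptor cf = td.getFamily(Bytes.toBytes("cf"));
    cf.setBlocksize(16 * 1024);          // 16 KB HFile blocks instead of the 64 KB default
    admin.disableTable("usertable");
    admin.modifyColumn("usertable", cf);
    admin.enableTable("usertable");
    admin.majorCompact("usertable");     // queues a rewrite of existing HFiles; runs in background
  }
}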

Thanks,
Ramu


On Mon, Oct 7, 2013 at 2:44 PM, Ramu M S ramu.ma...@gmail.com wrote:

 Sorry BLOCKSIZE was wrong in my earlier post, it is the default 64 KB.

 {NAME => 'usertable', FAMILIES => [{NAME => 'cf', DATA_BLOCK_ENCODING =>
 'NONE', BLOOMFILTER => 'ROWCOL', REPLICATION_SCOPE => '0', VERSIONS => '1',
 COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647',
 KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false',
 ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}]}

 Thanks,
 Ramu


 On Mon, Oct 7, 2013 at 2:42 PM, Ramu M S ramu.ma...@gmail.com wrote:

 Lars,

 - Yes Short Circuit reading is enabled on both HDFS and HBase.
 - I had issued Major compaction after table is loaded.
  - Region Servers have max heap set to 128 GB. Block Cache Size is 0.25 of the
  heap (so 32 GB for each Region Server). Do we need even more?
  - Decreasing HFile Size (default is 1 GB)? Should I leave it at the default?
 - Keys are Zipfian distributed (By YCSB)

 Bharath,

 Bloom Filters are enabled. Here is my table details,
  {NAME => 'usertable', FAMILIES => [{NAME => 'cf', DATA_BLOCK_ENCODING =>
  'NONE', BLOOMFILTER => 'ROWCOL', REPLICATION_SCOPE => '0', VERSIONS => '1',
  COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647',
  KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '16384', IN_MEMORY => 'false',
  ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}]}

 When the data size is around 100GB (100 Million records), then the
 latency is very good. I am getting a throughput of around 300K OPS.
 In both cases (100 GB and 1.8 TB) Ganglia stats show that Disk reads are
 around 50-60 MB/s throughout the read cycle.

 Thanks,
 Ramu


 On Mon, Oct 7, 2013 at 2:21 PM, lars hofhansl la...@apache.org wrote:

 Have you enabled short circuit reading? See here:
 http://hbase.apache.org/book/perf.hdfs.html

 How's your data locality (shown on the RegionServer UI page).


 How much memory are you giving your RegionServers?
  If your reads are truly random and the data set does not fit into the
  aggregate cache, you'll be dominated by the disk and network.
 Each read would need to bring in a 64k (default) HFile block. If short
 circuit reading is not enabled you'll get two or three context switches.

 So I would try:
 1. Enable short circuit reading
 2. Increase the block cache size per RegionServer
 3. Decrease the HFile block size
 4. Make sure your data is local (if it is not, issue a major compaction).


 -- Lars



 
  From: Ramu M S ramu.ma...@gmail.com
 To: user@hbase.apache.org
 Sent: Sunday, October 6, 2013 10:01 PM
 Subject: HBase Random Read latency > 100ms


 Hi All,

 My HBase cluster has 8 Region Servers (CDH 4.4.0, HBase 0.94.6).

 Each Region Server is with the following configuration,
 16 Core CPU, 192 GB RAM, 800 GB SATA (7200 RPM) Disk
 (Unfortunately configured with RAID 1, can't change this as the Machines
 are leased temporarily for a month).

 I am running YCSB benchmark tests on HBase and currently inserting around
 1.8 Billion records.
 (1 Key + 7 Fields of 100 Bytes = 724 Bytes per record)

  Currently I am getting a write throughput of around 100K OPS, but random
  reads are very slow; all Gets have 100 ms or more latency.

 I have changed the following default configuration,
 1. HFile Size: 16GB
 2. HDFS Block Size: 512 MB

 Total Data size is around 1.8 TB (Excluding the replicas).
 My Table is split into 128 Regions (No pre-splitting used, started with 1
 and grew to 128 over the insertion time)

 Taking some inputs from earlier discussions I have done the following
 changes to disable Nagle (In both Client and Server hbase-site.xml,
 hdfs-site.xml)

  <property>
    <name>hbase.ipc.client.tcpnodelay</name>
    <value>true</value>
  </property>

  <property>
    <name>ipc.server.tcpnodelay</name>
    <value>true</value>
  </property>

  Ganglia stats show large CPU IO wait (30% during reads).

  I agree that the disk configuration is not ideal for a Hadoop cluster, but as
  mentioned earlier it can't be changed for now.
 I feel the latency is way beyond any reported results so far.

 Any pointers on what can be wrong?

 Thanks,
 Ramu






Re: HBase Random Read latency > 100ms

2013-10-07 Thread lars hofhansl
First off: 128 GB heap per RegionServer. Wow. I'd be interested to hear your
experience with such a large heap for your RS. It's definitely big enough.


It's interesting that 100 GB does fit into the aggregate cache (of 8x32 GB),
while 1.8 TB does not.
Looks like ~70% of the read requests would need to bring in a 64 KB block in
order to read 724 bytes.
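The arithmetic behind those estimates, as a small worked sketch (thread
numbers, rounded):

public class CacheCoverageSketch {
  public static void main(String[] args) {
    double rowBytes = 724.0;
    double blockBytes = 64 * 1024.0;
    double aggregateCacheGb = 8 * 32.0;                      // 8 RS x 32 GB block cache
    double largeSetGb = 1.8e9 * rowBytes / 1e9;              // ~1300 GB of data
    double missRatio = 1.0 - aggregateCacheGb / largeSetGb;  // ~0.8: most Gets go to disk
    double readAmplification = blockBytes / rowBytes;        // ~90x bytes read per row returned
    System.out.printf("approx block-cache miss ratio: %.0f%%%n", missRatio * 100);
    System.out.printf("bytes read per 724-byte row on a miss: ~%.0fx%n", readAmplification);
  }
}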

Should that take 100ms? No. Something's still amiss.

Smaller blocks might help (you'd need to bring in 4, 8, or maybe 16k to read 
the small row). You would need to issue a major compaction for that to take 
effect.
Maybe try 16k blocks. If that speeds up your random gets we know where to look 
next... At the disk IO.


-- Lars





Re: HBase Random Read latency > 100ms

2013-10-07 Thread Ramu M S
Thanks Lars.

I have changed the BLOCKSIZE to 16KB and triggered a major compaction. I
will report my results once it is done.

- Ramu



Re: HBase Random Read latency > 100ms

2013-10-07 Thread Ramu M S
Lars,

After changing the BLOCKSIZE to 16KB, the latency has reduced a little. Now
the average is around 75ms.
Overall throughput (I am using 40 Clients to fetch records) is around 1K
OPS.
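As a rough consistency check, concurrency, average latency, and throughput are
tied together by Little's law (throughput is roughly concurrent clients divided
by average latency); a quick sketch with the numbers above:

public class LittlesLawCheck {
  public static void main(String[] args) {
    int clients = 40;                 // concurrent client threads
    double avgLatencySec = 0.075;     // ~75 ms per Get
    double expectedOps = clients / avgLatencySec;
    System.out.printf("expected aggregate throughput: ~%.0f ops/sec%n", expectedOps);
    // Same order of magnitude as the ~1K OPS reported above: higher throughput needs
    // either lower per-Get latency or more concurrent clients.
  }
}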

After compaction hdfsBlocksLocalityIndex is 91,88,78,90,99,82,94,97 in my 8
RS respectively.

Thanks,
Ramu



Re: HBase Random Read latency > 100ms

2013-10-07 Thread Bharath Vissapragada
Hi Ramu,

Thanks for reporting the results back. Just curious if you are hitting any
big GC pauses due to block cache churn on such a large heap. Do you see it?

- Bharath


On Mon, Oct 7, 2013 at 1:42 PM, Ramu M S ramu.ma...@gmail.com wrote:

 Lars,

 After changing the BLOCKSIZE to 16KB, the latency has reduced a little. Now
 the average is around 75ms.
 Overall throughput (I am using 40 Clients to fetch records) is around 1K
 OPS.

 After compaction hdfsBlocksLocalityIndex is 91,88,78,90,99,82,94,97 in my 8
 RS respectively.

 Thanks,
 Ramu


 On Mon, Oct 7, 2013 at 3:51 PM, Ramu M S ramu.ma...@gmail.com wrote:

  Thanks Lars.
 
  I have changed the BLOCKSIZE to 16KB and triggered a major compaction. I
  will report my results once it is done.
 
  - Ramu
 
 
  On Mon, Oct 7, 2013 at 3:21 PM, lars hofhansl la...@apache.org wrote:
 
  First of: 128gb heap per RegionServer. Wow.I'd be interested to hear
 your
  experience with such a large heap for your RS. It's definitely big
 enough.
 
 
  It's interesting hat 100gb do fit into the aggregate cache (of 8x32gb),
  while 1.8tb do not.
  Looks like ~70% of the read request would need to bring in a 64kb block
  in order to read 724 bytes.
 
  Should that take 100ms? No. Something's still amiss.
 
  Smaller blocks might help (you'd need to bring in 4, 8, or maybe 16k to
  read the small row). You would need to issue a major compaction for
 that to
  take effect.
  Maybe try 16k blocks. If that speeds up your random gets we know where
 to
  look next... At the disk IO.
 
 
  -- Lars
 
 
 
  
   From: Ramu M S ramu.ma...@gmail.com
  To: user@hbase.apache.org; lars hofhansl la...@apache.org
  Sent: Sunday, October 6, 2013 11:05 PM
  Subject: Re: HBase Random Read latency  100ms
 
 
  Lars,
 
  In one of your old posts, you had mentioned that lowering the BLOCKSIZE
 is
  good for random reads (of course with increased size for Block Indexes).
 
  Post is at
 http://grokbase.com/t/hbase/user/11bat80x7m/row-get-very-slow
 
  Will that help in my tests? Should I give it a try? If I alter my table,
  should I trigger a major compaction again for this to take effect?
 
  Thanks,
  Ramu
 
 
 
  On Mon, Oct 7, 2013 at 2:44 PM, Ramu M S ramu.ma...@gmail.com wrote:
 
   Sorry BLOCKSIZE was wrong in my earlier post, it is the default 64 KB.
  
   {NAME = 'usertable', FAMILIES = [{NAME = 'cf', DATA_BLOCK_ENCODING
 =
   'NONE', BLOOMFILTER = 'ROWCOL', REPLICATION_SCOPE = '0', VERSIONS =
  '1',
   COMPRESSION = 'NONE', MIN_VERSIONS = '0', TTL = '2147483647',
   KEEP_DELETED_CELLS = 'false', BLOCKSIZE = '65536', IN_MEMORY =
  'false',
   ENCODE_ON_DISK = 'true', BLOCKCACHE = 'true'}]}
  
   Thanks,
   Ramu
  
  
   On Mon, Oct 7, 2013 at 2:42 PM, Ramu M S ramu.ma...@gmail.com
 wrote:
  
   Lars,
  
   - Yes Short Circuit reading is enabled on both HDFS and HBase.
   - I had issued Major compaction after table is loaded.
   - Region Servers have max heap set as 128 GB. Block Cache Size is
 0.25
  of
   heap (So 32 GB for each Region Server) Do we need even more?
   - Decreasing HFile Size (Default is 1GB )? Should I leave it to
  default?
   - Keys are Zipfian distributed (By YCSB)
  
   Bharath,
  
   Bloom Filters are enabled. Here is my table details,
   {NAME = 'usertable', FAMILIES = [{NAME = 'cf', DATA_BLOCK_ENCODING
  =
   'NONE', BLOOMFILTER = 'ROWCOL', REPLICATION_SCOPE = '0', VERSIONS
 =
  '1',
   COMPRESSION = 'NONE', MIN_VERSIONS = '0', TTL = '2147483647',
   KEEP_DELETED_CELLS = 'false', BLOCKSIZE = '16384', IN_MEMORY =
  'false',
   ENCODE_ON_DISK = 'true', BLOCKCACHE = 'true'}]}
  
   When the data size is around 100GB (100 Million records), then the
   latency is very good. I am getting a throughput of around 300K OPS.
   In both cases (100 GB and 1.8 TB) Ganglia stats show that Disk reads
  are
   around 50-60 MB/s throughout the read cycle.
  
   Thanks,
   Ramu
  
  
   On Mon, Oct 7, 2013 at 2:21 PM, lars hofhansl la...@apache.org
  wrote:
  
   Have you enabled short circuit reading? See here:
   http://hbase.apache.org/book/perf.hdfs.html
  
   How's your data locality (shown on the RegionServer UI page).
  
  
   How much memory are you giving your RegionServers?
   If you reads are truly random and the data set does not fit into the
   aggregate cache, you'll be dominated by the disk and network.
   Each read would need to bring in a 64k (default) HFile block. If
 short
   circuit reading is not enabled you'll get two or three context
  switches.
  
   So I would try:
   1. Enable short circuit reading
   2. Increase the block cache size per RegionServer
   3. Decrease the HFile block size
   4. Make sure your data is local (if it is not, issue a major
  compaction).
  
  
   -- Lars
  
  
  
   
From: Ramu M S ramu.ma...@gmail.com
   To: user@hbase.apache.org
   Sent: Sunday, October 6, 2013 10:01 PM
   Subject: HBase Random Read latency  100ms
  
  
   Hi All,
  
   My HBase cluster has

Re: HBase Random Read latency > 100ms

2013-10-07 Thread Ramu M S
Bharath,

I was about to report this. Yes, indeed there is too much GC time.
I just verified the GC time using the Cloudera Manager statistics (updated
every minute).

For each Region Server,
 - During Read: Graph shows 2s constant.
 - During Compaction: Graph starts with 7s and goes as high as 20s towards the
end.

A few more questions:
1. For the current evaluation, since the reads are completely random and I
don't expect to read the same data again, can I set the heap to the default
1 GB?

2. Can I completely turn off BLOCK CACHE for this table?
http://hbase.apache.org/book/regionserver.arch.html recommends that for
random reads.

3. But in the next phase of evaluation, we are interested in using HBase as an
in-memory KV DB by keeping the latest data in RAM (to the tune of around 128
GB in each RS; we are setting up a 50-100 node cluster). I am very curious to
hear any suggestions in this regard.
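On question 3: short of the off-heap caches mentioned earlier in the thread,
the closest existing knob is marking the column family as in-memory, which is a
cache-priority hint rather than a true in-memory database guarantee. A minimal
sketch with a hypothetical table name:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class InMemoryFamilySketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    HTableDescriptor table = new HTableDescriptor("hot_data");  // hypothetical table
    HColumnDescriptor cf = new HColumnDescriptor("cf");
    cf.setInMemory(true);        // IN_MEMORY => 'true': favour keeping these blocks cached
    cf.setBlocksize(16 * 1024);  // small blocks help random Gets, as discussed earlier
    table.addFamily(cf);
    admin.createTable(table);
    // The block cache is still bounded by hfile.block.cache.size x heap per RS, so the
    // "latest data in RAM" idea only works if that working set fits the aggregate cache.
  }
}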

Regards,
Ramu



Re: HBase Random Read latency > 100ms

2013-10-07 Thread Ramu M S
Lars, Bharath,

Compression is disabled for the table. This was not intended for the
evaluation; I forgot to set it during table creation. I will enable Snappy and
run a major compaction again.
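A minimal sketch of that change with the 0.94 admin API (the same
disable/modify/enable pattern as the earlier block-size change; the Snappy
native libraries must be present on every RegionServer or regions will fail to
open):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.io.hfile.Compression;
import org.apache.hadoop.hbase.util.Bytes;

public class EnableSnappy {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    HTableDescriptor td = admin.getTableDescriptor(Bytes.toBytes("usertable"));
    HColumnDescriptor cf = td.getFamily(Bytes.toBytes("cf"));
    cf.setCompressionType(Compression.Algorithm.SNAPPY);  // COMPRESSION => 'SNAPPY'
    admin.disableTable("usertable");
    admin.modifyColumn("usertable", cf);
    admin.enableTable("usertable");
    admin.majorCompact("usertable");  // rewrite existing HFiles compressed; runs in background
  }
}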

Please suggest other options to try out, and also any thoughts on the
previous questions.

Thanks,
Ramu



Re: HBase Random Read latency > 100ms

2013-10-07 Thread Jean-Marc Spaggiari
Quick question on the disk side.

When you say:
800 GB SATA (7200 RPM) Disk
Is it 1x800 GB? It's RAID 1, so it might be 2 drives? What's the configuration?

JM



Re: HBase Random Read latency > 100ms

2013-10-07 Thread Ramu M S
Jean,

Yes. It is 2 drives.

- Ramu


On Mon, Oct 7, 2013 at 8:45 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org
 wrote:

 Quick questionon the disk side.

 When you say:
 800 GB SATA (7200 RPM) Disk
 Is it 1x800GB? It's raid 1, so might be 2 drives? What's the configuration?

 JM


 2013/10/7 Ramu M S ramu.ma...@gmail.com

  Lars, Bharath,
 
  Compression is disabled for the table. This was not intended from the
  evaluation.
  I forgot to mention that during table creation. I will enable snappy and
 do
  major compaction again.
 
  Please suggest other options to try out and also suggestions for the
  previous questions.
 
  Thanks,
  Ramu
 
 
  On Mon, Oct 7, 2013 at 6:35 PM, Ramu M S ramu.ma...@gmail.com wrote:
 
   Bharath,
  
   I was about to report this. Yes indeed there is too much of GC time.
   Just verified the GC time using Cloudera Manager statistics(Every
 minute
   update).
  
   For each Region Server,
- During Read: Graph shows 2s constant.
- During Compaction: Graph starts with 7s and goes as high as 20s
 during
   end.
  
   Few more questions,
   1. For the current evaluation, since the reads are completely random
 and
  I
   don't expect to read same data again can I set the Heap to the default
 1
  GB
   ?
  
   2. Can I completely turn off BLOCK CACHE for this table?
   http://hbase.apache.org/book/regionserver.arch.html recommends
 that
   for Randm reads.
  
   3. But in the next phase of evaluation, We are interested to use HBase
 as
   In-memory KV DB by having the latest data in RAM (To the tune of around
  128
   GB in each RS, we are setting up 50-100 Node Cluster). I am very
 curious
  to
   hear any suggestions in this regard.
  
   Regards,
   Ramu
  
  
   On Mon, Oct 7, 2013 at 5:50 PM, Bharath Vissapragada 
   bhara...@cloudera.com wrote:
  
   Hi Ramu,
  
   Thanks for reporting the results back. Just curious if you are hitting
  any
   big GC pauses due to block cache churn on such large heap. Do you see
  it ?
  
   - Bharath
  
  
   On Mon, Oct 7, 2013 at 1:42 PM, Ramu M S ramu.ma...@gmail.com
 wrote:
  
Lars,
   
After changing the BLOCKSIZE to 16KB, the latency has reduced a
  little.
   Now
the average is around 75ms.
Overall throughput (I am using 40 Clients to fetch records) is
 around
  1K
OPS.
   
After compaction hdfsBlocksLocalityIndex is 91,88,78,90,99,82,94,97
 in
   my 8
RS respectively.
   
Thanks,
Ramu
   
   
On Mon, Oct 7, 2013 at 3:51 PM, Ramu M S ramu.ma...@gmail.com
  wrote:
   
 Thanks Lars.

 I have changed the BLOCKSIZE to 16KB and triggered a major
   compaction. I
 will report my results once it is done.

 - Ramu


 On Mon, Oct 7, 2013 at 3:21 PM, lars hofhansl la...@apache.org
   wrote:

 First off: 128gb heap per RegionServer. Wow. I'd be interested to
  hear
your
 experience with such a large heap for your RS. It's definitely
 big
enough.


 It's interesting that 100gb do fit into the aggregate cache (of
   8x32gb),
 while 1.8tb do not.
 Looks like ~70% of the read requests would need to bring in a 64kb
   block
 in order to read 724 bytes.

 Should that take 100ms? No. Something's still amiss.

 Smaller blocks might help (you'd need to bring in 4, 8, or maybe
  16k
   to
 read the small row). You would need to issue a major compaction
 for
that to
 take effect.
 Maybe try 16k blocks. If that speeds up your random gets we know
   where
to
 look next... At the disk IO.


 -- Lars



 
  From: Ramu M S ramu.ma...@gmail.com
 To: user@hbase.apache.org; lars hofhansl la...@apache.org
 Sent: Sunday, October 6, 2013 11:05 PM
 Subject: Re: HBase Random Read latency  100ms


 Lars,

 In one of your old posts, you had mentioned that lowering the
   BLOCKSIZE
is
 good for random reads (of course with increased size for Block
   Indexes).

 Post is at
http://grokbase.com/t/hbase/user/11bat80x7m/row-get-very-slow

 Will that help in my tests? Should I give it a try? If I alter my
   table,
 should I trigger a major compaction again for this to take
 effect?

 Thanks,
 Ramu



 On Mon, Oct 7, 2013 at 2:44 PM, Ramu M S ramu.ma...@gmail.com
   wrote:

  Sorry BLOCKSIZE was wrong in my earlier post, it is the default
  64
   KB.
 
  {NAME => 'usertable', FAMILIES => [{NAME => 'cf', DATA_BLOCK_ENCODING =>
  'NONE', BLOOMFILTER => 'ROWCOL', REPLICATION_SCOPE => '0', VERSIONS => '1',
  COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647',
  KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false',
  ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}]}
 
  Thanks,
  Ramu
 
 
  On Mon, Oct 7, 2013 at 2:42 PM, Ramu M S ramu.ma...@gmail.com
 
wrote

Re: HBase Random Read latency 100ms

2013-10-07 Thread Ramu M S
Hi Bharath,

I am a little confused about the metrics displayed by Cloudera. Even when
there are no operations, the gc_time metric is showing a constant 2s in the
graph. Is this the CMS gc_time (in that case no JVM pause) or the GC pause?

GC timings reported earlier is the average taken for gc_time metric for all
region servers.

Regards,
Ramu


On Mon, Oct 7, 2013 at 9:10 PM, Ramu M S ramu.ma...@gmail.com wrote:

 Jean,

 Yes. It is 2 drives.

 - Ramu


 On Mon, Oct 7, 2013 at 8:45 PM, Jean-Marc Spaggiari 
 jean-m...@spaggiari.org wrote:

 Quick question on the disk side.

 When you say:
 800 GB SATA (7200 RPM) Disk
 Is it 1x800GB? It's raid 1, so might be 2 drives? What's the
 configuration?

 JM


 2013/10/7 Ramu M S ramu.ma...@gmail.com

  Lars, Bharath,
 
  Compression is disabled for the table. This was not intentional for the
  evaluation.
  I forgot to mention that during table creation. I will enable snappy
 and do
  major compaction again.
 
  Please suggest other options to try out and also suggestions for the
  previous questions.
 
  Thanks,
  Ramu
 
 
  On Mon, Oct 7, 2013 at 6:35 PM, Ramu M S ramu.ma...@gmail.com wrote:
 
   Bharath,
  
    I was about to report this. Yes, indeed there is too much GC time.
    Just verified the GC time using Cloudera Manager statistics (updated every
    minute).
  
   For each Region Server,
- During Read: Graph shows 2s constant.
- During Compaction: Graph starts with 7s and goes as high as 20s
 during
   end.
  
   Few more questions,
   1. For the current evaluation, since the reads are completely random
 and
  I
   don't expect to read same data again can I set the Heap to the
 default 1
  GB
   ?
  
   2. Can I completely turn off BLOCK CACHE for this table?
   http://hbase.apache.org/book/regionserver.arch.html recommends
 that
    for Random reads.
  
   3. But in the next phase of evaluation, We are interested to use
 HBase as
   In-memory KV DB by having the latest data in RAM (To the tune of
 around
  128
   GB in each RS, we are setting up 50-100 Node Cluster). I am very
 curious
  to
   hear any suggestions in this regard.
  
   Regards,
   Ramu
  
  
   On Mon, Oct 7, 2013 at 5:50 PM, Bharath Vissapragada 
   bhara...@cloudera.com wrote:
  
   Hi Ramu,
  
   Thanks for reporting the results back. Just curious if you are
 hitting
  any
   big GC pauses due to block cache churn on such large heap. Do you see
  it ?
  
   - Bharath
  
  
   On Mon, Oct 7, 2013 at 1:42 PM, Ramu M S ramu.ma...@gmail.com
 wrote:
  
Lars,
   
After changing the BLOCKSIZE to 16KB, the latency has reduced a
  little.
   Now
the average is around 75ms.
Overall throughput (I am using 40 Clients to fetch records) is
 around
  1K
OPS.
   
After compaction hdfsBlocksLocalityIndex is
 91,88,78,90,99,82,94,97 in
   my 8
RS respectively.
   
Thanks,
Ramu
   
   
On Mon, Oct 7, 2013 at 3:51 PM, Ramu M S ramu.ma...@gmail.com
  wrote:
   
 Thanks Lars.

 I have changed the BLOCKSIZE to 16KB and triggered a major
   compaction. I
 will report my results once it is done.

 - Ramu


 On Mon, Oct 7, 2013 at 3:21 PM, lars hofhansl la...@apache.org
   wrote:

 First off: 128gb heap per RegionServer. Wow. I'd be interested to
  hear
your
 experience with such a large heap for your RS. It's definitely
 big
enough.


 It's interesting that 100gb do fit into the aggregate cache (of
   8x32gb),
 while 1.8tb do not.
 Looks like ~70% of the read requests would need to bring in a
 64kb
   block
 in order to read 724 bytes.

 Should that take 100ms? No. Something's still amiss.

 Smaller blocks might help (you'd need to bring in 4, 8, or maybe
  16k
   to
 read the small row). You would need to issue a major compaction
 for
that to
 take effect.
 Maybe try 16k blocks. If that speeds up your random gets we know
   where
to
 look next... At the disk IO.


 -- Lars



 
  From: Ramu M S ramu.ma...@gmail.com
 To: user@hbase.apache.org; lars hofhansl la...@apache.org
 Sent: Sunday, October 6, 2013 11:05 PM
 Subject: Re: HBase Random Read latency  100ms


 Lars,

 In one of your old posts, you had mentioned that lowering the
   BLOCKSIZE
is
 good for random reads (of course with increased size for Block
   Indexes).

 Post is at
http://grokbase.com/t/hbase/user/11bat80x7m/row-get-very-slow

 Will that help in my tests? Should I give it a try? If I alter
 my
   table,
 should I trigger a major compaction again for this to take
 effect?

 Thanks,
 Ramu



 On Mon, Oct 7, 2013 at 2:44 PM, Ramu M S ramu.ma...@gmail.com
   wrote:

  Sorry BLOCKSIZE was wrong in my earlier post, it is the
 default
  64
   KB.
 
   {NAME => 'usertable', FAMILIES => [{NAME => 'cf',
    DATA_BLOCK_ENCODING

RE: HBase Random Read latency 100ms

2013-10-07 Thread Vladimir Rodionov
Ramu, your HBase configuration (128GB of heap) is far from optimal.
Nobody runs HBase with that amount of heap, to the best of my knowledge.
32GB of RAM is the usual upper limit. We run 8-12GB in production.

What's more, your IO capacity is VERY low. 2 SATA drives in RAID 1 for a mostly
random-read load?
You should have 8, better 12-16 drives per server. Forget about RAID. You have
HDFS.

Block cache in your case does not help much, since your read amplification
is at least x20 (16KB block and 724 B read) - it's just wasted
RAM (heap). In your case you do not need a LARGE heap and a LARGE block cache.
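
For concreteness, the x20 figure is just the block-to-row size ratio (assuming
the 16 KB block size discussed earlier in the thread):

  16384 B per block / 724 B per row ≈ 22.6, i.e. roughly the x20 quoted.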

I advise reconsidering your hardware spec, applying all the optimizations
already mentioned in this thread, and lowering your expectations.

With the right hardware you will be able to get 500-1000 truly random reads per
server.

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodio...@carrieriq.com


From: Ramu M S [ramu.ma...@gmail.com]
Sent: Monday, October 07, 2013 5:23 AM
To: user@hbase.apache.org
Subject: Re: HBase Random Read latency  100ms

Hi Bharath,

I am a little confused about the metrics displayed by Cloudera. Even when
there are no operations, the gc_time metric is showing a constant 2s in the
graph. Is this the CMS gc_time (in that case no JVM pause) or the GC pause?

GC timings reported earlier is the average taken for gc_time metric for all
region servers.

Regards,
Ramu


On Mon, Oct 7, 2013 at 9:10 PM, Ramu M S ramu.ma...@gmail.com wrote:

 Jean,

 Yes. It is 2 drives.

 - Ramu


 On Mon, Oct 7, 2013 at 8:45 PM, Jean-Marc Spaggiari 
 jean-m...@spaggiari.org wrote:

 Quick question on the disk side.

 When you say:
 800 GB SATA (7200 RPM) Disk
 Is it 1x800GB? It's raid 1, so might be 2 drives? What's the
 configuration?

 JM


 2013/10/7 Ramu M S ramu.ma...@gmail.com

  Lars, Bharath,
 
  Compression is disabled for the table. This was not intentional for the
  evaluation.
  I forgot to mention that during table creation. I will enable snappy
 and do
  major compaction again.
 
  Please suggest other options to try out and also suggestions for the
  previous questions.
 
  Thanks,
  Ramu
 
 
  On Mon, Oct 7, 2013 at 6:35 PM, Ramu M S ramu.ma...@gmail.com wrote:
 
   Bharath,
  
    I was about to report this. Yes, indeed there is too much GC time.
    Just verified the GC time using Cloudera Manager statistics (updated every
    minute).
  
   For each Region Server,
- During Read: Graph shows 2s constant.
- During Compaction: Graph starts with 7s and goes as high as 20s
 during
   end.
  
   Few more questions,
   1. For the current evaluation, since the reads are completely random
 and
  I
   don't expect to read same data again can I set the Heap to the
 default 1
  GB
   ?
  
   2. Can I completely turn off BLOCK CACHE for this table?
   http://hbase.apache.org/book/regionserver.arch.html recommends
 that
    for Random reads.
  
   3. But in the next phase of evaluation, We are interested to use
 HBase as
   In-memory KV DB by having the latest data in RAM (To the tune of
 around
  128
   GB in each RS, we are setting up 50-100 Node Cluster). I am very
 curious
  to
   hear any suggestions in this regard.
  
   Regards,
   Ramu
  
  
   On Mon, Oct 7, 2013 at 5:50 PM, Bharath Vissapragada 
   bhara...@cloudera.com wrote:
  
   Hi Ramu,
  
   Thanks for reporting the results back. Just curious if you are
 hitting
  any
   big GC pauses due to block cache churn on such large heap. Do you see
  it ?
  
   - Bharath
  
  
   On Mon, Oct 7, 2013 at 1:42 PM, Ramu M S ramu.ma...@gmail.com
 wrote:
  
Lars,
   
After changing the BLOCKSIZE to 16KB, the latency has reduced a
  little.
   Now
the average is around 75ms.
Overall throughput (I am using 40 Clients to fetch records) is
 around
  1K
OPS.
   
After compaction hdfsBlocksLocalityIndex is
 91,88,78,90,99,82,94,97 in
   my 8
RS respectively.
   
Thanks,
Ramu
   
   
On Mon, Oct 7, 2013 at 3:51 PM, Ramu M S ramu.ma...@gmail.com
  wrote:
   
 Thanks Lars.

 I have changed the BLOCKSIZE to 16KB and triggered a major
   compaction. I
 will report my results once it is done.

 - Ramu


 On Mon, Oct 7, 2013 at 3:21 PM, lars hofhansl la...@apache.org
   wrote:

 First off: 128gb heap per RegionServer. Wow. I'd be interested to
  hear
your
 experience with such a large heap for your RS. It's definitely
 big
enough.


 It's interesting that 100gb do fit into the aggregate cache (of
   8x32gb),
 while 1.8tb do not.
 Looks like ~70% of the read requests would need to bring in a
 64kb
   block
 in order to read 724 bytes.

 Should that take 100ms? No. Something's still amiss.

 Smaller blocks might help (you'd need to bring in 4, 8, or maybe
  16k
   to
 read the small row). You would need to issue a major compaction

Re: HBase Random Read latency 100ms

2013-10-07 Thread Ramu M S
Vladimir,

Yes. I am fully aware of the HDD limitation and wrong configurations wrt
RAID.
Unfortunately, the hardware is leased from others for this work and I
wasn't consulted to decide the h/w specification for the tests that I am
doing now. Even the RAID cannot be turned off or set to RAID-0

Production system is according to the Hadoop needs (100 Nodes with 16 Core
CPU, 192 GB RAM, 24 X 600GB SAS Drives, RAID cannot be completely turned
off, so we are creating 1 Virtual Disk containing only 1 Physical Disk and
the VD RAID level set to RAID-0). These systems are still not available. If
you have any suggestion on the production setup, I will be glad to hear.

Also, as pointed out earlier, we are planning to use HBase also as an in
memory KV store to access the latest data.
That's why RAM was considered huge in this configuration. But looks like we
would run into more problems than any gains from this.

Keeping that aside, I was trying to get the maximum out of the current
cluster or as you said Is 500-1000 OPS the max I could get out of this
setup?

Regards,
Ramu


On Tue, Oct 8, 2013 at 3:02 AM, Vladimir Rodionov
vrodio...@carrieriq.comwrote:

 Ramu, your HBase configuration (128GB of heap) is far from optimal.
 Nobody runs HBase with that amount of heap, to the best of my knowledge.
 32GB of RAM is the usual upper limit. We run 8-12GB in production.

 What's more, your IO capacity is VERY low. 2 SATA drives in RAID 1 for a
 mostly random-read load?
 You should have 8, better 12-16 drives per server. Forget about RAID. You
 have HDFS.

 Block cache in your case does not help much, since your read
 amplification is at least x20 (16KB block and 724 B read) - it's just wasted
 RAM (heap). In your case you do not need a LARGE heap and a LARGE block cache.

 I advise reconsidering your hardware spec, applying all the optimizations
 already mentioned in this thread, and lowering your expectations.

 With the right hardware you will be able to get 500-1000 truly random reads
 per server.

 Best regards,
 Vladimir Rodionov
 Principal Platform Engineer
 Carrier IQ, www.carrieriq.com
 e-mail: vrodio...@carrieriq.com

 
 From: Ramu M S [ramu.ma...@gmail.com]
 Sent: Monday, October 07, 2013 5:23 AM
 To: user@hbase.apache.org
 Subject: Re: HBase Random Read latency  100ms

 Hi Bharath,

 I am a little confused about the metrics displayed by Cloudera. Even when
 there are no operations, the gc_time metric is showing a constant 2s in the
 graph. Is this the CMS gc_time (in that case no JVM pause) or the GC pause?

 GC timings reported earlier is the average taken for gc_time metric for all
 region servers.

 Regards,
 Ramu


 On Mon, Oct 7, 2013 at 9:10 PM, Ramu M S ramu.ma...@gmail.com wrote:

  Jean,
 
  Yes. It is 2 drives.
 
  - Ramu
 
 
  On Mon, Oct 7, 2013 at 8:45 PM, Jean-Marc Spaggiari 
  jean-m...@spaggiari.org wrote:
 
  Quick question on the disk side.
 
  When you say:
  800 GB SATA (7200 RPM) Disk
  Is it 1x800GB? It's raid 1, so might be 2 drives? What's the
  configuration?
 
  JM
 
 
  2013/10/7 Ramu M S ramu.ma...@gmail.com
 
   Lars, Bharath,
  
   Compression is disabled for the table. This was not intentional for the
   evaluation.
   I forgot to mention that during table creation. I will enable snappy
  and do
   major compaction again.
  
   Please suggest other options to try out and also suggestions for the
   previous questions.
  
   Thanks,
   Ramu
  
  
   On Mon, Oct 7, 2013 at 6:35 PM, Ramu M S ramu.ma...@gmail.com
 wrote:
  
Bharath,
   
 I was about to report this. Yes, indeed there is too much GC time.
 Just verified the GC time using Cloudera Manager statistics (updated every
 minute).
   
For each Region Server,
 - During Read: Graph shows 2s constant.
 - During Compaction: Graph starts with 7s and goes as high as 20s
  during
end.
   
Few more questions,
1. For the current evaluation, since the reads are completely random
  and
   I
don't expect to read same data again can I set the Heap to the
  default 1
   GB
?
   
2. Can I completely turn off BLOCK CACHE for this table?
http://hbase.apache.org/book/regionserver.arch.html recommends
  that
 for Random reads.
   
3. But in the next phase of evaluation, We are interested to use
  HBase as
In-memory KV DB by having the latest data in RAM (To the tune of
  around
   128
GB in each RS, we are setting up 50-100 Node Cluster). I am very
  curious
   to
hear any suggestions in this regard.
   
Regards,
Ramu
   
   
On Mon, Oct 7, 2013 at 5:50 PM, Bharath Vissapragada 
bhara...@cloudera.com wrote:
   
Hi Ramu,
   
Thanks for reporting the results back. Just curious if you are
  hitting
   any
big GC pauses due to block cache churn on such large heap. Do you
 see
   it ?
   
- Bharath
   
   
On Mon, Oct 7, 2013 at 1:42 PM, Ramu M S ramu.ma...@gmail.com
  wrote:
   
 Lars,

 After changing

RE: HBase Random Read latency 100ms

2013-10-07 Thread Vladimir Rodionov
Ramu,

If your working set of data fits into 192GB you may get an additional boost by
utilizing the OS page cache, or wait until the
0.98 release, which introduces a new bucket cache implementation (a port of the
Facebook L2 cache). You can try the vanilla bucket cache in 0.96 (not released yet
but due soon). Both caches store data off-heap, but the Facebook version can
store encoded and compressed data and the vanilla bucket cache cannot.
There are some options for utilizing the available RAM efficiently (at least in
upcoming HBase releases).
If your data set does not fit in RAM then your only hope is your 24 SAS drives,
and performance will depend on your RAID settings, disk IO perf, and HDFS
configuration (I think the latest Hadoop is preferable here).

The OS page cache is the most vulnerable and volatile: it cannot be controlled and
can be easily polluted by either some other processes or by HBase itself (a long
scan).
With the block cache you have more control, but the first truly usable *official*
implementation is going to be a part of the 0.98 release.

As far as I understand, your use case would definitely be covered by something
similar to the BigTable ScanCache (RowCache), but there is no such cache in HBase
yet.
One major advantage of RowCache vs BlockCache (apart from being much more
efficient in RAM usage) is resilience to Region compactions. Each minor Region
compaction partially invalidates the
Region's data in the BlockCache, and a major compaction invalidates that Region's
data completely. This is not the case with RowCache (were it implemented).

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodio...@carrieriq.com


From: Ramu M S [ramu.ma...@gmail.com]
Sent: Monday, October 07, 2013 5:25 PM
To: user@hbase.apache.org
Subject: Re: HBase Random Read latency  100ms

Vladimir,

Yes. I am fully aware of the HDD limitation and wrong configurations wrt
RAID.
Unfortunately, the hardware is leased from others for this work and I
wasn't consulted to decide the h/w specification for the tests that I am
doing now. Even the RAID cannot be turned off or set to RAID-0

Production system is according to the Hadoop needs (100 Nodes with 16 Core
CPU, 192 GB RAM, 24 X 600GB SAS Drives, RAID cannot be completely turned
off, so we are creating 1 Virtual Disk containing only 1 Physical Disk and
the VD RAID level set to RAID-0). These systems are still not available. If
you have any suggestion on the production setup, I will be glad to hear.

Also, as pointed out earlier, we are planning to use HBase also as an in
memory KV store to access the latest data.
That's why RAM was considered huge in this configuration. But looks like we
would run into more problems than any gains from this.

Keeping that aside, I was trying to get the maximum out of the current
cluster or as you said Is 500-1000 OPS the max I could get out of this
setup?

Regards,
Ramu





Re: HBase Random Read latency 100ms

2013-10-07 Thread Ramu M S
Vladimir,

Thanks for the Insights into Future Caching features. Looks very
interesting.

- Ramu


On Tue, Oct 8, 2013 at 10:45 AM, Vladimir Rodionov
vrodio...@carrieriq.comwrote:

 Ramu,

 If your working set of data fits into 192GB you may get an additional boost
 by utilizing the OS page cache, or wait until the
 0.98 release, which introduces a new bucket cache implementation (a port of the
 Facebook L2 cache). You can try the vanilla bucket cache in 0.96 (not released
 yet but due soon). Both caches store data off-heap, but the Facebook version
 can store encoded and compressed data and the vanilla bucket cache cannot.
 There are some options for utilizing the available RAM efficiently (at least
 in upcoming HBase releases).
 If your data set does not fit in RAM then your only hope is your 24 SAS
 drives, and performance will depend on your RAID settings, disk IO perf, and
 HDFS configuration (I think the latest Hadoop is preferable here).

 The OS page cache is the most vulnerable and volatile: it cannot be controlled
 and can be easily polluted by either some other processes or by HBase
 itself (a long scan).
 With the block cache you have more control, but the first truly usable
 *official* implementation is going to be a part of the 0.98 release.

 As far as I understand, your use case would definitely be covered by
 something similar to the BigTable ScanCache (RowCache), but there is no such
 cache in HBase yet.
 One major advantage of RowCache vs BlockCache (apart from being much more
 efficient in RAM usage) is resilience to Region compactions. Each minor
 Region compaction partially invalidates the
 Region's data in the BlockCache, and a major compaction invalidates that
 Region's data completely. This is not the case with RowCache (were it
 implemented).

 Best regards,
 Vladimir Rodionov
 Principal Platform Engineer
 Carrier IQ, www.carrieriq.com
 e-mail: vrodio...@carrieriq.com

 
 From: Ramu M S [ramu.ma...@gmail.com]
 Sent: Monday, October 07, 2013 5:25 PM
 To: user@hbase.apache.org
 Subject: Re: HBase Random Read latency  100ms

 Vladimir,

 Yes. I am fully aware of the HDD limitation and wrong configurations wrt
 RAID.
 Unfortunately, the hardware is leased from others for this work and I
 wasn't consulted to decide the h/w specification for the tests that I am
 doing now. Even the RAID cannot be turned off or set to RAID-0

 Production system is according to the Hadoop needs (100 Nodes with 16 Core
 CPU, 192 GB RAM, 24 X 600GB SAS Drives, RAID cannot be completely turned
 off, so we are creating 1 Virtual Disk containing only 1 Physical Disk and
 the VD RAID level set to RAID-0). These systems are still not available.
 If
 you have any suggestion on the production setup, I will be glad to hear.

 Also, as pointed out earlier, we are planning to use HBase also as an in
 memory KV store to access the latest data.
 That's why RAM was considered huge in this configuration. But looks like we
 would run into more problems than any gains from this.

 Keeping that aside, I was trying to get the maximum out of the current
 cluster or as you said Is 500-1000 OPS the max I could get out of this
 setup?

 Regards,
 Ramu






Re: HBase Random Read latency 100ms

2013-10-07 Thread Ramu M S
Hi All,

Average Latency is still around 80ms.
I have done the following,

1. Enabled Snappy Compression
2. Reduced the HFile size to 8 GB

Should I attribute these results to bad Disk Configuration OR anything else
to investigate?

- Ramu
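
The two changes listed above could be applied from the HBase shell roughly as
follows (a sketch; 'usertable'/'cf' assumed from earlier in the thread,
MAX_FILESIZE given in bytes, and the same disable/enable caveat as before):

  hbase> alter 'usertable', {NAME => 'cf', COMPRESSION => 'SNAPPY'}
  hbase> alter 'usertable', MAX_FILESIZE => '8589934592'
  hbase> major_compact 'usertable'

The major compaction is what actually rewrites the existing HFiles with Snappy;
until then only newly flushed or compacted files pick up the new settings.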


On Tue, Oct 8, 2013 at 10:56 AM, Ramu M S ramu.ma...@gmail.com wrote:

 Vladimir,

 Thanks for the Insights into Future Caching features. Looks very
 interesting.

 - Ramu


 On Tue, Oct 8, 2013 at 10:45 AM, Vladimir Rodionov 
 vrodio...@carrieriq.com wrote:

 Ramu,

 If your working set of data fits into 192GB you may get an additional boost
 by utilizing the OS page cache, or wait until the
 0.98 release, which introduces a new bucket cache implementation (a port of the
 Facebook L2 cache). You can try the vanilla bucket cache in 0.96 (not released
 yet but due soon). Both caches store data off-heap, but the Facebook version
 can store encoded and compressed data and the vanilla bucket cache cannot.
 There are some options for utilizing the available RAM efficiently (at least
 in upcoming HBase releases).
 If your data set does not fit in RAM then your only hope is your 24 SAS
 drives, and performance will depend on your RAID settings, disk IO perf, and
 HDFS configuration (I think the latest Hadoop is preferable here).

 The OS page cache is the most vulnerable and volatile: it cannot be controlled
 and can be easily polluted by either some other processes or by HBase
 itself (a long scan).
 With the block cache you have more control, but the first truly usable
 *official* implementation is going to be a part of the 0.98 release.

 As far as I understand, your use case would definitely be covered by
 something similar to the BigTable ScanCache (RowCache), but there is no such
 cache in HBase yet.
 One major advantage of RowCache vs BlockCache (apart from being much more
 efficient in RAM usage) is resilience to Region compactions. Each minor
 Region compaction partially invalidates the
 Region's data in the BlockCache, and a major compaction invalidates that
 Region's data completely. This is not the case with RowCache (were it
 implemented).

 Best regards,
 Vladimir Rodionov
 Principal Platform Engineer
 Carrier IQ, www.carrieriq.com
 e-mail: vrodio...@carrieriq.com

 
 From: Ramu M S [ramu.ma...@gmail.com]
 Sent: Monday, October 07, 2013 5:25 PM
 To: user@hbase.apache.org
 Subject: Re: HBase Random Read latency  100ms

 Vladimir,

 Yes. I am fully aware of the HDD limitation and wrong configurations wrt
 RAID.
 Unfortunately, the hardware is leased from others for this work and I
 wasn't consulted to decide the h/w specification for the tests that I am
 doing now. Even the RAID cannot be turned off or set to RAID-0

 Production system is according to the Hadoop needs (100 Nodes with 16 Core
 CPU, 192 GB RAM, 24 X 600GB SAS Drives, RAID cannot be completely turned
 off, so we are creating 1 Virtual Disk containing only 1 Physical Disk and
 the VD RAID level set to RAID-0). These systems are still not
 available. If
 you have any suggestion on the production setup, I will be glad to hear.

 Also, as pointed out earlier, we are planning to use HBase also as an in
 memory KV store to access the latest data.
 That's why RAM was considered huge in this configuration. But looks like
 we
 would run into more problems than any gains from this.

 Keeping that aside, I was trying to get the maximum out of the current
 cluster or as you said Is 500-1000 OPS the max I could get out of this
 setup?

 Regards,
 Ramu








HBase Random Read latency 100ms

2013-10-06 Thread Ramu M S
Hi All,

My HBase cluster has 8 Region Servers (CDH 4.4.0, HBase 0.94.6).

Each Region Server is with the following configuration,
16 Core CPU, 192 GB RAM, 800 GB SATA (7200 RPM) Disk
(Unfortunately configured with RAID 1, can't change this as the Machines
are leased temporarily for a month).

I am running YCSB benchmark tests on HBase and currently inserting around
1.8 Billion records.
(1 Key + 7 Fields of 100 Bytes = 724 Bytes per record)

Currently I am getting a write throughput of around 100K OPS, but random
reads are very, very slow; all gets have 100ms or more latency.

I have changed the following default configuration,
1. HFile Size: 16GB
2. HDFS Block Size: 512 MB

Total Data size is around 1.8 TB (Excluding the replicas).
My Table is split into 128 Regions (No pre-splitting used, started with 1
and grew to 128 over the insertion time)

Taking some inputs from earlier discussions I have done the following
changes to disable Nagle (In both Client and Server hbase-site.xml,
hdfs-site.xml)

<property>
  <name>hbase.ipc.client.tcpnodelay</name>
  <value>true</value>
</property>

<property>
  <name>ipc.server.tcpnodelay</name>
  <value>true</value>
</property>

Ganglia stats shows large CPU IO wait (30% during reads).

I agree that disk configuration is not ideal for Hadoop cluster, but as
told earlier it can't change for now.
I feel the latency is way beyond any reported results so far.

Any pointers on what can be wrong?

Thanks,
Ramu


Re: HBase Random Read latency 100ms

2013-10-06 Thread lars hofhansl
Have you enabled short circuit reading? See here: 
http://hbase.apache.org/book/perf.hdfs.html

How's your data locality (shown on the RegionServer UI page).


How much memory are you giving your RegionServers?
If your reads are truly random and the data set does not fit into the aggregate
cache, you'll be dominated by the disk and network.
Each read would need to bring in a 64k (default) HFile block. If short circuit 
reading is not enabled you'll get two or three context switches.

So I would try:
1. Enable short circuit reading
2. Increase the block cache size per RegionServer
3. Decrease the HFile block size
4. Make sure your data is local (if it is not, issue a major compaction).


-- Lars




 From: Ramu M S ramu.ma...@gmail.com
To: user@hbase.apache.org 
Sent: Sunday, October 6, 2013 10:01 PM
Subject: HBase Random Read latency  100ms
 

Hi All,

My HBase cluster has 8 Region Servers (CDH 4.4.0, HBase 0.94.6).

Each Region Server is with the following configuration,
16 Core CPU, 192 GB RAM, 800 GB SATA (7200 RPM) Disk
(Unfortunately configured with RAID 1, can't change this as the Machines
are leased temporarily for a month).

I am running YCSB benchmark tests on HBase and currently inserting around
1.8 Billion records.
(1 Key + 7 Fields of 100 Bytes = 724 Bytes per record)

Currently I am getting a write throughput of around 100K OPS, but random
reads are very, very slow; all gets have 100ms or more latency.

I have changed the following default configuration,
1. HFile Size: 16GB
2. HDFS Block Size: 512 MB

Total Data size is around 1.8 TB (Excluding the replicas).
My Table is split into 128 Regions (No pre-splitting used, started with 1
and grew to 128 over the insertion time)

Taking some inputs from earlier discussions I have done the following
changes to disable Nagle (In both Client and Server hbase-site.xml,
hdfs-site.xml)

<property>
  <name>hbase.ipc.client.tcpnodelay</name>
  <value>true</value>
</property>

<property>
  <name>ipc.server.tcpnodelay</name>
  <value>true</value>
</property>

Ganglia stats shows large CPU IO wait (30% during reads).

I agree that disk configuration is not ideal for Hadoop cluster, but as
told earlier it can't change for now.
I feel the latency is way beyond any reported results so far.

Any pointers on what can be wrong?

Thanks,
Ramu

Re: HBase Random Read latency 100ms

2013-10-06 Thread Bharath Vissapragada
Adding to what Lars said, you can enable bloom filters on column families
for read performance.
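
A sketch of what that could look like in the HBase shell (names assumed from the
descriptors elsewhere in the thread; blooms are written per HFile, so existing
files only pick them up once they are rewritten by a compaction):

  hbase> alter 'usertable', {NAME => 'cf', BLOOMFILTER => 'ROW'}
  hbase> major_compact 'usertable'

ROWCOL also checks the column qualifier at the cost of a larger bloom; Ramu's
table descriptor later in the thread already uses ROWCOL.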


On Mon, Oct 7, 2013 at 10:51 AM, lars hofhansl la...@apache.org wrote:

 Have you enabled short circuit reading? See here:
 http://hbase.apache.org/book/perf.hdfs.html

 How's your data locality (shown on the RegionServer UI page).


 How much memory are you giving your RegionServers?
 If your reads are truly random and the data set does not fit into the
 aggregate cache, you'll be dominated by the disk and network.
 Each read would need to bring in a 64k (default) HFile block. If short
 circuit reading is not enabled you'll get two or three context switches.

 So I would try:
 1. Enable short circuit reading
 2. Increase the block cache size per RegionServer
 3. Decrease the HFile block size
 4. Make sure your data is local (if it is not, issue a major compaction).


 -- Lars



 
  From: Ramu M S ramu.ma...@gmail.com
 To: user@hbase.apache.org
 Sent: Sunday, October 6, 2013 10:01 PM
 Subject: HBase Random Read latency  100ms


 Hi All,

 My HBase cluster has 8 Region Servers (CDH 4.4.0, HBase 0.94.6).

 Each Region Server is with the following configuration,
 16 Core CPU, 192 GB RAM, 800 GB SATA (7200 RPM) Disk
 (Unfortunately configured with RAID 1, can't change this as the Machines
 are leased temporarily for a month).

 I am running YCSB benchmark tests on HBase and currently inserting around
 1.8 Billion records.
 (1 Key + 7 Fields of 100 Bytes = 724 Bytes per record)

 Currently I am getting a write throughput of around 100K OPS, but random
 reads are very, very slow; all gets have 100ms or more latency.

 I have changed the following default configuration,
 1. HFile Size: 16GB
 2. HDFS Block Size: 512 MB

 Total Data size is around 1.8 TB (Excluding the replicas).
 My Table is split into 128 Regions (No pre-splitting used, started with 1
 and grew to 128 over the insertion time)

 Taking some inputs from earlier discussions I have done the following
 changes to disable Nagle (In both Client and Server hbase-site.xml,
 hdfs-site.xml)

 <property>
   <name>hbase.ipc.client.tcpnodelay</name>
   <value>true</value>
 </property>

 <property>
   <name>ipc.server.tcpnodelay</name>
   <value>true</value>
 </property>

 Ganglia stats shows large CPU IO wait (30% during reads).

 I agree that disk configuration is not ideal for Hadoop cluster, but as
 told earlier it can't change for now.
 I feel the latency is way beyond any reported results so far.

 Any pointers on what can be wrong?

 Thanks,
 Ramu




-- 
Bharath Vissapragada
http://www.cloudera.com


Re: HBase Random Read latency 100ms

2013-10-06 Thread Ramu M S
Lars,

- Yes Short Circuit reading is enabled on both HDFS and HBase.
- I had issued Major compaction after table is loaded.
- Region Servers have max heap set as 128 GB. Block Cache Size is 0.25 of
heap (So 32 GB for each Region Server) Do we need even more?
- Decreasing HFile Size (default is 1 GB)? Should I leave it at the default?
- Keys are Zipfian distributed (By YCSB)

Bharath,

Bloom Filters are enabled. Here is my table details,
{NAME => 'usertable', FAMILIES => [{NAME => 'cf', DATA_BLOCK_ENCODING =>
'NONE', BLOOMFILTER => 'ROWCOL', REPLICATION_SCOPE => '0', VERSIONS => '1',
COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647',
KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '16384', IN_MEMORY => 'false',
ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}]}

When the data size is around 100GB (100 Million records), then the latency
is very good. I am getting a throughput of around 300K OPS.
In both cases (100 GB and 1.8 TB) Ganglia stats show that Disk reads are
around 50-60 MB/s throughout the read cycle.

Thanks,
Ramu


On Mon, Oct 7, 2013 at 2:21 PM, lars hofhansl la...@apache.org wrote:

 Have you enabled short circuit reading? See here:
 http://hbase.apache.org/book/perf.hdfs.html

 How's your data locality (shown on the RegionServer UI page).


 How much memory are you giving your RegionServers?
 If your reads are truly random and the data set does not fit into the
 aggregate cache, you'll be dominated by the disk and network.
 Each read would need to bring in a 64k (default) HFile block. If short
 circuit reading is not enabled you'll get two or three context switches.

 So I would try:
 1. Enable short circuit reading
 2. Increase the block cache size per RegionServer
 3. Decrease the HFile block size
 4. Make sure your data is local (if it is not, issue a major compaction).


 -- Lars



 
  From: Ramu M S ramu.ma...@gmail.com
 To: user@hbase.apache.org
 Sent: Sunday, October 6, 2013 10:01 PM
 Subject: HBase Random Read latency  100ms


 Hi All,

 My HBase cluster has 8 Region Servers (CDH 4.4.0, HBase 0.94.6).

 Each Region Server is with the following configuration,
 16 Core CPU, 192 GB RAM, 800 GB SATA (7200 RPM) Disk
 (Unfortunately configured with RAID 1, can't change this as the Machines
 are leased temporarily for a month).

 I am running YCSB benchmark tests on HBase and currently inserting around
 1.8 Billion records.
 (1 Key + 7 Fields of 100 Bytes = 724 Bytes per record)

 Currently I am getting a write throughput of around 100K OPS, but random
 reads are very, very slow; all gets have 100ms or more latency.

 I have changed the following default configuration,
 1. HFile Size: 16GB
 2. HDFS Block Size: 512 MB

 Total Data size is around 1.8 TB (Excluding the replicas).
 My Table is split into 128 Regions (No pre-splitting used, started with 1
 and grew to 128 over the insertion time)

 Taking some inputs from earlier discussions I have done the following
 changes to disable Nagle (In both Client and Server hbase-site.xml,
 hdfs-site.xml)

 <property>
   <name>hbase.ipc.client.tcpnodelay</name>
   <value>true</value>
 </property>

 <property>
   <name>ipc.server.tcpnodelay</name>
   <value>true</value>
 </property>

 Ganglia stats shows large CPU IO wait (30% during reads).

 I agree that disk configuration is not ideal for Hadoop cluster, but as
 told earlier it can't change for now.
 I feel the latency is way beyond any reported results so far.

 Any pointers on what can be wrong?

 Thanks,
 Ramu



Re: HBase Random Read latency 100ms

2013-10-06 Thread Ramu M S
Sorry BLOCKSIZE was wrong in my earlier post, it is the default 64 KB.

{NAME => 'usertable', FAMILIES => [{NAME => 'cf', DATA_BLOCK_ENCODING =>
'NONE', BLOOMFILTER => 'ROWCOL', REPLICATION_SCOPE => '0', VERSIONS => '1',
COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647',
KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false',
ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}]}

Thanks,
Ramu


On Mon, Oct 7, 2013 at 2:42 PM, Ramu M S ramu.ma...@gmail.com wrote:

 Lars,

 - Yes Short Circuit reading is enabled on both HDFS and HBase.
 - I had issued Major compaction after table is loaded.
 - Region Servers have max heap set as 128 GB. Block Cache Size is 0.25 of
 heap (So 32 GB for each Region Server) Do we need even more?
 - Decreasing HFile Size (default is 1 GB)? Should I leave it at the default?
 - Keys are Zipfian distributed (By YCSB)

 Bharath,

 Bloom Filters are enabled. Here is my table details,
 {NAME => 'usertable', FAMILIES => [{NAME => 'cf', DATA_BLOCK_ENCODING =>
 'NONE', BLOOMFILTER => 'ROWCOL', REPLICATION_SCOPE => '0', VERSIONS => '1',
 COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647',
 KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '16384', IN_MEMORY => 'false',
 ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}]}

 When the data size is around 100GB (100 Million records), then the latency
 is very good. I am getting a throughput of around 300K OPS.
 In both cases (100 GB and 1.8 TB) Ganglia stats show that Disk reads are
 around 50-60 MB/s throughout the read cycle.

 Thanks,
 Ramu


 On Mon, Oct 7, 2013 at 2:21 PM, lars hofhansl la...@apache.org wrote:

 Have you enabled short circuit reading? See here:
 http://hbase.apache.org/book/perf.hdfs.html

 How's your data locality (shown on the RegionServer UI page).


 How much memory are you giving your RegionServers?
 If your reads are truly random and the data set does not fit into the
 aggregate cache, you'll be dominated by the disk and network.
 Each read would need to bring in a 64k (default) HFile block. If short
 circuit reading is not enabled you'll get two or three context switches.

 So I would try:
 1. Enable short circuit reading
 2. Increase the block cache size per RegionServer
 3. Decrease the HFile block size
 4. Make sure your data is local (if it is not, issue a major compaction).


 -- Lars



 
  From: Ramu M S ramu.ma...@gmail.com
 To: user@hbase.apache.org
 Sent: Sunday, October 6, 2013 10:01 PM
 Subject: HBase Random Read latency  100ms


 Hi All,

 My HBase cluster has 8 Region Servers (CDH 4.4.0, HBase 0.94.6).

 Each Region Server is with the following configuration,
 16 Core CPU, 192 GB RAM, 800 GB SATA (7200 RPM) Disk
 (Unfortunately configured with RAID 1, can't change this as the Machines
 are leased temporarily for a month).

 I am running YCSB benchmark tests on HBase and currently inserting around
 1.8 Billion records.
 (1 Key + 7 Fields of 100 Bytes = 724 Bytes per record)

 Currently I am getting a write throughput of around 100K OPS, but random
 reads are very, very slow; all gets have 100ms or more latency.

 I have changed the following default configuration,
 1. HFile Size: 16GB
 2. HDFS Block Size: 512 MB

 Total Data size is around 1.8 TB (Excluding the replicas).
 My Table is split into 128 Regions (No pre-splitting used, started with 1
 and grew to 128 over the insertion time)

 Taking some inputs from earlier discussions I have done the following
 changes to disable Nagle (In both Client and Server hbase-site.xml,
 hdfs-site.xml)

 <property>
   <name>hbase.ipc.client.tcpnodelay</name>
   <value>true</value>
 </property>

 <property>
   <name>ipc.server.tcpnodelay</name>
   <value>true</value>
 </property>

 Ganglia stats shows large CPU IO wait (30% during reads).

 I agree that disk configuration is not ideal for Hadoop cluster, but as
 told earlier it can't change for now.
 I feel the latency is way beyond any reported results so far.

 Any pointers on what can be wrong?

 Thanks,
 Ramu