Re: HBase Random Read latency 100ms
How many parallel GC threads were you using?

Regarding the block cache - just to check that I understood this right: if you are doing a massive read in HBase, is it better to turn off block caching through the Scan attribute?
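As a concrete sketch of that Scan attribute with the 2013-era (0.94) Java client - the table name and the processing loop here are hypothetical:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class NoCacheScan {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable"); // hypothetical table name

        Scan scan = new Scan();
        // Keep blocks read by this scan out of the on-heap LRU block cache;
        // the blocks are still read (from OS page cache or disk), just not cached.
        scan.setCacheBlocks(false);

        ResultScanner scanner = table.getScanner(scan);
        try {
            for (Result r : scanner) {
                // process r ...
            }
        } finally {
            scanner.close();
            table.close();
        }
    }
}

Note this only stops the scan from polluting the on-heap cache; it does not make reads cheaper by itself.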
Re: HBase Random Read latency 100ms
Hi All,

I just ran only 8 parallel clients.

With SCR Enabled: Average Latency is 80 ms, IO Wait % is around 8
With SCR Disabled: Average Latency is 40 ms, IO Wait % is around 2

I always thought SCR enabled allows a client co-located with the DataNode to read HDFS file blocks directly, which gives a performance boost to distributed clients that are aware of locality. Is my understanding wrong, or does it not apply to my scenario?

Meanwhile I will try setting the parameter suggested by Lars and post the results.

Thanks,
Ramu

On Wed, Oct 9, 2013 at 2:29 PM, lars hofhansl la...@apache.org wrote:

Good call. Could try to enable hbase.regionserver.checksum.verify, which will cause HBase to do its own checksums rather than relying on HDFS (and which saves 1 IO per block get). I do think you can expect the index blocks to be cached at all times.

-- Lars
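For anyone reproducing the SCR on/off comparison: short-circuit reads are driven by HDFS client configuration. A hedged sketch of the relevant properties, assuming Hadoop 2.x-style SCR (these normally live in hdfs-site.xml on each node; the socket path is illustrative and must match the DataNode's setting):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class ScrConfig {
    public static Configuration create() {
        Configuration conf = HBaseConfiguration.create();
        // Enable short-circuit local reads (bypass the DataNode for local blocks).
        conf.setBoolean("dfs.client.read.shortcircuit", true);
        // UNIX domain socket shared between the DataNode and local clients
        // (path is an assumption; it must exist and match the DataNode side).
        conf.set("dfs.domain.socket.path", "/var/run/hdfs-sockets/dn");
        return conf;
    }
}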
Re: HBase Random Read latency 100ms
Hi All,

Sorry, there was a mistake in the tests (the clients were not reduced; I forgot to change the parameter before running the tests). With 8 clients:

SCR Enabled: Average Latency is 25 ms, IO Wait % is around 8
SCR Disabled: Average Latency is 10 ms, IO Wait % is around 2

Still, SCR disabled gives better results, which confuses me. Can anyone clarify?

Also, I tried setting the parameter Lars suggested (hbase.regionserver.checksum.verify = true) with SCR disabled. Average Latency is around 9.8 ms, a fraction less.

Thanks,
Ramu
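For completeness, hbase.regionserver.checksum.verify is a server-side property that normally goes into the RegionServer's hbase-site.xml (followed by a restart); a sketch of the setting, shown via the Configuration API for illustration only:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class ChecksumConfig {
    public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        // Let HBase compute and verify its own checksums instead of relying
        // on the HDFS .crc files, saving one file IO per block read.
        conf.setBoolean("hbase.regionserver.checksum.verify", true);
        System.out.println("hbase.regionserver.checksum.verify = "
                + conf.getBoolean("hbase.regionserver.checksum.verify", false));
    }
}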
RE: HBase Random Read latency 100ms
I can't say for SCR. There is a possibility that the feature is broken, of course. But the fact that hbase.regionserver.checksum.verify does not affect performance means that the OS effectively caches the HDFS checksum files.

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodio...@carrieriq.com
Re: HBase Random Read latency 100ms
On Wed, Oct 9, 2013 at 10:59 AM, Vladimir Rodionov vrodio...@carrieriq.com wrote:

But the fact that hbase.regionserver.checksum.verify does not affect performance means that the OS effectively caches the HDFS checksum files.

See "OS cache + SCR" vs "HBase CRC over OS cache + SCR" in this document I shared some time ago:
https://docs.google.com/spreadsheet/pub?key=0Ao87IrzZJSaydENaem5USWg4TlRKcHl0dEtTS2NBOUE&output=html

In an all-in-memory test it shows a pretty big difference.

J-D
Re: HBase Random Read latency 100ms
Hi Ramu,

I think I saw mentions of this possibly being a GC issue, though now it seems it may be a disk IO issue? 3 things:

1) http://blog.sematext.com/2013/06/24/g1-cms-java-garbage-collector/ - our G1 experience, with HBase specifically
2) If you can share some of your performance graphs (GC, disk IO, JVM memory pools, HBase-specific ones, etc.) people will likely be able to provide better help
3) You can do 2) with SPM (see sig), and actually you can send email to this ML with your graphs directly from SPM. :)

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm
Re: HBase Random Read latency 100ms
He still should not see 100ms latency. 20ms, sure. 100ms seems large; there are still 8 machines serving the requests. I agree this spec is far from optimal, but there is still something odd here.

Ramu, this does not look like a GC issue. You'd see much larger (worst case) latencies if that were the case (dozens of seconds). Are you using 40 clients from 40 different machines? Or from 40 different processes on the same machine? Or 40 threads in the same process?

Thanks.

-- Lars

From: Vladimir Rodionov vrodio...@carrieriq.com
To: user@hbase.apache.org
Sent: Monday, October 7, 2013 11:02 AM
Subject: RE: HBase Random Read latency 100ms

Ramu, your HBase configuration (128GB of heap) is far from optimal. Nobody runs HBase with that amount of heap, to my best knowledge. 32GB of RAM is the usual upper limit; we run 8-12GB in production.

What else: your IO capacity is VERY low. 2 SATA drives in RAID 1 for a mostly random read load? You should have 8, better 12-16, drives per server. Forget about RAID - you have HDFS.

Block cache in your case does not help much, since your read amplification is at least x20 (16KB block and 724 B read) - it just wastes RAM (heap). In your case you do not need a LARGE heap and a LARGE block cache.

I advise reconsidering your hardware spec, applying all optimizations mentioned already in this thread, and lowering your expectations. With the right hardware you will be able to get 500-1000 truly random reads per server.

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodio...@carrieriq.com

From: Ramu M S [ramu.ma...@gmail.com]
Sent: Monday, October 07, 2013 5:23 AM
To: user@hbase.apache.org
Subject: Re: HBase Random Read latency 100ms

Hi Bharath,

I am a little confused about the metrics displayed by Cloudera. Even when there are no operations, the gc_time metric shows a constant 2s in the graph. Is this the CMS gc_time (in that case no JVM pause) or the GC pause? The GC timings reported earlier are the average of the gc_time metric across all region servers.

Regards,
Ramu

On Mon, Oct 7, 2013 at 9:10 PM, Ramu M S ramu.ma...@gmail.com wrote:

Jean, Yes. It is 2 drives. - Ramu

On Mon, Oct 7, 2013 at 8:45 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote:

Quick question on the disk side. When you say "800 GB SATA (7200 RPM) Disk", is it 1x800GB? It's RAID 1, so it might be 2 drives? What's the configuration?

JM

2013/10/7 Ramu M S ramu.ma...@gmail.com:

Lars, Bharath,

Compression is disabled for the table. This was not intended for the evaluation; I forgot to mention that during table creation. I will enable Snappy and do a major compaction again. Please suggest other options to try out, and also suggestions for the previous questions.

Thanks,
Ramu

On Mon, Oct 7, 2013 at 6:35 PM, Ramu M S ramu.ma...@gmail.com wrote:

Bharath,

I was about to report this. Yes, indeed there is too much GC time. I just verified the GC time using Cloudera Manager statistics (updated every minute). For each Region Server:
- During Read: the graph shows a constant 2s.
- During Compaction: the graph starts at 7s and goes as high as 20s towards the end.

A few more questions:
1. For the current evaluation, since the reads are completely random and I don't expect to read the same data again, can I set the heap to the default 1 GB?
2. Can I completely turn off BLOCK CACHE for this table? http://hbase.apache.org/book/regionserver.arch.html recommends that for random reads.
3. In the next phase of evaluation, we are interested in using HBase as an in-memory KV DB by keeping the latest data in RAM (to the tune of around 128 GB in each RS; we are setting up a 50-100 node cluster). I am very curious to hear any suggestions in this regard.

Regards,
Ramu

On Mon, Oct 7, 2013 at 5:50 PM, Bharath Vissapragada bhara...@cloudera.com wrote:

Hi Ramu,

Thanks for reporting the results back. Just curious if you are hitting any big GC pauses due to block cache churn on such a large heap. Do you see it?

- Bharath

On Mon, Oct 7, 2013 at 1:42 PM, Ramu M S ramu.ma...@gmail.com wrote:

Lars,

After changing the BLOCKSIZE to 16KB, the latency has reduced a little. The average is now around 75ms. Overall throughput (I am using 40 clients to fetch records) is around 1K OPS. After compaction, hdfsBlocksLocalityIndex is 91, 88, 78, 90, 99, 82, 94, 97 in my 8 RS respectively.

Thanks,
Ramu

On Mon, Oct 7, 2013 at 3:51 PM, Ramu M S ramu.ma...@gmail.com wrote:

Thanks Lars. I have changed the BLOCKSIZE to 16KB and triggered a major compaction. I will report my results once it is done.

- Ramu
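On question 2, a sketch of disabling the block cache per column family at table-creation time with the 0.94-era admin API; the table and family names are hypothetical:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class NoBlockCacheTable {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        HTableDescriptor desc = new HTableDescriptor("mytable"); // hypothetical
        HColumnDescriptor cf = new HColumnDescriptor("cf");      // hypothetical
        cf.setBlockCacheEnabled(false); // data blocks bypass the LRU cache
        cf.setBlocksize(16 * 1024);     // the 16KB block size used in these tests
        desc.addFamily(cf);

        admin.createTable(desc);
        admin.close();
    }
}

As far as I recall, this only affects data blocks; index (and bloom) blocks are still cached, which is usually what you want for random reads.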
Re: HBase Random Read latency 100ms
How many reads per second per region server are you throwing at the system - and is 100ms the average latency?
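For context, a hedged sketch of the kind of client harness that produces these numbers - N threads doing random Gets and printing a running average latency. The table name, key format, thread count, and reporting interval are assumptions, not the original test code:

import java.util.Random;
import java.util.concurrent.atomic.AtomicLong;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class RandomReadBench {
    static final AtomicLong totalNanos = new AtomicLong();
    static final AtomicLong totalGets = new AtomicLong();

    public static void main(String[] args) throws Exception {
        final Configuration conf = HBaseConfiguration.create();
        final long keySpace = 1800000000L; // 1.8B rows, as in TEST 2
        int threads = 4;                   // 4 threads per client machine

        for (int t = 0; t < threads; t++) {
            new Thread(new Runnable() {
                public void run() {
                    try {
                        // HTable is not thread-safe: one instance per thread.
                        HTable table = new HTable(conf, "mytable"); // hypothetical
                        Random rnd = new Random();
                        while (true) {
                            // Assumed key format; the original schema is unknown.
                            byte[] row = Bytes.toBytes((long) (rnd.nextDouble() * keySpace));
                            long start = System.nanoTime();
                            table.get(new Get(row));
                            totalNanos.addAndGet(System.nanoTime() - start);
                            long n = totalGets.incrementAndGet();
                            if (n % 10000 == 0) {
                                System.out.printf("reads=%d avg latency=%.1f ms%n",
                                        n, totalNanos.get() / 1e6 / n);
                            }
                        }
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            }).start();
        }
    }
}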
RE: HBase Random Read latency 100ms
What are your current heap and block cache sizes?

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodio...@carrieriq.com

From: Ramu M S [ramu.ma...@gmail.com]
Sent: Monday, October 07, 2013 10:55 PM
To: user@hbase.apache.org
Subject: Re: HBase Random Read latency 100ms

Hi All,

Average Latency is still around 80ms. I have done the following:
1. Enabled Snappy Compression
2. Reduced the HFile size to 8 GB

Should I attribute these results to bad disk configuration, or is there anything else to investigate?

- Ramu

On Tue, Oct 8, 2013 at 10:56 AM, Ramu M S ramu.ma...@gmail.com wrote:

Vladimir, Thanks for the insights into the future caching features. Looks very interesting. - Ramu

On Tue, Oct 8, 2013 at 10:45 AM, Vladimir Rodionov vrodio...@carrieriq.com wrote:

Ramu,

If your working set of data fits into 192GB you may get an additional boost by utilizing the OS page cache, or wait until the 0.98 release, which introduces a new bucket cache implementation (a port of Facebook's L2 cache). You can try the vanilla bucket cache in 0.96 (not released yet, but due soon). Both caches store data off-heap, but the Facebook version can store encoded and compressed data and the vanilla bucket cache cannot. So there are some options for utilizing the available RAM efficiently, at least in upcoming HBase releases. If your data set does not fit in RAM, then your only hope is your 24 SAS drives, and performance then depends on your RAID settings, disk IO performance, and HDFS configuration (I think the latest Hadoop is preferable here).

The OS page cache is the most vulnerable and volatile; it cannot be controlled and can easily be polluted either by some other process or by HBase itself (a long scan). With the block cache you have more control, but the first truly usable *official* implementation is going to be part of the 0.98 release.

As far as I understand, your use case would definitely be covered by something similar to BigTable's ScanCache (RowCache), but there is no such cache in HBase yet. One major advantage of a RowCache vs the BlockCache (apart from being much more efficient in RAM usage) is resilience to region compactions. Each minor region compaction partially invalidates that region's data in the BlockCache, and a major compaction invalidates it completely. This would not be the case with a RowCache (were it implemented).

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodio...@carrieriq.com

From: Ramu M S [ramu.ma...@gmail.com]
Sent: Monday, October 07, 2013 5:25 PM
To: user@hbase.apache.org
Subject: Re: HBase Random Read latency 100ms

Vladimir,

Yes, I am fully aware of the HDD limitations and the wrong configuration w.r.t. RAID. Unfortunately, the hardware is leased from others for this work and I wasn't consulted on the h/w specification for the tests that I am doing now. Even the RAID cannot be turned off or set to RAID-0.

The production system is according to Hadoop needs (100 nodes with 16-core CPUs, 192 GB RAM, 24 x 600GB SAS drives; RAID cannot be completely turned off, so we are creating 1 virtual disk containing only 1 physical disk, with the VD RAID level set to RAID-0). These systems are still not available. If you have any suggestions on the production setup, I will be glad to hear them.

Also, as pointed out earlier, we are planning to use HBase as an in-memory KV store to access the latest data. That's why RAM was considered huge in this configuration. But it looks like we would run into more problems than gains from this.

Keeping that aside, I was trying to get the maximum out of the current cluster. Or, as you said, is 500-1000 OPS the max I could get out of this setup?

Regards,
Ramu
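Since the 0.96+ bucket cache comes up here, a sketch of the kind of settings involved; the property names are from the 0.96+ line and the size value is illustrative (treat this as an assumption, not a recipe):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class BucketCacheConfig {
    public static Configuration create() {
        Configuration conf = HBaseConfiguration.create();
        // Off-heap bucket cache (0.96+); normally configured in hbase-site.xml
        // on the RegionServers, shown here in code for illustration only.
        conf.set("hbase.bucketcache.ioengine", "offheap");
        // Cache size in MB (illustrative); remember to also raise
        // -XX:MaxDirectMemorySize in hbase-env.sh accordingly.
        conf.set("hbase.bucketcache.size", "4096");
        return conf;
    }
}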
RE: HBase Random Read latency 100ms
This can be GC related. 128GB heap size, 51.2GB BlockCache size (on heap), and a Zipfian distribution of small objects (724B): the result is extreme cache pollution and a high eviction rate, and high eviction means high GC. As far as I remember, LruBlockCache does not do real-time eviction but does it in batches; these batches will add more latency spikes.

Decrease the heap, reduce the block cache size (to a minimum), and repeat the test.

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodio...@carrieriq.com
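A sketch of the suggested change; the heap itself is set through HBASE_HEAPSIZE in hbase-env.sh, and the value below for the on-heap LRU cache fraction is illustrative:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class SmallCacheConfig {
    public static Configuration create() {
        Configuration conf = HBaseConfiguration.create();
        // Shrink the on-heap LRU block cache to a small fraction of the heap
        // (default is 0.25); normally set in the RegionServer's hbase-site.xml.
        conf.setFloat("hfile.block.cache.size", 0.05f); // 5% of heap
        return conf;
    }
}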
RE: HBase Random Read latency 100ms
Ramu,

You have 8 server boxes and 10 clients. You have 40 requests in parallel - 5 per RS/DN? That means 5 random-read requests in the IO queue of your single RAID1. With an average read latency of 10 ms, 5 requests in the queue will give us 30ms. Add some overhead for HDFS + HBase and you probably have your issue explained?

Your bottleneck is your disk system, I think. When you serve most requests from disk, as in your large data set scenario, make sure you have an adequate disk sub-system and that it is configured properly. The block cache and the OS page cache cannot help you in this case, as the working data set is larger than both caches.

The good performance numbers in the small data set scenario are explained by the fact that the data fits into the OS page cache and the block cache - you do not read data from disk even if you disable the block cache.

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodio...@carrieriq.com

From: Ramu M S [ramu.ma...@gmail.com]
Sent: Tuesday, October 08, 2013 6:00 PM
To: user@hbase.apache.org
Subject: Re: HBase Random Read latency 100ms

Hi All,

After a few suggestions from the earlier mails I changed the following:
1. Heap size to 16 GB
2. Block size to 16KB
3. HFile size to 8 GB (the table now has 256 regions, 32 per server)
4. Data Locality Index is 100 in all RS

I have clients running on 10 machines, each with 4 threads, so 40 in total. This is the same in all tests.

Result:
1. Average latency is still 100ms.
2. Heap occupancy is around 2-2.5 GB in all RS.

A few more tests were carried out yesterday.

TEST 1: Small data set (100 million records, each 724 bytes)
===
Configuration:
1. Heap size 1 GB
2. Block size 16KB
3. HFile size 1 GB (the table now has 128 regions, 16 per server)
4. Data Locality Index is 100 in all RS

I disabled the block cache on the table, to make sure I read everything from disk most of the time.

Result:
1. Average latency is 8ms and throughput went up to 6K/sec per RS.
2. With the block cache enabled again, I got an average latency around 2ms and a throughput of 10K/sec per RS. Heap occupancy around 650 MB.
3. With the heap increased to 16GB and the block cache still enabled, I got an average latency around 1 ms and a throughput of 20K/sec per RS. Heap occupancy around 2-2.5 GB in all RS.

TEST 2: Large data set (1.8 billion records, each 724 bytes)
==
Configuration:
1. Heap size 1 GB
2. Block size 16KB
3. HFile size 1 GB (the table now has 2048 regions, 256 per server)
4. Data Locality Index is 100 in all RS

Result:
1. Average latency starts at 500ms and gradually decreases, but even after around 100 million reads it is still 100 ms.
2. Block cache TRUE/FALSE does not make any difference here. Even the heap size (1GB / 16GB) does not make any difference.
3. Heap occupancy is around 2-2.5 GB under the 16GB heap and around 650 MB under the 1GB heap.

GC time in all of the scenarios is around 2ms/second, as shown in Cloudera Manager.

Reading most of the items from disk in the small-data scenario gives better results and very low latencies. The number of regions per RS and the HFile size do make a huge difference in my cluster. Keeping 100 regions per RS as the max (most of the discussions suggest this), should I restrict the HFile size to 1GB, thereby reducing the storage capacity (from 700 GB to 100GB per RS)? Please advise.

Thanks,
Ramu
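Vladimir's back-of-the-envelope queueing estimate, written out; the 10 ms per random read and the FIFO-queue approximation are his assumed figures, not measurements:

public class QueueLatencyEstimate {
    public static void main(String[] args) {
        int clients = 40;            // 10 machines x 4 threads
        int servers = 8;             // RS/DN boxes
        double diskLatencyMs = 10.0; // assumed avg random read on one SATA spindle

        double queueDepth = (double) clients / servers; // ~5 outstanding reads per box
        // A request entering a FIFO queue of depth q waits, on average, for about
        // half the requests ahead of it plus its own service time.
        double avgWaitMs = (queueDepth + 1) / 2 * diskLatencyMs; // ~30ms average
        double tailWaitMs = queueDepth * diskLatencyMs;          // ~50ms tail

        System.out.printf("avg ~%.0f ms, tail ~%.0f ms before HDFS/HBase overhead%n",
                avgWaitMs, tailWaitMs);
    }
}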
RE: HBase Random Read latency 100ms
I suggest two additional tests on the large dataset. Run one client thread per server (8 max) with:
1. SCR enabled
2. SCR disabled

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodio...@carrieriq.com
Re: HBase Random Read latency 100ms
What is the iowait in both cases ? Regards, - kiru Kiru Pakkirisamy | webcloudtech.wordpress.com From: Ramu M S ramu.ma...@gmail.com To: user@hbase.apache.org Sent: Tuesday, October 8, 2013 6:00 PM Subject: Re: HBase Random Read latency 100ms Hi All, After few suggestions from the mails earlier I changed the following, 1. Heap Size to 16 GB 2. Block Size to 16KB 3. HFile size to 8 GB (Table now has 256 regions, 32 per server) 4. Data Locality Index is 100 in all RS I have clients running in 10 machines, each with 4 threads. So total 40. This is same in all tests. Result: 1. Average latency is still 100ms. 2. Heap occupancy is around 2-2.5 GB in all RS Few more tests carried out yesterday, TEST 1: Small data set (100 Million records, each with 724 bytes). === Configurations: 1. Heap Size to 1 GB 2. Block Size to 16KB 3. HFile size to 1 GB (Table now has 128 regions, 16 per server) 4. Data Locality Index is 100 in all RS I disabled Block Cache on the table, to make sure I read everything from disk, most of the time. Result: 1. Average Latency is 8ms and throughput went up to 6K/Sec per RS. 2. With Block Cache enabled again, I got average latency around 2ms and throughput of 10K/Sec per RS. Heap occupancy around 650 MB 3. Increased the Heap to 16GB, with Block Cache still enabled, I got average latency around 1 ms and throughput 20K/Sec per RS Heap Occupancy around 2-2.5 GB in all RS TEST 2: Large Data set (1.8 Billion records, each with 724 bytes) == Configurations: 1. Heap Size to 1 GB 2. Block Size to 16KB 3. HFile size to 1 GB (Table now has 2048 regions, 256 per server) 4. Data Locality Index is 100 in all RS Result: 1. Average Latency is 500ms to start with and gradually decreases, but even after around 100 Million reads it is still 100 ms 2. Block Cache = TRUE/FALSE does not make any difference here. Even Heap Size (1GB / 16GB) does not make any difference. 3. Heap occupancy is around 2-2.5 GB under 16GB Heap and around 650 MB under 1GB Heap. GC Time in all of the scenarios is around 2ms/Second, as shown in the Cloudera Manager. Reading most of the items from Disk in less data scenario gives better results and very low latencies. Number of regions per RS and HFile size does make a huge difference in my Cluster. Keeping 100 Regions per RS as max(Most of the discussions suggest this), should I restrict the HFile size to 1GB? and thus reducing the storage capacity (From 700 GB to 100GB per RS)? Please advice. Thanks, Ramu On Wed, Oct 9, 2013 at 4:58 AM, Vladimir Rodionov vrodio...@carrieriq.comwrote: What are your current heap and block cache sizes? Best regards, Vladimir Rodionov Principal Platform Engineer Carrier IQ, www.carrieriq.com e-mail: vrodio...@carrieriq.com From: Ramu M S [ramu.ma...@gmail.com] Sent: Monday, October 07, 2013 10:55 PM To: user@hbase.apache.org Subject: Re: HBase Random Read latency 100ms Hi All, Average Latency is still around 80ms. I have done the following, 1. Enabled Snappy Compression 2. Reduce the HFile size to 8 GB Should I attribute these results to bad Disk Configuration OR anything else to investigate? - Ramu On Tue, Oct 8, 2013 at 10:56 AM, Ramu M S ramu.ma...@gmail.com wrote: Vladimir, Thanks for the Insights into Future Caching features. Looks very interesting. 
- Ramu

On Tue, Oct 8, 2013 at 10:45 AM, Vladimir Rodionov vrodio...@carrieriq.com wrote: Ramu, If your working set of data fits into 192GB you may get an additional boost by utilizing the OS page cache, or wait until the 0.98 release, which introduces a new bucket cache implementation (a port of the Facebook L2 cache). You can try the vanilla bucket cache in 0.96 (not released yet but due soon). Both caches store data off-heap, but the Facebook version can store encoded and compressed data while the vanilla bucket cache cannot. So there are some options for utilizing the available RAM efficiently (at least in upcoming HBase releases). If your data set does not fit in RAM then your only hope is your 24 SAS drives, and performance will depend on your RAID settings, disk IO performance, and HDFS configuration (I think the latest Hadoop is preferable here). The OS page cache is the most vulnerable and volatile: it cannot be controlled and can easily be polluted either by some other process or by HBase itself (a long scan). With the block cache you have more control, but the first truly usable *official* implementation is going to be part of the 0.98 release. As far as I understand, your use case would definitely be covered by something similar to the BigTable ScanCache (RowCache), but there is no such cache in HBase yet. One major advantage of a RowCache vs the BlockCache (apart from being much more efficient in RAM usage) is resilience to Region
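[For reference, and hedged since 0.96 had not shipped when this was written: the bucket cache Vladimir describes ended up being configured through hbase-site.xml properties along these lines; the size below is a placeholder assumption, not a recommendation.

  <property>
    <name>hbase.bucketcache.ioengine</name>
    <value>offheap</value>
  </property>
  <property>
    <!-- bucket cache capacity; example value only -->
    <name>hbase.bucketcache.size</name>
    <value>4096</value>
  </property>

With the offheap engine, the RegionServer JVM also needs its direct-memory limit raised to cover the cache, e.g. -XX:MaxDirectMemorySize=5g in hbase-env.sh.]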
RE: HBase Random Read latency 100ms
Upd. Each HBase Get = 2 HDFS read IOs (index block + data block) = 4 file IOs (data + .crc) in the worst case. I think if the Bloom filter is enabled then it is going to be 6 file IOs in the worst case (large data set); therefore you will have not 5 IO requests in the queue but up to 20-30 IO requests in the queue. This definitely explains the 100 ms avg latency. Best regards, Vladimir Rodionov Principal Platform Engineer Carrier IQ, www.carrieriq.com e-mail: vrodio...@carrieriq.com

From: Vladimir Rodionov Sent: Tuesday, October 08, 2013 7:24 PM To: user@hbase.apache.org Subject: RE: HBase Random Read latency 100ms
Ramu, You have 8 server boxes and 10 clients. You have 40 requests in parallel - 5 per RS/DN? So you have 5 random-read requests in the IO queue of your single RAID1. With an avg read latency of 10 ms, 5 requests in the queue will give us 30 ms. Add some overhead for HDFS + HBase and you probably have your issue explained. Your bottleneck is your disk system, I think. When you serve most requests from disks, as in your large-data-set scenario, make sure you have an adequate disk sub-system and that it is configured properly. The Block Cache and the OS page cache cannot help you in this case, as the working data set is larger than both caches. The good performance numbers in the small-data-set scenario are explained by the fact that the data fits into the OS page cache and the Block Cache - you do not read data from disk even if you disable the block cache. Best regards, Vladimir Rodionov Principal Platform Engineer Carrier IQ, www.carrieriq.com e-mail: vrodio...@carrieriq.com
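[Making the arithmetic above explicit - a rough back-of-the-envelope model using only numbers from the thread, ignoring RAID-1 read balancing and queue variance:

  concurrent Gets per server     = 40 client threads / 8 RS            = 5
  file IOs per Get (worst case)  = index + data + Bloom, each with .crc ~ 6
  outstanding IOs per RAID-1 set ~ 5 x 6                               = 20-30
  7200 RPM SATA random read      ~ 10 ms each

so a newly arriving IO can wait behind tens of queued seeks, which lands squarely in the ~100 ms range Ramu measures.]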
Re: HBase Random Read latency 100ms
Good call. You could try enabling hbase.regionserver.checksum.verify, which will cause HBase to do its own checksums rather than relying on HDFS (and which saves 1 IO per block get). I do think you can expect the index blocks to be cached at all times. -- Lars
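[The setting Lars refers to is a 0.94-era hbase-site.xml property (HBASE-5074, HBase-level checksums); a minimal sketch of enabling it on the RegionServers:

  <property>
    <!-- let HBase verify checksums itself, skipping the separate HDFS .crc read -->
    <name>hbase.regionserver.checksum.verify</name>
    <value>true</value>
  </property>

As I understand it, this only pays off for HFiles written after the setting takes effect, since the checksums have to be stored inline in the HFile blocks; older files fall back to the HDFS checksum path.]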
Re: HBase Random Read latency 100ms
Lars, In one of your old posts you mentioned that lowering the BLOCKSIZE is good for random reads (of course with an increased size for the block indexes). The post is at http://grokbase.com/t/hbase/user/11bat80x7m/row-get-very-slow Will that help in my tests? Should I give it a try? If I alter my table, should I trigger a major compaction again for this to take effect? Thanks, Ramu

On Mon, Oct 7, 2013 at 2:44 PM, Ramu M S ramu.ma...@gmail.com wrote: Sorry, the BLOCKSIZE was wrong in my earlier post; it is the default 64 KB.
{NAME => 'usertable', FAMILIES => [{NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROWCOL', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}]}
Thanks, Ramu

On Mon, Oct 7, 2013 at 2:42 PM, Ramu M S ramu.ma...@gmail.com wrote: Lars,
- Yes, short-circuit reading is enabled on both HDFS and HBase.
- I had issued a major compaction after the table was loaded.
- Region Servers have the max heap set to 128 GB. Block cache size is 0.25 of the heap (so 32 GB for each Region Server). Do we need even more?
- Decreasing the HFile size (default is 1 GB)? Should I leave it at the default?
- Keys are Zipfian distributed (by YCSB).
Bharath, Bloom filters are enabled. Here are my table details,
{NAME => 'usertable', FAMILIES => [{NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROWCOL', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '16384', IN_MEMORY => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}]}
When the data size is around 100 GB (100 Million records), the latency is very good. I am getting a throughput of around 300K OPS. In both cases (100 GB and 1.8 TB) Ganglia stats show that disk reads are around 50-60 MB/s throughout the read cycle. Thanks, Ramu

On Mon, Oct 7, 2013 at 2:21 PM, lars hofhansl la...@apache.org wrote: Have you enabled short circuit reading? See here: http://hbase.apache.org/book/perf.hdfs.html How's your data locality (shown on the RegionServer UI page)? How much memory are you giving your RegionServers? If your reads are truly random and the data set does not fit into the aggregate cache, you'll be dominated by the disk and network. Each read would need to bring in a 64k (default) HFile block. If short circuit reading is not enabled you'll get two or three context switches. So I would try:
1. Enable short circuit reading
2. Increase the block cache size per RegionServer
3. Decrease the HFile block size
4. Make sure your data is local (if it is not, issue a major compaction).
-- Lars

From: Ramu M S ramu.ma...@gmail.com To: user@hbase.apache.org Sent: Sunday, October 6, 2013 10:01 PM Subject: HBase Random Read latency 100ms
Hi All, My HBase cluster has 8 Region Servers (CDH 4.4.0, HBase 0.94.6). Each Region Server has the following configuration: 16 Core CPU, 192 GB RAM, 800 GB SATA (7200 RPM) disk (unfortunately configured with RAID 1; can't change this as the machines are leased temporarily for a month). I am running YCSB benchmark tests on HBase and am currently inserting around 1.8 Billion records (1 key + 7 fields of 100 bytes = 724 bytes per record). Currently I am getting a write throughput of around 100K OPS, but random reads are very slow: all gets have 100 ms or more latency. I have changed the following default configuration,
1. HFile Size: 16 GB
2. HDFS Block Size: 512 MB
Total data size is around 1.8 TB (excluding the replicas). My table is split into 128 regions (no pre-splitting used; it started with 1 and grew to 128 over the insertion time). Taking some inputs from earlier discussions, I have made the following changes to disable Nagle (in both the client and server hbase-site.xml, hdfs-site.xml):

  <property>
    <name>hbase.ipc.client.tcpnodelay</name>
    <value>true</value>
  </property>
  <property>
    <name>ipc.server.tcpnodelay</name>
    <value>true</value>
  </property>

Ganglia stats show large CPU IO wait (30% during reads). I agree that the disk configuration is not ideal for a Hadoop cluster, but as told earlier it can't change for now. I feel the latency is way beyond any reported results so far. Any pointers on what can be wrong? Thanks, Ramu
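[On the block-size question at the top of this message: BLOCKSIZE is a column-family attribute, and existing HFiles keep their old block size until they are rewritten, so a major compaction is indeed needed for the change to take effect. A minimal HBase shell sketch, using the table and family names from the thread:

  disable 'usertable'
  alter 'usertable', {NAME => 'cf', BLOCKSIZE => '16384'}
  enable 'usertable'
  major_compact 'usertable'

In 0.94 the disable/enable pair is required unless online schema update is turned on.]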
Re: HBase Random Read latency 100ms
First off: a 128gb heap per RegionServer. Wow. I'd be interested to hear about your experience with such a large heap for your RS. It's definitely big enough. It's interesting that 100gb does fit into the aggregate cache (of 8x32gb), while 1.8tb does not. It looks like ~70% of the read requests would need to bring in a 64kb block in order to read 724 bytes. Should that take 100ms? No. Something's still amiss. Smaller blocks might help (you'd need to bring in only 4, 8, or maybe 16k to read the small row). You would need to issue a major compaction for that to take effect. Maybe try 16k blocks. If that speeds up your random gets we know where to look next... at the disk IO. -- Lars
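[Since short-circuit reads keep coming up in this thread: on a CDH 4.4-era (Hadoop 2.0) cluster, enabling them is roughly the following in hdfs-site.xml on the DataNodes, mirrored into hbase-site.xml so the RegionServer's HDFS client picks it up; the socket path is an example, not a mandated value.

  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>true</value>
  </property>
  <property>
    <!-- domain socket shared by the DataNode and local readers -->
    <name>dfs.domain.socket.path</name>
    <value>/var/run/hadoop-hdfs/dn._PORT</value>
  </property>

Short-circuit reads only help when the RegionServer sits on the same host as the block replica, which is why the thread keeps checking the data locality index.]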
Re: HBase Random Read latency 100ms
Thanks Lars. I have changed the BLOCKSIZE to 16KB and triggered a major compaction. I will report my results once it is done. - Ramu
Re: HBase Random Read latency 100ms
Lars, After changing the BLOCKSIZE to 16KB the latency has reduced a little; the average is now around 75 ms. Overall throughput (I am using 40 clients to fetch records) is around 1K OPS. After compaction, hdfsBlocksLocalityIndex is 91, 88, 78, 90, 99, 82, 94, 97 on my 8 RS respectively. Thanks, Ramu
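[For context on how numbers like these are typically produced: the read phase of a YCSB run against HBase looks roughly like the sketch below. The exact binding name and workload file depend on the YCSB version; the stock workloadc is read-only with a Zipfian request distribution, matching what Ramu describes, and 4 threads on each of 10 client hosts gives the 40 readers.

  bin/ycsb run hbase -P workloads/workloadc \
      -p columnfamily=cf \
      -p recordcount=1800000000 \
      -p operationcount=100000000 \
      -threads 4
]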
Re: HBase Random Read latency 100ms
Hi Ramu, Thanks for reporting the results back. Just curious whether you are hitting any big GC pauses due to block cache churn on such a large heap. Do you see that? - Bharath
Re: HBase Random Read latency 100ms
Bharath, I was about to report this. Yes, indeed there is too much GC time. I just verified the GC time using the Cloudera Manager statistics (updated every minute). For each Region Server,
- During reads: the graph shows a constant 2s.
- During compaction: the graph starts at 7s and goes as high as 20s towards the end.
A few more questions,
1. For the current evaluation, since the reads are completely random and I don't expect to read the same data again, can I set the heap to the default 1 GB?
2. Can I completely turn off the BLOCK CACHE for this table? http://hbase.apache.org/book/regionserver.arch.html recommends that for random reads.
3. In the next phase of evaluation we are interested in using HBase as an in-memory KV DB by keeping the latest data in RAM (to the tune of around 128 GB in each RS; we are setting up a 50-100 node cluster). I am very curious to hear any suggestions in this regard.
Regards, Ramu
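[On question 2: the block cache can be switched off per column family, and per request on the client side. A minimal shell sketch for the table-level switch, using the thread's table and family names:

  disable 'usertable'
  alter 'usertable', {NAME => 'cf', BLOCKCACHE => 'false'}
  enable 'usertable'

As I understand it, this only stops data blocks from being cached for 'cf'; index and Bloom blocks are still cached, which is what you want for random gets. The client-side equivalent for an individual scan is Scan.setCacheBlocks(false).]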
Re: HBase Random Read latency 100ms
Lars, Bharath, Compression is disabled for the table. This was not intentional for the evaluation; I forgot to set it during table creation. I will enable Snappy and do a major compaction again. Please suggest other options to try out, and also suggestions for the previous questions. Thanks, Ramu
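[A minimal shell sketch of that change; the major compaction is what actually rewrites the existing HFiles compressed:

  disable 'usertable'
  alter 'usertable', {NAME => 'cf', COMPRESSION => 'SNAPPY'}
  enable 'usertable'
  major_compact 'usertable'

Snappy has to be installed and loadable on every RegionServer first; on 0.94 this can be checked with the bundled CompressionTest utility.]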
Re: HBase Random Read latency 100ms
A quick question on the disk side. When you say: 800 GB SATA (7200 RPM) Disk - is it 1x800GB? It's RAID 1, so it might be 2 drives? What's the configuration? JM
Re: HBase Random Read latency 100ms
Jean, Yes. It is 2 drives. - Ramu
Re: HBase Random Read latency 100ms
Hi Bharath, I am a little confused about the metrics displayed by Cloudera. Even when there are no operations, the gc_time metric shows a constant 2s in the graph. Is this the CMS gc_time (in which case there is no JVM pause) or the GC pause? The GC timings reported earlier are the average of the gc_time metric across all region servers. Regards, Ramu
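[One way to settle what gc_time is actually measuring is to read the JVM's own GC log rather than the aggregated metric. A sketch of the standard flags for a CMS-era (JDK 6/7) RegionServer, added to hbase-env.sh; the log path is an example:

  export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS \
    -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
    -XX:+PrintGCApplicationStoppedTime \
    -Xloggc:/var/log/hbase/gc-regionserver.log"

PrintGCApplicationStoppedTime in particular separates real stop-the-world pauses from concurrent CMS work, which is exactly the distinction being asked about here.]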
RE: HBase Random Read latency 100ms
Ramu, your HBase configuration (128GB of heap) is far from optimal. Nobody runs HBase with that amount of heap, to my best knowledge. 32GB of RAM is the usual upper limit; we run 8-12GB in production. What else: your IO capacity is VERY low. 2 SATA drives in RAID 1 for a mostly random-read load? You should have 8, better 12-16, drives per server. Forget about RAID - you have HDFS. The block cache in your case does not help much, since your read amplification is at least x20 (a 16KB block for a 724 B read) - it is just wasted RAM (heap). In your case you do not need a LARGE heap and a LARGE block cache. I advise you to reconsider your hardware spec, apply all the optimizations mentioned already in this thread, and lower your expectations. With the right hardware you will be able to get 500-1000 truly random reads per server. Best regards, Vladimir Rodionov Principal Platform Engineer Carrier IQ, www.carrieriq.com e-mail: vrodio...@carrieriq.com
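[Spelling out the read-amplification figure, using only numbers already in the thread:

  16 KB block / 724 B row = 16384 / 724 ~ 22.6x   (with the tuned 16KB blocks)
  64 KB block / 724 B row = 65536 / 724 ~ 90.5x   (with the default 64KB blocks)

So for a purely random-read workload, well over 95% of every block pulled into the cache is data the client never asked for, which is why Vladimir calls a large block cache wasted RAM here.]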
Overall throughput (I am using 40 clients to fetch records) is around 1K OPS. After compaction, hdfsBlocksLocalityIndex is 91, 88, 78, 90, 99, 82, 94, 97 across my 8 RS respectively. Thanks, Ramu On Mon, Oct 7, 2013 at 3:51 PM, Ramu M S ramu.ma...@gmail.com wrote: Thanks Lars. I have changed the BLOCKSIZE to 16KB and triggered a major compaction. I will report my results once it is done. - Ramu On Mon, Oct 7, 2013 at 3:21 PM, lars hofhansl la...@apache.org wrote: First off: 128 GB heap per RegionServer. Wow. I'd be interested to hear about your experience with such a large heap for your RS. It's definitely big enough. It's interesting that 100 GB fits into the aggregate cache (of 8x32 GB), while 1.8 TB does not. It looks like ~70% of the read requests would need to bring in a 64 KB block in order to read 724 bytes. Should that take 100ms? No. Something's still amiss. Smaller blocks might help (you'd need to bring in 4, 8, or maybe 16 KB to read the small row). You would need to issue a major compaction.
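Picking up Vladimir's sizing advice from the message above, a minimal sketch of what a more conventional configuration might look like (the values are illustrative assumptions, not recommendations from this thread):

# hbase-env.sh - heap per RegionServer in MB, per the 8-12 GB production guidance
export HBASE_HEAPSIZE=12288

<!-- hbase-site.xml - block cache as a fraction of heap; 0.25 is the 0.94 default -->
<property>
  <name>hfile.block.cache.size</name>
  <value>0.25</value>
</property>

With truly random reads over a data set far larger than the aggregate cache, a bigger cache fraction mostly enlarges the GC working set, which is consistent with the cache-churn pauses discussed above.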
Re: HBase Random Read latency 100ms
Vladimir, Yes, I am fully aware of the HDD limitations and the wrong configuration wrt RAID. Unfortunately, the hardware is leased from others for this work and I wasn't consulted on the h/w specification for the tests that I am doing now. Even the RAID cannot be turned off or set to RAID-0. The production system is specced according to Hadoop needs (100 nodes with 16-core CPUs, 192 GB RAM, 24 x 600 GB SAS drives; RAID cannot be completely turned off, so we are creating 1 virtual disk containing only 1 physical disk, with the VD RAID level set to RAID-0). These systems are still not available. If you have any suggestions on the production setup, I will be glad to hear them. Also, as pointed out earlier, we are planning to use HBase as an in-memory KV store to access the latest data. That's why the RAM was sized so large in this configuration. But it looks like we would run into more problems than gains from this. Keeping that aside, I was trying to get the maximum out of the current cluster. Or, as you said, is 500-1000 OPS the max I could get out of this setup? Regards, Ramu
RE: HBase Random Read latency 100ms
Ramu, If your working set of data fits into 192 GB you may get an additional boost by utilizing the OS page cache, or wait for the 0.98 release, which introduces a new bucket cache implementation (a port of Facebook's L2 cache). You can try the vanilla bucket cache in 0.96 (not released yet, but due soon). Both caches store data off-heap, but the Facebook version can store encoded and compressed data while the vanilla bucket cache cannot. So there are some options for utilizing the available RAM efficiently (at least in upcoming HBase releases). If your data set does not fit in RAM, then your only hope is your 24 SAS drives, and performance will depend on your RAID settings, disk IO performance, and HDFS configuration (I think the latest Hadoop is preferable here). The OS page cache is the most vulnerable and volatile: it cannot be controlled, and it can easily be polluted either by other processes or by HBase itself (a long scan). With the block cache you have more control, but the first truly usable *official* implementation is going to be part of the 0.98 release. As far as I understand, your use case would definitely be covered by something similar to BigTable's ScanCache (a RowCache), but there is no such cache in HBase yet. One major advantage of a RowCache vs the BlockCache (apart from being much more efficient in RAM usage) is resilience to Region compactions. Each minor Region compaction partially invalidates that Region's data in the BlockCache, and a major compaction invalidates it completely. This would not be the case with a RowCache (were it implemented). Best regards, Vladimir Rodionov Principal Platform Engineer Carrier IQ, www.carrieriq.com e-mail: vrodio...@carrieriq.com
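For readers on a later release: a minimal sketch of what enabling the off-heap bucket cache might look like with the 0.96-era configuration keys (the size below is an illustrative assumption; the JVM also needs a matching -XX:MaxDirectMemorySize):

<property>
  <name>hbase.bucketcache.ioengine</name>
  <value>offheap</value>
</property>
<property>
  <name>hbase.bucketcache.size</name>
  <value>16384</value>  <!-- interpreted as MB when greater than 1.0 -->
</property>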
Re: HBase Random Read latency 100ms
Vladimir, Thanks for the insights into future caching features. Looks very interesting. - Ramu
Re: HBase Random Read latency 100ms
Hi All, Average latency is still around 80 ms. I have done the following: 1. Enabled Snappy compression 2. Reduced the HFile size to 8 GB Should I attribute these results to the bad disk configuration, or is there anything else to investigate? - Ramu
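As a sketch, the Snappy change follows the same disable/alter/enable/major-compact cycle shown earlier, with COMPRESSION => 'SNAPPY' in the alter. The HFile size cap is a server-side setting, assuming "HFile size" here means the region split threshold (value in bytes; 8 GB shown to match the message above):

<property>
  <name>hbase.hregion.max.filesize</name>
  <value>8589934592</value>
</property>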
HBase Random Read latency 100ms
Hi All, My HBase cluster has 8 Region Servers (CDH 4.4.0, HBase 0.94.6). Each Region Server has the following configuration: 16-core CPU, 192 GB RAM, 800 GB SATA (7200 RPM) disk (unfortunately configured with RAID 1; can't change this as the machines are leased temporarily for a month). I am running YCSB benchmark tests on HBase and am currently inserting around 1.8 billion records (1 key + 7 fields of 100 bytes = 724 bytes per record). Currently I am getting a write throughput of around 100K OPS, but random reads are very, very slow: all gets have 100 ms or more latency. I have changed the following default configurations: 1. HFile size: 16 GB 2. HDFS block size: 512 MB Total data size is around 1.8 TB (excluding the replicas). My table is split into 128 regions (no pre-splitting used; it started with 1 and grew to 128 over the insertion period). Taking some inputs from earlier discussions, I have made the following changes to disable Nagle (in both client and server hbase-site.xml, hdfs-site.xml): <property><name>hbase.ipc.client.tcpnodelay</name><value>true</value></property> <property><name>ipc.server.tcpnodelay</name><value>true</value></property> Ganglia stats show a large CPU IO wait (30% during reads). I agree the disk configuration is not ideal for a Hadoop cluster, but as mentioned earlier it can't be changed for now. I feel the latency is way beyond any reported results so far. Any pointers on what could be wrong? Thanks, Ramu
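For context, a YCSB invocation matching the workload described above might look roughly like this (a sketch: the HBase binding name and workload file vary by YCSB version, and the property values are assumptions reconstructed from the numbers in this message):

bin/ycsb load hbase -P workloads/workloadc -p columnfamily=cf \
    -p recordcount=1800000000 -p fieldcount=7 -p fieldlength=100

bin/ycsb run hbase -P workloads/workloadc -p columnfamily=cf \
    -p requestdistribution=zipfian -threads 40

Here columnfamily must match the table's CF ('cf' in this thread), and fieldcount/fieldlength give the 7 x 100-byte fields per record.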
Re: HBase Random Read latency 100ms
Have you enabled short-circuit reading? See here: http://hbase.apache.org/book/perf.hdfs.html How's your data locality (shown on the RegionServer UI page)? How much memory are you giving your RegionServers? If your reads are truly random and the data set does not fit into the aggregate cache, you'll be dominated by the disk and the network. Each read would need to bring in a 64 KB (default) HFile block, and if short-circuit reading is not enabled you'll also get two or three context switches. So I would try: 1. Enable short-circuit reading 2. Increase the block cache size per RegionServer 3. Decrease the HFile block size 4. Make sure your data is local (if it is not, issue a major compaction). -- Lars
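A minimal sketch of the short-circuit-read settings for CDH4-era HDFS (hdfs-site.xml on the DataNodes and in the RegionServer/client configuration; the socket path is an illustrative choice and its directory must exist on each node):

<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<property>
  <name>dfs.domain.socket.path</name>
  <value>/var/run/hadoop-hdfs/dn._PORT</value>
</property>

DataNodes and RegionServers both need a restart for this to take effect.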
Re: HBase Random Read latency 100ms
Adding to what Lars said, you can enable bloom filters on column families for read performance. -- Bharath Vissapragada http://www.cloudera.com
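As a sketch, that is a one-line schema change in the 0.94 shell, followed by the same enable/major-compact cycle shown earlier, since blooms are written per HFile:

hbase> alter 'usertable', {NAME => 'cf', BLOOMFILTER => 'ROW'}

ROW blooms filter by row key; ROWCOL also covers the column qualifier, which is more precise for point gets on specific columns but produces larger filters.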
Re: HBase Random Read latency 100ms
Lars, - Yes, short-circuit reading is enabled on both HDFS and HBase. - I had issued a major compaction after the table was loaded. - Region Servers have the max heap set to 128 GB. The block cache size is 0.25 of the heap (so 32 GB for each Region Server). Do we need even more? - Decreasing the HFile size (default is 1 GB)? Should I leave it at the default? - Keys are Zipfian-distributed (by YCSB). Bharath, Bloom filters are enabled. Here are my table details: {NAME => 'usertable', FAMILIES => [{NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROWCOL', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '16384', IN_MEMORY => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}]} When the data size is around 100 GB (100 million records), the latency is very good; I am getting a throughput of around 300K OPS. In both cases (100 GB and 1.8 TB), Ganglia stats show that disk reads are around 50-60 MB/s throughout the read cycle. Thanks, Ramu
Re: HBase Random Read latency 100ms
Sorry, the BLOCKSIZE was wrong in my earlier post; it is the default 64 KB. {NAME => 'usertable', FAMILIES => [{NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROWCOL', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}]} Thanks, Ramu