Re: HBase - Performance issue

2014-09-09 Thread Michael Segel

So you have large RS and you have large regions. Your regions are huge relative 
to your RS memory heap. 
(Not ideal.) 

You have slow drives (5400 RPM) and a 1 GbE network. 
You didn't say how many drives per server. 

Under load, you will saturate your network with just 4 drives. (Give or take; I've never tried 5400 RPM drives.) Even a slow SATA drive can stream on the order of 80-100 MB/s, so four of them can outrun the roughly 125 MB/s a 1 GbE link carries.
So you hit one bandwidth bottleneck there. 
The other is the ratio of spindles to CPU. So if you have 4 drives and 8 cores… again under load, you'll start to see an I/O bottleneck… 

On average, how many regions do you have per table per server? 

I’d consider shrinking your regions.
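A minimal sketch of what that can look like with the 0.94 client API; the table name and target size are illustrative assumptions, and note that modifyTable in 0.94 typically requires disabling the table first, which may not suit a no-downtime cluster:

// Sketch only: "mytable" and the 2 GB cap are illustrative assumptions.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class ShrinkRegions {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    HTableDescriptor desc = admin.getTableDescriptor(Bytes.toBytes("mytable"));
    // Cap region size at 2 GB so future splits produce smaller regions.
    desc.setMaxFileSize(2L * 1024 * 1024 * 1024);
    admin.disableTable("mytable");
    admin.modifyTable(Bytes.toBytes("mytable"), desc);
    admin.enableTable("mytable");
    admin.close();
  }
}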

Sometimes you need to dial back from 11 to a more reasonable listening level… 
;-) 

HTH

-Mike



On Sep 8, 2014, at 8:23 AM, kiran kiran.sarvabho...@gmail.com wrote:

 Hi Lars,
 
 Ours is a problem of I/O wait and network bandwidth increase around the
 same time
 
 Lars,
 
 Sorry to say this... our's is a production cluster and we ideally should
 never want a downtime... Also lars, we had very miserable experience while
 upgrading from 0.92 to 0.94... There was a never a mention of change in
 split policy in the release notes... and the policy was not ideal for our
 cluster and it took us atleast a week to figure out that
 
 Our cluster runs on commodity hardware with big regions (5-10gb)... Region
 sever mem is 10gb...
 2TB SATA Hard disks (5400 - 7200 rpm)... Internal network bandwidth is 1 gig
 
 So please suggest us any work around with 0.94.1
 
 
 On Sun, Sep 7, 2014 at 8:42 AM, lars hofhansl la...@apache.org wrote:
 
 Thinking about it again, if you ran into a HBASE-7336 you'd see high CPU
 load, but *not* IOWAIT.
 0.94 is at 0.94.23, you should upgrade. A lot of fixes, improvements, and
 performance enhancements went in since 0.94.4.
 You can do a rolling upgrade straight to 0.94.23.
 
 With that out of the way, can you post a jstack of the processes that
 experience high wait times?
 
 -- Lars
 
  --
 *From:* kiran kiran.sarvabho...@gmail.com
 *To:* user@hbase.apache.org; lars hofhansl la...@apache.org
 *Sent:* Saturday, September 6, 2014 11:30 AM
 *Subject:* Re: HBase - Performance issue
 
 Lars,
 
 We are facing a similar situation on the similar cluster configuration...
 We are having high I/O wait percentages on some machines in our cluster...
 We have short circuit reads enabled but still we are facing the similar
 problem.. the cpu wait goes upto 50% also in some case while issuing scan
 commands with multiple threads.. Is there a work around other than applying
 the patch for 0.94.4 ??
 
 Thanks
 Kiran
 
 
 On Thu, Apr 25, 2013 at 12:12 AM, lars hofhansl la...@apache.org wrote:
 
 You may have run into https://issues.apache.org/jira/browse/HBASE-7336
 (which is in 0.94.4)
 (Although I had not observed this effect as much when short circuit reads
 are enabled)
 
 
 
 - Original Message -
 From: kzurek kzu...@proximetry.pl
 To: user@hbase.apache.org
 Cc:
 Sent: Wednesday, April 24, 2013 3:12 AM
 Subject: HBase - Performance issue
 
 The problem is that when I'm putting my data (multithreaded client, ~30MB/s
 traffic outgoing) into the cluster the load is equally spread over all
 RegionServer with 3.5% average CPU wait time (average CPU user: 51%). When
 I've added similar, mutlithreaded client that Scans for, let say, 100 last
 samples of randomly generated key from chosen time range, I'm getting high
 CPU wait time (20% and up) on two (or more if there is higher number of
 threads, default 10) random RegionServers. Therefore, machines that held
 those RS are getting very hot - one of the consequences is that number of
 store file is constantly increasing, up to the maximum limit. Rest of the
 RS
 are having 10-12% CPU wait time and everything seems to be OK (number of
 store files varies so they are being compacted and not increasing over
 time). Any ideas? Maybe  I could prioritize writes over reads somehow? Is
 it
 possible? If so what would be the best way to that and where it should be
 placed - on the client or cluster side)?
 
 Cluster specification:
 HBase Version0.94.2-cdh4.2.0
 Hadoop Version2.0.0-cdh4.2.0
 There are 6xDataNodes (5xHDD for storing data), 1xMasterNodes
 Other settings:
 - Bloom filters (ROWCOL) set
 - Short circuit turned on
 - HDFS Block Size: 128MB
 - Java Heap Size of Namenode/Secondary Namenode in Bytes: 8 GiB
 - Java Heap Size of HBase RegionServer in Bytes: 12 GiB
 - Java Heap Size of HBase Master in Bytes: 4 GiB
 - Java Heap Size of DataNode in Bytes: 1 GiB (default)
 Number of regions per RegionServer: 19 (total 114 regions on 6 RS)
 Key design: UUIDTIMESTAMP - UUID: 1-10M, TIMESTAMP: 1-N
 Table design: 1 column family with 20 columns of 8 bytes
 
 Get client:
 Multiple threads
 Each thread have its own tables instance with their Scanner.
 Each thread have its own range of UUIDs and randomly draws beginning of
 time
 range to build rowkey properly (see above).
 Each time Scan requests same

Re: HBase - Performance issue

2014-09-08 Thread kiran
Hi Lars,

Ours is a problem of I/O wait and network bandwidth increasing around the
same time.

Lars,

Sorry to say this... ours is a production cluster and we ideally cannot
afford any downtime... Also, Lars, we had a very miserable experience while
upgrading from 0.92 to 0.94... There was never a mention of the change in
split policy in the release notes... the policy was not ideal for our
cluster and it took us at least a week to figure that out.

Our cluster runs on commodity hardware with big regions (5-10 GB)... Region
server memory is 10 GB...
2 TB SATA hard disks (5400 - 7200 RPM)... Internal network bandwidth is 1 gigabit.

So please suggest any workaround for 0.94.1.


On Sun, Sep 7, 2014 at 8:42 AM, lars hofhansl la...@apache.org wrote:

 Thinking about it again, if you ran into a HBASE-7336 you'd see high CPU
 load, but *not* IOWAIT.
 0.94 is at 0.94.23, you should upgrade. A lot of fixes, improvements, and
 performance enhancements went in since 0.94.4.
 You can do a rolling upgrade straight to 0.94.23.

 With that out of the way, can you post a jstack of the processes that
 experience high wait times?

 -- Lars

   --
  *From:* kiran kiran.sarvabho...@gmail.com
 *To:* user@hbase.apache.org; lars hofhansl la...@apache.org
 *Sent:* Saturday, September 6, 2014 11:30 AM
 *Subject:* Re: HBase - Performance issue

 Lars,

 We are facing a similar situation on the similar cluster configuration...
 We are having high I/O wait percentages on some machines in our cluster...
 We have short circuit reads enabled but still we are facing the similar
 problem.. the cpu wait goes upto 50% also in some case while issuing scan
 commands with multiple threads.. Is there a work around other than applying
 the patch for 0.94.4 ??

 Thanks
 Kiran


 On Thu, Apr 25, 2013 at 12:12 AM, lars hofhansl la...@apache.org wrote:

 You may have run into https://issues.apache.org/jira/browse/HBASE-7336
 (which is in 0.94.4)
 (Although I had not observed this effect as much when short circuit reads
 are enabled)



 - Original Message -
 From: kzurek kzu...@proximetry.pl
 To: user@hbase.apache.org
 Cc:
 Sent: Wednesday, April 24, 2013 3:12 AM
 Subject: HBase - Performance issue

 The problem is that when I'm putting my data (multithreaded client, ~30MB/s
 traffic outgoing) into the cluster the load is equally spread over all
 RegionServer with 3.5% average CPU wait time (average CPU user: 51%). When
 I've added similar, mutlithreaded client that Scans for, let say, 100 last
 samples of randomly generated key from chosen time range, I'm getting high
 CPU wait time (20% and up) on two (or more if there is higher number of
 threads, default 10) random RegionServers. Therefore, machines that held
 those RS are getting very hot - one of the consequences is that number of
 store file is constantly increasing, up to the maximum limit. Rest of the
 RS
 are having 10-12% CPU wait time and everything seems to be OK (number of
 store files varies so they are being compacted and not increasing over
 time). Any ideas? Maybe  I could prioritize writes over reads somehow? Is
 it
 possible? If so what would be the best way to that and where it should be
 placed - on the client or cluster side)?

 Cluster specification:
 HBase Version0.94.2-cdh4.2.0
 Hadoop Version2.0.0-cdh4.2.0
 There are 6xDataNodes (5xHDD for storing data), 1xMasterNodes
 Other settings:
 - Bloom filters (ROWCOL) set
 - Short circuit turned on
 - HDFS Block Size: 128MB
 - Java Heap Size of Namenode/Secondary Namenode in Bytes: 8 GiB
 - Java Heap Size of HBase RegionServer in Bytes: 12 GiB
 - Java Heap Size of HBase Master in Bytes: 4 GiB
 - Java Heap Size of DataNode in Bytes: 1 GiB (default)
 Number of regions per RegionServer: 19 (total 114 regions on 6 RS)
 Key design: UUIDTIMESTAMP - UUID: 1-10M, TIMESTAMP: 1-N
 Table design: 1 column family with 20 columns of 8 bytes

 Get client:
 Multiple threads
 Each thread have its own tables instance with their Scanner.
 Each thread have its own range of UUIDs and randomly draws beginning of
 time
 range to build rowkey properly (see above).
 Each time Scan requests same amount of rows, but with random rowkey.





 --
 View this message in context:
 http://apache-hbase.679495.n3.nabble.com/HBase-Performance-issue-tp4042836.html
 Sent from the HBase User mailing list archive at Nabble.com.




 --
 Thank you
 Kiran Sarvabhotla

 -Even a correct decision is wrong when it is taken late






-- 
Thank you
Kiran Sarvabhotla

-Even a correct decision is wrong when it is taken late


Re: HBase - Performance issue

2014-09-08 Thread kiran
We have this setting enabled also...

<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>

On Mon, Sep 8, 2014 at 12:53 PM, kiran kiran.sarvabho...@gmail.com wrote:

 Hi Lars,

 Ours is a problem of I/O wait and network bandwidth increase around the
 same time

 Lars,

 Sorry to say this... our's is a production cluster and we ideally should
 never want a downtime... Also lars, we had very miserable experience while
 upgrading from 0.92 to 0.94... There was a never a mention of change in
 split policy in the release notes... and the policy was not ideal for our
 cluster and it took us atleast a week to figure out that

 Our cluster runs on commodity hardware with big regions (5-10gb)... Region
 sever mem is 10gb...
 2TB SATA Hard disks (5400 - 7200 rpm)... Internal network bandwidth is 1
 gig

 So please suggest us any work around with 0.94.1


 On Sun, Sep 7, 2014 at 8:42 AM, lars hofhansl la...@apache.org wrote:

 Thinking about it again, if you ran into a HBASE-7336 you'd see high CPU
 load, but *not* IOWAIT.
 0.94 is at 0.94.23, you should upgrade. A lot of fixes, improvements, and
 performance enhancements went in since 0.94.4.
 You can do a rolling upgrade straight to 0.94.23.

 With that out of the way, can you post a jstack of the processes that
 experience high wait times?

 -- Lars

   --
  *From:* kiran kiran.sarvabho...@gmail.com
 *To:* user@hbase.apache.org; lars hofhansl la...@apache.org
 *Sent:* Saturday, September 6, 2014 11:30 AM
 *Subject:* Re: HBase - Performance issue

 Lars,

 We are facing a similar situation on the similar cluster configuration...
 We are having high I/O wait percentages on some machines in our cluster...
 We have short circuit reads enabled but still we are facing the similar
 problem.. the cpu wait goes upto 50% also in some case while issuing scan
 commands with multiple threads.. Is there a work around other than applying
 the patch for 0.94.4 ??

 Thanks
 Kiran


 On Thu, Apr 25, 2013 at 12:12 AM, lars hofhansl la...@apache.org wrote:

 You may have run into https://issues.apache.org/jira/browse/HBASE-7336
 (which is in 0.94.4)
 (Although I had not observed this effect as much when short circuit reads
 are enabled)



 - Original Message -
 From: kzurek kzu...@proximetry.pl
 To: user@hbase.apache.org
 Cc:
 Sent: Wednesday, April 24, 2013 3:12 AM
 Subject: HBase - Performance issue

 The problem is that when I'm putting my data (multithreaded client,
 ~30MB/s
 traffic outgoing) into the cluster the load is equally spread over all
 RegionServer with 3.5% average CPU wait time (average CPU user: 51%). When
 I've added similar, mutlithreaded client that Scans for, let say, 100 last
 samples of randomly generated key from chosen time range, I'm getting high
 CPU wait time (20% and up) on two (or more if there is higher number of
 threads, default 10) random RegionServers. Therefore, machines that held
 those RS are getting very hot - one of the consequences is that number of
 store file is constantly increasing, up to the maximum limit. Rest of the
 RS
 are having 10-12% CPU wait time and everything seems to be OK (number of
 store files varies so they are being compacted and not increasing over
 time). Any ideas? Maybe  I could prioritize writes over reads somehow? Is
 it
 possible? If so what would be the best way to that and where it should be
 placed - on the client or cluster side)?

 Cluster specification:
 HBase Version0.94.2-cdh4.2.0
 Hadoop Version2.0.0-cdh4.2.0
 There are 6xDataNodes (5xHDD for storing data), 1xMasterNodes
 Other settings:
 - Bloom filters (ROWCOL) set
 - Short circuit turned on
 - HDFS Block Size: 128MB
 - Java Heap Size of Namenode/Secondary Namenode in Bytes: 8 GiB
 - Java Heap Size of HBase RegionServer in Bytes: 12 GiB
 - Java Heap Size of HBase Master in Bytes: 4 GiB
 - Java Heap Size of DataNode in Bytes: 1 GiB (default)
 Number of regions per RegionServer: 19 (total 114 regions on 6 RS)
 Key design: UUIDTIMESTAMP - UUID: 1-10M, TIMESTAMP: 1-N
 Table design: 1 column family with 20 columns of 8 bytes

 Get client:
 Multiple threads
 Each thread have its own tables instance with their Scanner.
 Each thread have its own range of UUIDs and randomly draws beginning of
 time
 range to build rowkey properly (see above).
 Each time Scan requests same amount of rows, but with random rowkey.





 --
 View this message in context:
 http://apache-hbase.679495.n3.nabble.com/HBase-Performance-issue-tp4042836.html
 Sent from the HBase User mailing list archive at Nabble.com.




 --
 Thank you
 Kiran Sarvabhotla

 -Even a correct decision is wrong when it is taken late






 --
 Thank you
 Kiran Sarvabhotla

 -Even a correct decision is wrong when it is taken late




-- 
Thank you
Kiran Sarvabhotla

-Even a correct decision is wrong when it is taken late


Re: HBase - Performance issue

2014-09-08 Thread Andrew Purtell
What about providing the jstack as Lars suggested? That doesn't
require you to upgrade (yet).

0.94.23 is the same major version as 0.94.1. Upgrading to this version
is not the same process as a major upgrade from 0.92 to 0.94. Changes
like the split policy difference you mention don't happen in point
releases. You should consider upgrading to the latest 0.94.x, if not
now then at some point, because a volunteer open source community
really can only support the latest release of a major version. You can
insist on working with a (now, very old) release, but we might not be
able to help you much.


On Mon, Sep 8, 2014 at 12:23 AM, kiran kiran.sarvabho...@gmail.com wrote:
 Hi Lars,

 Ours is a problem of I/O wait and network bandwidth increase around the
 same time

 Lars,

 Sorry to say this... our's is a production cluster and we ideally should
 never want a downtime... Also lars, we had very miserable experience while
 upgrading from 0.92 to 0.94... There was a never a mention of change in
 split policy in the release notes... and the policy was not ideal for our
 cluster and it took us atleast a week to figure out that

 Our cluster runs on commodity hardware with big regions (5-10gb)... Region
 sever mem is 10gb...
 2TB SATA Hard disks (5400 - 7200 rpm)... Internal network bandwidth is 1 gig

 So please suggest us any work around with 0.94.1


 On Sun, Sep 7, 2014 at 8:42 AM, lars hofhansl la...@apache.org wrote:

 Thinking about it again, if you ran into a HBASE-7336 you'd see high CPU
 load, but *not* IOWAIT.
 0.94 is at 0.94.23, you should upgrade. A lot of fixes, improvements, and
 performance enhancements went in since 0.94.4.
 You can do a rolling upgrade straight to 0.94.23.

 With that out of the way, can you post a jstack of the processes that
 experience high wait times?

 -- Lars

   --
  *From:* kiran kiran.sarvabho...@gmail.com
 *To:* user@hbase.apache.org; lars hofhansl la...@apache.org
 *Sent:* Saturday, September 6, 2014 11:30 AM
 *Subject:* Re: HBase - Performance issue

 Lars,

 We are facing a similar situation on the similar cluster configuration...
 We are having high I/O wait percentages on some machines in our cluster...
 We have short circuit reads enabled but still we are facing the similar
 problem.. the cpu wait goes upto 50% also in some case while issuing scan
 commands with multiple threads.. Is there a work around other than applying
 the patch for 0.94.4 ??

 Thanks
 Kiran


 On Thu, Apr 25, 2013 at 12:12 AM, lars hofhansl la...@apache.org wrote:

 You may have run into https://issues.apache.org/jira/browse/HBASE-7336
 (which is in 0.94.4)
 (Although I had not observed this effect as much when short circuit reads
 are enabled)



 - Original Message -
 From: kzurek kzu...@proximetry.pl
 To: user@hbase.apache.org
 Cc:
 Sent: Wednesday, April 24, 2013 3:12 AM
 Subject: HBase - Performance issue

 The problem is that when I'm putting my data (multithreaded client, ~30MB/s
 traffic outgoing) into the cluster the load is equally spread over all
 RegionServer with 3.5% average CPU wait time (average CPU user: 51%). When
 I've added similar, mutlithreaded client that Scans for, let say, 100 last
 samples of randomly generated key from chosen time range, I'm getting high
 CPU wait time (20% and up) on two (or more if there is higher number of
 threads, default 10) random RegionServers. Therefore, machines that held
 those RS are getting very hot - one of the consequences is that number of
 store file is constantly increasing, up to the maximum limit. Rest of the
 RS
 are having 10-12% CPU wait time and everything seems to be OK (number of
 store files varies so they are being compacted and not increasing over
 time). Any ideas? Maybe  I could prioritize writes over reads somehow? Is
 it
 possible? If so what would be the best way to that and where it should be
 placed - on the client or cluster side)?

 Cluster specification:
 HBase Version0.94.2-cdh4.2.0
 Hadoop Version2.0.0-cdh4.2.0
 There are 6xDataNodes (5xHDD for storing data), 1xMasterNodes
 Other settings:
 - Bloom filters (ROWCOL) set
 - Short circuit turned on
 - HDFS Block Size: 128MB
 - Java Heap Size of Namenode/Secondary Namenode in Bytes: 8 GiB
 - Java Heap Size of HBase RegionServer in Bytes: 12 GiB
 - Java Heap Size of HBase Master in Bytes: 4 GiB
 - Java Heap Size of DataNode in Bytes: 1 GiB (default)
 Number of regions per RegionServer: 19 (total 114 regions on 6 RS)
 Key design: UUIDTIMESTAMP - UUID: 1-10M, TIMESTAMP: 1-N
 Table design: 1 column family with 20 columns of 8 bytes

 Get client:
 Multiple threads
 Each thread have its own tables instance with their Scanner.
 Each thread have its own range of UUIDs and randomly draws beginning of
 time
 range to build rowkey properly (see above).
 Each time Scan requests same amount of rows, but with random rowkey.





 --
 View this message in context:
 http://apache-hbase.679495.n3.nabble.com/HBase

Re: HBase - Performance issue

2014-09-06 Thread kiran
Lars,

We are facing a similar situation on a similar cluster configuration...
We are seeing high I/O wait percentages on some machines in our cluster...
We have short-circuit reads enabled but are still facing the same problem...
the CPU wait goes up to 50% in some cases while issuing scan commands with
multiple threads... Is there a workaround other than applying the patch
that went into 0.94.4?

Thanks
Kiran


On Thu, Apr 25, 2013 at 12:12 AM, lars hofhansl la...@apache.org wrote:

 You may have run into https://issues.apache.org/jira/browse/HBASE-7336
 (which is in 0.94.4)
 (Although I had not observed this effect as much when short circuit reads
 are enabled)



 - Original Message -
 From: kzurek kzu...@proximetry.pl
 To: user@hbase.apache.org
 Cc:
 Sent: Wednesday, April 24, 2013 3:12 AM
 Subject: HBase - Performance issue

 The problem is that when I'm putting my data (multithreaded client, ~30MB/s
 traffic outgoing) into the cluster the load is equally spread over all
 RegionServer with 3.5% average CPU wait time (average CPU user: 51%). When
 I've added similar, mutlithreaded client that Scans for, let say, 100 last
 samples of randomly generated key from chosen time range, I'm getting high
 CPU wait time (20% and up) on two (or more if there is higher number of
 threads, default 10) random RegionServers. Therefore, machines that held
 those RS are getting very hot - one of the consequences is that number of
 store file is constantly increasing, up to the maximum limit. Rest of the
 RS
 are having 10-12% CPU wait time and everything seems to be OK (number of
 store files varies so they are being compacted and not increasing over
 time). Any ideas? Maybe  I could prioritize writes over reads somehow? Is
 it
 possible? If so what would be the best way to that and where it should be
 placed - on the client or cluster side)?

 Cluster specification:
 HBase Version0.94.2-cdh4.2.0
 Hadoop Version2.0.0-cdh4.2.0
 There are 6xDataNodes (5xHDD for storing data), 1xMasterNodes
 Other settings:
 - Bloom filters (ROWCOL) set
 - Short circuit turned on
 - HDFS Block Size: 128MB
 - Java Heap Size of Namenode/Secondary Namenode in Bytes: 8 GiB
 - Java Heap Size of HBase RegionServer in Bytes: 12 GiB
 - Java Heap Size of HBase Master in Bytes: 4 GiB
 - Java Heap Size of DataNode in Bytes: 1 GiB (default)
 Number of regions per RegionServer: 19 (total 114 regions on 6 RS)
 Key design: UUIDTIMESTAMP - UUID: 1-10M, TIMESTAMP: 1-N
 Table design: 1 column family with 20 columns of 8 bytes

 Get client:
 Multiple threads
 Each thread have its own tables instance with their Scanner.
 Each thread have its own range of UUIDs and randomly draws beginning of
 time
 range to build rowkey properly (see above).
 Each time Scan requests same amount of rows, but with random rowkey.





 --
 View this message in context:
 http://apache-hbase.679495.n3.nabble.com/HBase-Performance-issue-tp4042836.html
 Sent from the HBase User mailing list archive at Nabble.com.




-- 
Thank you
Kiran Sarvabhotla

-Even a correct decision is wrong when it is taken late


Re: HBase - Performance issue

2014-09-06 Thread kiran
Also, the HBase version is 0.94.1.


On Sun, Sep 7, 2014 at 12:00 AM, kiran kiran.sarvabho...@gmail.com wrote:

 Lars,

 We are facing a similar situation on the similar cluster configuration...
 We are having high I/O wait percentages on some machines in our cluster...
 We have short circuit reads enabled but still we are facing the similar
 problem.. the cpu wait goes upto 50% also in some case while issuing scan
 commands with multiple threads.. Is there a work around other than applying
 the patch for 0.94.4 ??

 Thanks
 Kiran


 On Thu, Apr 25, 2013 at 12:12 AM, lars hofhansl la...@apache.org wrote:

 You may have run into https://issues.apache.org/jira/browse/HBASE-7336
 (which is in 0.94.4)
 (Although I had not observed this effect as much when short circuit reads
 are enabled)



 - Original Message -
 From: kzurek kzu...@proximetry.pl
 To: user@hbase.apache.org
 Cc:
 Sent: Wednesday, April 24, 2013 3:12 AM
 Subject: HBase - Performance issue

 The problem is that when I'm putting my data (multithreaded client,
 ~30MB/s
 traffic outgoing) into the cluster the load is equally spread over all
 RegionServer with 3.5% average CPU wait time (average CPU user: 51%). When
 I've added similar, mutlithreaded client that Scans for, let say, 100 last
 samples of randomly generated key from chosen time range, I'm getting high
 CPU wait time (20% and up) on two (or more if there is higher number of
 threads, default 10) random RegionServers. Therefore, machines that held
 those RS are getting very hot - one of the consequences is that number of
 store file is constantly increasing, up to the maximum limit. Rest of the
 RS
 are having 10-12% CPU wait time and everything seems to be OK (number of
 store files varies so they are being compacted and not increasing over
 time). Any ideas? Maybe  I could prioritize writes over reads somehow? Is
 it
 possible? If so what would be the best way to that and where it should be
 placed - on the client or cluster side)?

 Cluster specification:
 HBase Version0.94.2-cdh4.2.0
 Hadoop Version2.0.0-cdh4.2.0
 There are 6xDataNodes (5xHDD for storing data), 1xMasterNodes
 Other settings:
 - Bloom filters (ROWCOL) set
 - Short circuit turned on
 - HDFS Block Size: 128MB
 - Java Heap Size of Namenode/Secondary Namenode in Bytes: 8 GiB
 - Java Heap Size of HBase RegionServer in Bytes: 12 GiB
 - Java Heap Size of HBase Master in Bytes: 4 GiB
 - Java Heap Size of DataNode in Bytes: 1 GiB (default)
 Number of regions per RegionServer: 19 (total 114 regions on 6 RS)
 Key design: UUIDTIMESTAMP - UUID: 1-10M, TIMESTAMP: 1-N
 Table design: 1 column family with 20 columns of 8 bytes

 Get client:
 Multiple threads
 Each thread have its own tables instance with their Scanner.
 Each thread have its own range of UUIDs and randomly draws beginning of
 time
 range to build rowkey properly (see above).
 Each time Scan requests same amount of rows, but with random rowkey.





 --
 View this message in context:
 http://apache-hbase.679495.n3.nabble.com/HBase-Performance-issue-tp4042836.html
 Sent from the HBase User mailing list archive at Nabble.com.




 --
 Thank you
 Kiran Sarvabhotla

 -Even a correct decision is wrong when it is taken late




-- 
Thank you
Kiran Sarvabhotla

-Even a correct decision is wrong when it is taken late


Re: HBase - Performance issue

2014-09-06 Thread Michael Segel
What type of drives, controllers, and network bandwidth do you have? 

Just curious.


On Sep 6, 2014, at 7:37 PM, kiran kiran.sarvabho...@gmail.com wrote:

 Also the hbase version is 0.94.1
 
 
 On Sun, Sep 7, 2014 at 12:00 AM, kiran kiran.sarvabho...@gmail.com wrote:
 
 Lars,
 
 We are facing a similar situation on the similar cluster configuration...
 We are having high I/O wait percentages on some machines in our cluster...
 We have short circuit reads enabled but still we are facing the similar
 problem.. the cpu wait goes upto 50% also in some case while issuing scan
 commands with multiple threads.. Is there a work around other than applying
 the patch for 0.94.4 ??
 
 Thanks
 Kiran
 
 
 On Thu, Apr 25, 2013 at 12:12 AM, lars hofhansl la...@apache.org wrote:
 
 You may have run into https://issues.apache.org/jira/browse/HBASE-7336
 (which is in 0.94.4)
 (Although I had not observed this effect as much when short circuit reads
 are enabled)
 
 
 
 - Original Message -
 From: kzurek kzu...@proximetry.pl
 To: user@hbase.apache.org
 Cc:
 Sent: Wednesday, April 24, 2013 3:12 AM
 Subject: HBase - Performance issue
 
 The problem is that when I'm putting my data (multithreaded client,
 ~30MB/s
 traffic outgoing) into the cluster the load is equally spread over all
 RegionServer with 3.5% average CPU wait time (average CPU user: 51%). When
 I've added similar, mutlithreaded client that Scans for, let say, 100 last
 samples of randomly generated key from chosen time range, I'm getting high
 CPU wait time (20% and up) on two (or more if there is higher number of
 threads, default 10) random RegionServers. Therefore, machines that held
 those RS are getting very hot - one of the consequences is that number of
 store file is constantly increasing, up to the maximum limit. Rest of the
 RS
 are having 10-12% CPU wait time and everything seems to be OK (number of
 store files varies so they are being compacted and not increasing over
 time). Any ideas? Maybe  I could prioritize writes over reads somehow? Is
 it
 possible? If so what would be the best way to that and where it should be
 placed - on the client or cluster side)?
 
 Cluster specification:
 HBase Version0.94.2-cdh4.2.0
 Hadoop Version2.0.0-cdh4.2.0
 There are 6xDataNodes (5xHDD for storing data), 1xMasterNodes
 Other settings:
 - Bloom filters (ROWCOL) set
 - Short circuit turned on
 - HDFS Block Size: 128MB
 - Java Heap Size of Namenode/Secondary Namenode in Bytes: 8 GiB
 - Java Heap Size of HBase RegionServer in Bytes: 12 GiB
 - Java Heap Size of HBase Master in Bytes: 4 GiB
 - Java Heap Size of DataNode in Bytes: 1 GiB (default)
 Number of regions per RegionServer: 19 (total 114 regions on 6 RS)
 Key design: UUIDTIMESTAMP - UUID: 1-10M, TIMESTAMP: 1-N
 Table design: 1 column family with 20 columns of 8 bytes
 
 Get client:
 Multiple threads
 Each thread have its own tables instance with their Scanner.
 Each thread have its own range of UUIDs and randomly draws beginning of
 time
 range to build rowkey properly (see above).
 Each time Scan requests same amount of rows, but with random rowkey.
 
 
 
 
 
 --
 View this message in context:
 http://apache-hbase.679495.n3.nabble.com/HBase-Performance-issue-tp4042836.html
 Sent from the HBase User mailing list archive at Nabble.com.
 
 
 
 
 --
 Thank you
 Kiran Sarvabhotla
 
 -Even a correct decision is wrong when it is taken late
 
 
 
 
 -- 
 Thank you
 Kiran Sarvabhotla
 
 -Even a correct decision is wrong when it is taken late



Re: HBase - Performance issue

2014-09-06 Thread lars hofhansl
Thinking about it again, if you ran into HBASE-7336 you'd see high CPU load, 
but *not* IOWAIT.
0.94 is at 0.94.23; you should upgrade. A lot of fixes, improvements, and 
performance enhancements have gone in since 0.94.4.
You can do a rolling upgrade straight to 0.94.23.

With that out of the way, can you post a jstack of the processes that 
experience high wait times?

-- Lars




 From: kiran kiran.sarvabho...@gmail.com
To: user@hbase.apache.org; lars hofhansl la...@apache.org 
Sent: Saturday, September 6, 2014 11:30 AM
Subject: Re: HBase - Performance issue
 


Lars,

We are facing a similar situation on the similar cluster configuration... We 
are having high I/O wait percentages on some machines in our cluster... We have 
short circuit reads enabled but still we are facing the similar problem.. the 
cpu wait goes upto 50% also in some case while issuing scan commands with 
multiple threads.. Is there a work around other than applying the patch for 
0.94.4 ??

Thanks
Kiran



On Thu, Apr 25, 2013 at 12:12 AM, lars hofhansl la...@apache.org wrote:

You may have run into https://issues.apache.org/jira/browse/HBASE-7336 (which 
is in 0.94.4)
(Although I had not observed this effect as much when short circuit reads are 
enabled)




- Original Message -
From: kzurek kzu...@proximetry.pl
To: user@hbase.apache.org
Cc:
Sent: Wednesday, April 24, 2013 3:12 AM
Subject: HBase - Performance issue

The problem is that when I'm putting my data (multithreaded client, ~30MB/s
traffic outgoing) into the cluster the load is equally spread over all
RegionServer with 3.5% average CPU wait time (average CPU user: 51%). When
I've added similar, mutlithreaded client that Scans for, let say, 100 last
samples of randomly generated key from chosen time range, I'm getting high
CPU wait time (20% and up) on two (or more if there is higher number of
threads, default 10) random RegionServers. Therefore, machines that held
those RS are getting very hot - one of the consequences is that number of
store file is constantly increasing, up to the maximum limit. Rest of the RS
are having 10-12% CPU wait time and everything seems to be OK (number of
store files varies so they are being compacted and not increasing over
time). Any ideas? Maybe  I could prioritize writes over reads somehow? Is it
possible? If so what would be the best way to that and where it should be
placed - on the client or cluster side)?

Cluster specification:
HBase Version0.94.2-cdh4.2.0
Hadoop Version2.0.0-cdh4.2.0
There are 6xDataNodes (5xHDD for storing data), 1xMasterNodes
Other settings:
- Bloom filters (ROWCOL) set
- Short circuit turned on
- HDFS Block Size: 128MB
- Java Heap Size of Namenode/Secondary Namenode in Bytes: 8 GiB
- Java Heap Size of HBase RegionServer in Bytes: 12 GiB
- Java Heap Size of HBase Master in Bytes: 4 GiB
- Java Heap Size of DataNode in Bytes: 1 GiB (default)
Number of regions per RegionServer: 19 (total 114 regions on 6 RS)
Key design: UUIDTIMESTAMP - UUID: 1-10M, TIMESTAMP: 1-N
Table design: 1 column family with 20 columns of 8 bytes

Get client:
Multiple threads
Each thread have its own tables instance with their Scanner.
Each thread have its own range of UUIDs and randomly draws beginning of time
range to build rowkey properly (see above).
Each time Scan requests same amount of rows, but with random rowkey.





--
View this message in context: 
http://apache-hbase.679495.n3.nabble.com/HBase-Performance-issue-tp4042836.html
Sent from the HBase User mailing list archive at Nabble.com.




-- 

Thank you
Kiran Sarvabhotla

-Even a correct decision is wrong when it is taken late

Re: Hbase Performance Issue

2014-01-09 Thread Akhtar Muhammad Din
I am thankful to all for taking the time to suggest solutions.
Recently I implemented a solution using bulk load. It seems a little faster
than the one using the client API, but it still takes far too long compared
to HDFS: processing and saving 10 GB of data to HDFS takes only 3 minutes on
a 10-node cluster, whereas HBase takes about 30 minutes.

Bulk load has its own issue: I need to partition the table beforehand,
otherwise the job runs only one reducer. I am working on a platform where I
can't anticipate how much data is going to be loaded into the table, so it is
difficult to pre-split it. Is there any way I can run multiple reducers
without pre-splitting the table?
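
For reference, the pre-splitting being discussed happens at table-creation time in the 0.94 client API, roughly as follows. This is a sketch only; the table name, column family, and split points are illustrative assumptions.

// Sketch only: "events", family "d", and the split points are illustrative assumptions.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitTable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    HTableDescriptor desc = new HTableDescriptor("events");
    desc.addFamily(new HColumnDescriptor("d"));
    // Nine split keys give ten initial regions; HFileOutputFormat's
    // configureIncrementalLoad then assigns one reducer per region.
    byte[][] splits = new byte[9][];
    for (int i = 1; i <= 9; i++) {
      splits[i - 1] = Bytes.toBytes(String.format("%02d", i * 10));
    }
    admin.createTable(desc, splits);
    admin.close();
  }
}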






On Wed, Jan 8, 2014 at 4:53 AM, Suraj Varma svarma...@gmail.com wrote:

 Akhtar:
 There is no manual step for bulk load. You essentially have your script
 that runs the map reduce job that creates the HFiles. On success of this
 script/command, you run the completebulkload command ... the whole bulk
 load can be automated, just like your map reduce job.

 --Suraj


 On Mon, Jan 6, 2014 at 11:42 AM, Mike Axiak m...@axiak.net wrote:

  I suggest you look at hannibal [1] to look at the distribution of the
 data
  on your cluster:
 
  1: https://github.com/sentric/hannibal
 
 
  On Mon, Jan 6, 2014 at 2:14 PM, Doug Meil doug.m...@explorysmedical.com
  wrote:
 
  
   In addition to what everybody else said, look what *where* the regions
  are
   for the target table.  There may be 5 regions (for example), but look
 to
   see if they are all on the same RS.
  
  
  
  
  
   On 1/6/14 5:45 AM, Nicolas Liochon nkey...@gmail.com wrote:
  
   It's very strange that you don't see a perf improvement when you
  increase
   the number of nodes.
   Nothing in what you've done change the performances at the end?
   
   You may want to check:
- the number of regions for this table. Are all the region server
 busy?
   Do
   you have some split on the table?
- How much data you actually write. Is the compression enabled on
 this
   table?
- Do you have compactions? You may want to change the max store file
   settings for unfrequent write load (see
   http://gbif.blogspot.fr/2012/07/optimizing-writes-in-hbase.html).
   
   It would be interesting to test as well the 0.96 release.
   
   
   
   On Sun, Jan 5, 2014 at 2:12 AM, Vladimir Rodionov
   vrodio...@carrieriq.comwrote:
   
   
I think in this case, writing data to HDFS or HFile directly (for
subsequent bulk loading)
is the best option. HBase will never compete in write speed with
 HDFS.
   
Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodio...@carrieriq.com
   

From: Ted Yu [yuzhih...@gmail.com]
Sent: Saturday, January 04, 2014 2:33 PM
To: user@hbase.apache.org
Subject: Re: Hbase Performance Issue
   
There're 8 items under:
http://hbase.apache.org/book.html#perf.writing
   
I guess you have through all of them :-)
   
   
On Sat, Jan 4, 2014 at 1:34 PM, Akhtar Muhammad Din
akhtar.m...@gmail.comwrote:
   
 Thanks guys for your precious time.
 Vladimir, as Ted rightly said i want to improve write performance
currently
 (of course i want to read data as fast as possible later on)
 Kevin, my current understanding of bulk load is that you generate
 StoreFiles and later load through a command line program. I dont
  want
   to
do
 any manual step. Our system is getting data after every 15
 minutes,
  so
 requirement is to automate it through client API completely.


   
Confidentiality Notice:  The information contained in this message,
including any attachments hereto, may be confidential and is
 intended
   to be
read only by the individual or entity to whom this message is
   addressed. If
the reader of this message is not the intended recipient or an agent
  or
designee of the intended recipient, please note that any review,
 use,
disclosure or distribution of this message or its attachments, in
 any
   form,
is strictly prohibited.  If you have received this message in error,
   please
immediately notify the sender and/or Notifications@carrieriq.comand
delete or destroy any copy of this message and its attachments.
   
  
  
 




-- 
Regards
Akhtar Muhammad Din


Re: Hbase Performance Issue

2014-01-07 Thread Suraj Varma
Akhtar:
There is no manual step for bulk load. You essentially have your script
that runs the MapReduce job that creates the HFiles. On success of that
script/command, you run the completebulkload command ... the whole bulk
load can be automated, just like your MapReduce job.

--Suraj
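
For reference, a minimal sketch of such an automated driver; the staging path and table name are illustrative assumptions, and the input/mapper setup is elided.

// Sketch only: "/tmp/hfiles" and "events" are illustrative assumptions.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "events");
    Path hfileDir = new Path("/tmp/hfiles");

    Job job = new Job(conf, "hfile-generation");
    job.setJarByClass(BulkLoadDriver.class);
    // ... set your input format and mapper here ...
    // configureIncrementalLoad wires the TotalOrderPartitioner and sort
    // reducer so one reducer is created per region of the target table.
    HFileOutputFormat.configureIncrementalLoad(job, table);
    FileOutputFormat.setOutputPath(job, hfileDir);

    if (job.waitForCompletion(true)) {
      // Same effect as running the completebulkload command, invoked from code.
      new LoadIncrementalHFiles(conf).doBulkLoad(hfileDir, table);
    }
    table.close();
  }
}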


On Mon, Jan 6, 2014 at 11:42 AM, Mike Axiak m...@axiak.net wrote:

 I suggest you look at hannibal [1] to look at the distribution of the data
 on your cluster:

 1: https://github.com/sentric/hannibal


 On Mon, Jan 6, 2014 at 2:14 PM, Doug Meil doug.m...@explorysmedical.com
 wrote:

 
  In addition to what everybody else said, look what *where* the regions
 are
  for the target table.  There may be 5 regions (for example), but look to
  see if they are all on the same RS.
 
 
 
 
 
  On 1/6/14 5:45 AM, Nicolas Liochon nkey...@gmail.com wrote:
 
  It's very strange that you don't see a perf improvement when you
 increase
  the number of nodes.
  Nothing in what you've done change the performances at the end?
  
  You may want to check:
   - the number of regions for this table. Are all the region server busy?
  Do
  you have some split on the table?
   - How much data you actually write. Is the compression enabled on this
  table?
   - Do you have compactions? You may want to change the max store file
  settings for unfrequent write load (see
  http://gbif.blogspot.fr/2012/07/optimizing-writes-in-hbase.html).
  
  It would be interesting to test as well the 0.96 release.
  
  
  
  On Sun, Jan 5, 2014 at 2:12 AM, Vladimir Rodionov
  vrodio...@carrieriq.comwrote:
  
  
   I think in this case, writing data to HDFS or HFile directly (for
   subsequent bulk loading)
   is the best option. HBase will never compete in write speed with HDFS.
  
   Best regards,
   Vladimir Rodionov
   Principal Platform Engineer
   Carrier IQ, www.carrieriq.com
   e-mail: vrodio...@carrieriq.com
  
   
   From: Ted Yu [yuzhih...@gmail.com]
   Sent: Saturday, January 04, 2014 2:33 PM
   To: user@hbase.apache.org
   Subject: Re: Hbase Performance Issue
  
   There're 8 items under:
   http://hbase.apache.org/book.html#perf.writing
  
   I guess you have through all of them :-)
  
  
   On Sat, Jan 4, 2014 at 1:34 PM, Akhtar Muhammad Din
   akhtar.m...@gmail.comwrote:
  
Thanks guys for your precious time.
Vladimir, as Ted rightly said i want to improve write performance
   currently
(of course i want to read data as fast as possible later on)
Kevin, my current understanding of bulk load is that you generate
StoreFiles and later load through a command line program. I dont
 want
  to
   do
any manual step. Our system is getting data after every 15 minutes,
 so
requirement is to automate it through client API completely.
   
   
  
   Confidentiality Notice:  The information contained in this message,
   including any attachments hereto, may be confidential and is intended
  to be
   read only by the individual or entity to whom this message is
  addressed. If
   the reader of this message is not the intended recipient or an agent
 or
   designee of the intended recipient, please note that any review, use,
   disclosure or distribution of this message or its attachments, in any
  form,
   is strictly prohibited.  If you have received this message in error,
  please
   immediately notify the sender and/or notificati...@carrieriq.com and
   delete or destroy any copy of this message and its attachments.
  
 
 



Re: Hbase Performance Issue

2014-01-06 Thread Nicolas Liochon
It's very strange that you don't see a performance improvement when you increase
the number of nodes.
Did none of the changes you made affect the performance in the end?

You may want to check:
 - the number of regions for this table. Are all the region servers busy? Do
you have splits happening on the table?
 - How much data you actually write. Is compression enabled on this
table?
 - Do you have compactions? You may want to change the max store file
settings for an infrequent write load (see
http://gbif.blogspot.fr/2012/07/optimizing-writes-in-hbase.html; the relevant keys are named in the sketch below).
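
Those "max store file settings" are server-side properties set in hbase-site.xml on the region servers; the sketch below does nothing but name the keys involved, and the defaults shown are assumed 0.94 values rather than anything stated in this thread.

// Sketch only: names the relevant server-side keys; defaults shown are assumptions.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class StoreFileSettings {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // Writes to a region block once one of its stores has this many files awaiting compaction.
    System.out.println(conf.getInt("hbase.hstore.blockingStoreFiles", 7));
    // A minor compaction is considered once a store reaches this many files.
    System.out.println(conf.getInt("hbase.hstore.compactionThreshold", 3));
    // Updates block when a memstore grows to multiplier x the configured flush size.
    System.out.println(conf.getInt("hbase.hregion.memstore.block.multiplier", 2));
  }
}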

It would be interesting to test as well the 0.96 release.



On Sun, Jan 5, 2014 at 2:12 AM, Vladimir Rodionov
vrodio...@carrieriq.comwrote:


 I think in this case, writing data to HDFS or HFile directly (for
 subsequent bulk loading)
 is the best option. HBase will never compete in write speed with HDFS.

 Best regards,
 Vladimir Rodionov
 Principal Platform Engineer
 Carrier IQ, www.carrieriq.com
 e-mail: vrodio...@carrieriq.com

 
 From: Ted Yu [yuzhih...@gmail.com]
 Sent: Saturday, January 04, 2014 2:33 PM
 To: user@hbase.apache.org
 Subject: Re: Hbase Performance Issue

 There're 8 items under:
 http://hbase.apache.org/book.html#perf.writing

 I guess you have through all of them :-)


 On Sat, Jan 4, 2014 at 1:34 PM, Akhtar Muhammad Din
 akhtar.m...@gmail.comwrote:

  Thanks guys for your precious time.
  Vladimir, as Ted rightly said i want to improve write performance
 currently
  (of course i want to read data as fast as possible later on)
  Kevin, my current understanding of bulk load is that you generate
  StoreFiles and later load through a command line program. I dont want to
 do
  any manual step. Our system is getting data after every 15 minutes, so
  requirement is to automate it through client API completely.
 
 

 Confidentiality Notice:  The information contained in this message,
 including any attachments hereto, may be confidential and is intended to be
 read only by the individual or entity to whom this message is addressed. If
 the reader of this message is not the intended recipient or an agent or
 designee of the intended recipient, please note that any review, use,
 disclosure or distribution of this message or its attachments, in any form,
 is strictly prohibited.  If you have received this message in error, please
 immediately notify the sender and/or notificati...@carrieriq.com and
 delete or destroy any copy of this message and its attachments.



Re: Hbase Performance Issue

2014-01-06 Thread Doug Meil

In addition to what everybody else said, look at *where* the regions are
for the target table. There may be 5 regions (for example), but look to
see if they are all on the same RS.





On 1/6/14 5:45 AM, Nicolas Liochon nkey...@gmail.com wrote:

It's very strange that you don't see a perf improvement when you increase
the number of nodes.
Nothing in what you've done change the performances at the end?

You may want to check:
 - the number of regions for this table. Are all the region server busy?
Do
you have some split on the table?
 - How much data you actually write. Is the compression enabled on this
table?
 - Do you have compactions? You may want to change the max store file
settings for unfrequent write load (see
http://gbif.blogspot.fr/2012/07/optimizing-writes-in-hbase.html).

It would be interesting to test as well the 0.96 release.



On Sun, Jan 5, 2014 at 2:12 AM, Vladimir Rodionov
vrodio...@carrieriq.comwrote:


 I think in this case, writing data to HDFS or HFile directly (for
 subsequent bulk loading)
 is the best option. HBase will never compete in write speed with HDFS.

 Best regards,
 Vladimir Rodionov
 Principal Platform Engineer
 Carrier IQ, www.carrieriq.com
 e-mail: vrodio...@carrieriq.com

 
 From: Ted Yu [yuzhih...@gmail.com]
 Sent: Saturday, January 04, 2014 2:33 PM
 To: user@hbase.apache.org
 Subject: Re: Hbase Performance Issue

 There're 8 items under:
 http://hbase.apache.org/book.html#perf.writing

 I guess you have through all of them :-)


 On Sat, Jan 4, 2014 at 1:34 PM, Akhtar Muhammad Din
 akhtar.m...@gmail.comwrote:

  Thanks guys for your precious time.
  Vladimir, as Ted rightly said i want to improve write performance
 currently
  (of course i want to read data as fast as possible later on)
  Kevin, my current understanding of bulk load is that you generate
  StoreFiles and later load through a command line program. I dont want
to
 do
  any manual step. Our system is getting data after every 15 minutes, so
  requirement is to automate it through client API completely.
 
 

 Confidentiality Notice:  The information contained in this message,
 including any attachments hereto, may be confidential and is intended
to be
 read only by the individual or entity to whom this message is
addressed. If
 the reader of this message is not the intended recipient or an agent or
 designee of the intended recipient, please note that any review, use,
 disclosure or distribution of this message or its attachments, in any
form,
 is strictly prohibited.  If you have received this message in error,
please
 immediately notify the sender and/or notificati...@carrieriq.com and
 delete or destroy any copy of this message and its attachments.




Re: Hbase Performance Issue

2014-01-06 Thread Mike Axiak
I suggest you use hannibal [1] to look at the distribution of the data
on your cluster:

1: https://github.com/sentric/hannibal


On Mon, Jan 6, 2014 at 2:14 PM, Doug Meil doug.m...@explorysmedical.comwrote:


 In addition to what everybody else said, look what *where* the regions are
 for the target table.  There may be 5 regions (for example), but look to
 see if they are all on the same RS.





 On 1/6/14 5:45 AM, Nicolas Liochon nkey...@gmail.com wrote:

 It's very strange that you don't see a perf improvement when you increase
 the number of nodes.
 Nothing in what you've done change the performances at the end?
 
 You may want to check:
  - the number of regions for this table. Are all the region server busy?
 Do
 you have some split on the table?
  - How much data you actually write. Is the compression enabled on this
 table?
  - Do you have compactions? You may want to change the max store file
 settings for unfrequent write load (see
 http://gbif.blogspot.fr/2012/07/optimizing-writes-in-hbase.html).
 
 It would be interesting to test as well the 0.96 release.
 
 
 
 On Sun, Jan 5, 2014 at 2:12 AM, Vladimir Rodionov
 vrodio...@carrieriq.comwrote:
 
 
  I think in this case, writing data to HDFS or HFile directly (for
  subsequent bulk loading)
  is the best option. HBase will never compete in write speed with HDFS.
 
  Best regards,
  Vladimir Rodionov
  Principal Platform Engineer
  Carrier IQ, www.carrieriq.com
  e-mail: vrodio...@carrieriq.com
 
  
  From: Ted Yu [yuzhih...@gmail.com]
  Sent: Saturday, January 04, 2014 2:33 PM
  To: user@hbase.apache.org
  Subject: Re: Hbase Performance Issue
 
  There're 8 items under:
  http://hbase.apache.org/book.html#perf.writing
 
  I guess you have through all of them :-)
 
 
  On Sat, Jan 4, 2014 at 1:34 PM, Akhtar Muhammad Din
  akhtar.m...@gmail.comwrote:
 
   Thanks guys for your precious time.
   Vladimir, as Ted rightly said i want to improve write performance
  currently
   (of course i want to read data as fast as possible later on)
   Kevin, my current understanding of bulk load is that you generate
   StoreFiles and later load through a command line program. I dont want
 to
  do
   any manual step. Our system is getting data after every 15 minutes, so
   requirement is to automate it through client API completely.
  
  
 
  Confidentiality Notice:  The information contained in this message,
  including any attachments hereto, may be confidential and is intended
 to be
  read only by the individual or entity to whom this message is
 addressed. If
  the reader of this message is not the intended recipient or an agent or
  designee of the intended recipient, please note that any review, use,
  disclosure or distribution of this message or its attachments, in any
 form,
  is strictly prohibited.  If you have received this message in error,
 please
  immediately notify the sender and/or notificati...@carrieriq.com and
  delete or destroy any copy of this message and its attachments.
 




Hbase Performance Issue

2014-01-04 Thread Akhtar Muhammad Din
Hi,
I have been running a MapReduce job that joins 2 datasets of 1.3 and 4 GB
in size. Joining is done on the reduce side. Output is written to either HBase
or HDFS depending on configuration. The problem I am having is that HBase
takes about 60-80 minutes to write the processed data, whereas
HDFS takes only 3-5 minutes to write the same data. I really want to improve
the HBase speed and bring it down to 1-2 minutes.

I am using Amazon EC2 instances, launched a cluster of size 3 and later 10,
and have tried both c3.4xlarge and c3.8xlarge instances.

I can see a significant increase in performance when writing to HDFS as I
use a cluster with more nodes and higher specifications, but in the case of
HBase there was no significant change in performance.

I have been going through different posts and articles and have read the HBase
book to try to solve this HBase performance issue, but have not been able to
succeed so far.
Here are a few of the things I have tried so far (a sketch of the client-side settings appears after the lists):

*Client Side*
- Turned off writing to WAL
- Experimented with write buffer size
- Turned off auto flush on table
- Used cache, experimented with different sizes


*HBase Server Side*
- Increased region servers heap size to 8 GB
- Experimented with handlers count
- Increased Memstore flush size to 512 MB
- Experimented with hbase.hregion.max.filesize, tried different sizes
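
A minimal sketch of the client-side write settings listed above, using the 0.94 HTable API. The table, family, and qualifier names are illustrative assumptions, and disabling the WAL trades durability for speed.

// Sketch only: "output", family "d", qualifier "q" are illustrative assumptions.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class WriteTuningSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "output");
    table.setAutoFlush(false);                  // buffer puts on the client side
    table.setWriteBufferSize(8 * 1024 * 1024);  // e.g. an 8 MB write buffer

    for (int i = 0; i < 100000; i++) {
      Put put = new Put(Bytes.toBytes(String.format("row-%08d", i)));
      put.setWriteToWAL(false);                 // skip the WAL: faster, but data is lost if the RS crashes
      put.add(Bytes.toBytes("d"), Bytes.toBytes("q"), Bytes.toBytes(i));
      table.put(put);
    }
    table.flushCommits();                       // flush any remaining buffered puts
    table.close();
  }
}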

There are many other parameters I have tried, following suggestions
from different sources, but nothing has worked so far.

Your help will be really appreciated.

-- 
Regards
Akhtar Muhammad Din


Re: Hbase Performance Issue

2014-01-04 Thread Akhtar Muhammad Din
I'm using CDH 4.5:
Hadoop:  2.0.0-cdh4.5.0
HBase:   0.94.6-cdh4.5.0

Regards


On Sun, Jan 5, 2014 at 1:24 AM, Ted Yu yuzhih...@gmail.com wrote:

 What version of HBase / hdfs are you running with ?

 Cheers



 On Sat, Jan 4, 2014 at 12:17 PM, Akhtar Muhammad Din
 akhtar.m...@gmail.comwrote:

  Hi,
  I have been running a map reduce job that joins 2 datasets of 1.3 and 4
 GB
  in size. Joining is done at reduce side. Output is written to either
 Hbase
  or HDFS depending upon configuration. The problem I am having is that
 Hbase
  takes about 60-80 minutes to write the processed data, on the other hand
  HDFS takes only 3-5 mins to write the same data. I really want to improve
  the Hbase speed and bring it down to 1-2 min.
 
  I am using amazon EC2 instances, launched a cluster of size 3 and later
 10,
  have tried both c3.4xlarge and c3.8xlarge instances.
 
  I can see significant increase in performance while writing to HDFS as i
  use cluster with more nodes, having high specifications, but in the case
 of
  Hbase there was no significant change in performance.
 
  I have been going through different posts, articles and have read Hbase
  book to solve the Hbase performance issue but have not been able to
 succeed
  so far.
  Here are the few things i have tried out so far:
 
  *Client Side*
  - Turned off writing to WAL
  - Experimented with write buffer size
  - Turned off auto flush on table
  - Used cache, experimented with different sizes
 
 
  *Hbase Server Side*
  - Increased region servers heap size to 8 GB
  - Experimented with handlers count
  - Increased Memstore flush size to 512 MB
  - Experimented with hbase.hregion.max.filesize, tried different sizes
 
  There are many other parameters i have tried out following the
 suggestions
  from  different sources, but nothing worked so far.
 
  Your help will be really appreciated.
 
  --
  Regards
  Akhtar Muhammad Din
 




-- 
Regards
Akhtar Muhammad Din


RE: Hbase Performance Issue

2014-01-04 Thread Vladimir Rodionov
You can try MapReduce over snapshot files
https://issues.apache.org/jira/browse/HBASE-8369

but you will need to patch 0.94.

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodio...@carrieriq.com


From: Akhtar Muhammad Din [akhtar.m...@gmail.com]
Sent: Saturday, January 04, 2014 12:44 PM
To: user@hbase.apache.org
Subject: Re: Hbase Performance Issue

im  using CDH 4.5:
Hadoop:  2.0.0-cdh4.5.0
HBase:   0.94.6-cdh4.5.0

Regards


On Sun, Jan 5, 2014 at 1:24 AM, Ted Yu yuzhih...@gmail.com wrote:

 What version of HBase / hdfs are you running with ?

 Cheers



 On Sat, Jan 4, 2014 at 12:17 PM, Akhtar Muhammad Din
 akhtar.m...@gmail.comwrote:

  Hi,
  I have been running a map reduce job that joins 2 datasets of 1.3 and 4
 GB
  in size. Joining is done at reduce side. Output is written to either
 Hbase
  or HDFS depending upon configuration. The problem I am having is that
 Hbase
  takes about 60-80 minutes to write the processed data, on the other hand
  HDFS takes only 3-5 mins to write the same data. I really want to improve
  the Hbase speed and bring it down to 1-2 min.
 
  I am using amazon EC2 instances, launched a cluster of size 3 and later
 10,
  have tried both c3.4xlarge and c3.8xlarge instances.
 
  I can see significant increase in performance while writing to HDFS as i
  use cluster with more nodes, having high specifications, but in the case
 of
  Hbase there was no significant change in performance.
 
  I have been going through different posts, articles and have read Hbase
  book to solve the Hbase performance issue but have not been able to
 succeed
  so far.
  Here are the few things i have tried out so far:
 
  *Client Side*
  - Turned off writing to WAL
  - Experimented with write buffer size
  - Turned off auto flush on table
  - Used cache, experimented with different sizes
 
 
  *Hbase Server Side*
  - Increased region servers heap size to 8 GB
  - Experimented with handlers count
  - Increased Memstore flush size to 512 MB
  - Experimented with hbase.hregion.max.filesize, tried different sizes
 
  There are many other parameters i have tried out following the
 suggestions
  from  different sources, but nothing worked so far.
 
  Your help will be really appreciated.
 
  --
  Regards
  Akhtar Muhammad Din
 




--
Regards
Akhtar Muhammad Din

Confidentiality Notice:  The information contained in this message, including 
any attachments hereto, may be confidential and is intended to be read only by 
the individual or entity to whom this message is addressed. If the reader of 
this message is not the intended recipient or an agent or designee of the 
intended recipient, please note that any review, use, disclosure or 
distribution of this message or its attachments, in any form, is strictly 
prohibited.  If you have received this message in error, please immediately 
notify the sender and/or notificati...@carrieriq.com and delete or destroy any 
copy of this message and its attachments.


Re: Hbase Performance Issue

2014-01-04 Thread Ted Yu
bq. Output is written to either Hbase

Looks like Akhtar wants to boost write performance to HBase.
MapReduce over snapshot files targets higher read throughput.

Cheers


On Sat, Jan 4, 2014 at 12:55 PM, Vladimir Rodionov
vrodio...@carrieriq.comwrote:

 You cay try MapReduce over snapshot files
 https://issues.apache.org/jira/browse/HBASE-8369

 but you will need to patch 0.94.




Re: Hbase Performance Issue

2014-01-04 Thread Kevin O'dell
Have you tried writing out an HFile and then bulk loading the data?
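
For what it's worth, the HFile route usually means having the job write HFiles
with HFileOutputFormat and loading them afterwards, rather than issuing puts
from the reducers. A rough sketch against the 0.94 mapreduce API follows; the
table name, paths, column family and the tab-separated input format are
assumptions for illustration only.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadPrepareJob {

  // Illustrative mapper: assumes "rowkey<TAB>value" text lines as input.
  static class ToKeyValueMapper
      extends Mapper<LongWritable, Text, ImmutableBytesWritable, KeyValue> {
    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      String[] parts = line.toString().split("\t", 2);
      byte[] row = Bytes.toBytes(parts[0]);
      KeyValue kv = new KeyValue(row, Bytes.toBytes("cf"), Bytes.toBytes("q"),
          Bytes.toBytes(parts[1]));
      ctx.write(new ImmutableBytesWritable(row), kv);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "my_table");      // placeholder table name

    Job job = new Job(conf, "bulk-load-prepare");
    job.setJarByClass(BulkLoadPrepareJob.class);
    job.setMapperClass(ToKeyValueMapper.class);
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(KeyValue.class);

    FileInputFormat.addInputPath(job, new Path("/input/joined-data"));  // placeholder
    FileOutputFormat.setOutputPath(job, new Path("/tmp/hfiles"));       // placeholder

    // Sets the reducer, partitioner and output format so each reducer writes
    // HFiles aligned with the table's current region boundaries.
    HFileOutputFormat.configureIncrementalLoad(job, table);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}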


Re: Hbase Performance Issue

2014-01-04 Thread Akhtar Muhammad Din
Thanks, guys, for your precious time.
Vladimir, as Ted rightly said, I want to improve write performance for now
(of course I want to read the data back as fast as possible later on).
Kevin, my current understanding of bulk load is that you generate
StoreFiles and later load them through a command-line program. I don't want
any manual step. Our system receives data every 15 minutes, so the
requirement is to automate the load completely through the client API.
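
For the record, the command-line step is not strictly required: the same
completebulkload logic can be driven from the client API, so generating HFiles
and loading them every 15 minutes can be fully automated. A minimal sketch,
with the path and table name as placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

public class ProgrammaticBulkLoad {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "my_table");        // placeholder table name

    // Moves the HFiles produced by the MapReduce job into the table's regions.
    // Same code path as the 'completebulkload' command-line tool, no shell step needed.
    LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
    loader.doBulkLoad(new Path("/tmp/hfiles"), table);  // placeholder HFile directory

    table.close();
  }
}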




-- 
Regards
Akhtar Muhammad Din


Re: Hbase Performance Issue

2014-01-04 Thread Kevin O'dell
Could you give us a region server log to look at during a job?



Re: Hbase Performance Issue

2014-01-04 Thread Ted Yu
There're 8 items under:
http://hbase.apache.org/book.html#perf.writing

I guess you have gone through all of them :-)
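
One of the items there that matters a lot for this kind of job is pre-creating
regions, so the reducers spread their writes across all region servers from the
start instead of hammering one region and waiting for splits. A minimal sketch,
assuming a freshly created table and purely illustrative hex split points:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitTable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    HTableDescriptor desc = new HTableDescriptor("my_table");  // placeholder table name
    desc.addFamily(new HColumnDescriptor("cf"));               // placeholder column family

    // 15 split points over a hex-prefixed key space -> 16 pre-created regions.
    // Only meaningful if the row keys really start with a hex character.
    String hex = "123456789abcdef";
    byte[][] splits = new byte[hex.length()][];
    for (int i = 0; i < hex.length(); i++) {
      splits[i] = Bytes.toBytes(String.valueOf(hex.charAt(i)));
    }

    admin.createTable(desc, splits);
    admin.close();
  }
}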





RE: Hbase Performance Issue

2014-01-04 Thread Vladimir Rodionov

I think in this case, writing the data to HDFS or directly to HFiles (for
subsequent bulk loading) is the best option. HBase will never compete with
raw HDFS on write speed.

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodio...@carrieriq.com




HBase - Performance issue

2013-04-24 Thread kzurek
The problem is that when I'm putting my data (multithreaded client, ~30MB/s
outgoing traffic) into the cluster, the load is spread equally over all
RegionServers with 3.5% average CPU wait time (average CPU user: 51%). When
I add a similar multithreaded client that scans for, let's say, the 100 last
samples of a randomly generated key from a chosen time range, I'm getting
high CPU wait time (20% and up) on two (or more, if there is a higher number
of threads; default 10) random RegionServers. The machines that hold those
RS therefore get very hot - one of the consequences is that the number of
store files is constantly increasing, up to the maximum limit. The rest of
the RS have 10-12% CPU wait time and everything seems to be OK (the number
of store files varies, so they are being compacted and not increasing over
time). Any ideas? Maybe I could prioritize writes over reads somehow? Is it
possible? If so, what would be the best way to do that, and where should it
be applied - on the client or the cluster side?

Cluster specification:
HBase Version   0.94.2-cdh4.2.0
Hadoop Version  2.0.0-cdh4.2.0
There are 6xDataNodes (5xHDD for storing data), 1xMasterNodes
Other settings:
 - Bloom filters (ROWCOL) set
 - Short circuit turned on
 - HDFS Block Size: 128MB
 - Java Heap Size of Namenode/Secondary Namenode in Bytes: 8 GiB
 - Java Heap Size of HBase RegionServer in Bytes: 12 GiB
 - Java Heap Size of HBase Master in Bytes: 4 GiB
 - Java Heap Size of DataNode in Bytes: 1 GiB (default)
Number of regions per RegionServer: 19 (total 114 regions on 6 RS)
Key design: UUIDTIMESTAMP - UUID: 1-10M, TIMESTAMP: 1-N
Table design: 1 column family with 20 columns of 8 bytes

Get client:
Multiple threads.
Each thread has its own table instance with its own Scanner.
Each thread has its own range of UUIDs and randomly draws the beginning of a
time range to build the row key properly (see above).
Each Scan requests the same number of rows, but with a random row key.
 




--
View this message in context: 
http://apache-hbase.679495.n3.nabble.com/HBase-Performance-issue-tp4042836.html
Sent from the HBase User mailing list archive at Nabble.com.
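
For context, the scan pattern described above corresponds roughly to the sketch
below. The key encoding (fixed-width UUID followed by a timestamp, both as
longs) and the table name are assumptions made only to illustrate the
start/stop rows and caching; they are not taken from the actual schema.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class LastSamplesScan {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "samples");        // placeholder table name

    long uuid = 1234567L;                              // illustrative UUID
    long fromTs = 1000L;                               // illustrative start of the time range

    Scan scan = new Scan();
    // Assumed key layout: fixed-width UUID followed by timestamp.
    scan.setStartRow(Bytes.add(Bytes.toBytes(uuid), Bytes.toBytes(fromTs)));
    scan.setStopRow(Bytes.add(Bytes.toBytes(uuid + 1), Bytes.toBytes(0L)));
    scan.setCaching(100);          // fetch the ~100 wanted rows in one round trip
    scan.setCacheBlocks(false);    // keep random scans from churning the block cache

    ResultScanner scanner = table.getScanner(scan);
    int n = 0;
    for (Result r : scanner) {
      if (++n >= 100) break;       // only 100 samples are needed per request
    }
    scanner.close();
    table.close();
  }
}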


Re: HBase - Performance issue

2013-04-24 Thread Anoop John
Hi
   How many request handlers are there in your RS?  Can you increase this
number and see?

-Anoop-



Re: HBase - Performance issue

2013-04-24 Thread kzurek
I have the following settings:
 hbase.master.handler.count = 25 (default value in CDH4.2)
 hbase.regionserver.handler.count = 20 (default 10)
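
If you do raise it further, the change goes into hbase-site.xml on every region
server (followed by a rolling restart). The value below is only illustrative;
it is worth increasing only if the existing handlers are actually all busy:

 <property>
   <name>hbase.regionserver.handler.count</name>
   <value>50</value>   <!-- illustrative; size against workload, payload size and heap -->
 </property>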




--
View this message in context: 
http://apache-hbase.679495.n3.nabble.com/HBase-Performance-issue-tp4042836p4042840.html
Sent from the HBase User mailing list archive at Nabble.com.


Re: HBase - Performance issue

2013-04-24 Thread lars hofhansl
You may have run into https://issues.apache.org/jira/browse/HBASE-7336 (which 
is in 0.94.4)
(Although I had not observed this effect as much when short circuit reads are 
enabled)






Re: hbase performance issue

2012-03-12 Thread Jean-Daniel Cryans
Your post is missing the most important configurations, mainly the
region server heap size and GC configs.

Also, how much of those 300GB do you need to serve? Does the working
dataset fit in cache?

J-D

On Sun, Mar 11, 2012 at 12:39 PM, Антон Лыска ant...@wildec.com wrote:
 Hi guys!

 I have a small HBase cluster with only 2 machines (8-core CPU, 12 GB of
 memory, 3*1GB HDD on each machine).
 I use Cloudera's cdh3u1 distro.
 The cluster serves two tables, and the total data size is about 300 GB
 across 300 regions.
 The average Get time is usually 20-50ms, but sometimes it rises to
 500-800ms, which is unacceptable.

 Gets per day: 13*10^6
 Puts per day: 11*10^6
 Deletes per day: 2*10^6

 My conf is:
 <configuration>
   <property>
     <name>dfs.replication</name>
     <value>2</value>
   </property>
   <property>
     <name>hbase.regionserver.handler.count</name>
     <value>50</value>
   </property>
   <property>
     <name>hbase.hregion.majorcompaction</name>
     <value>864</value>
   </property>
 </configuration>

 My scheme is:
 {NAME => 'table1', MAX_FILESIZE => '536870912', FAMILIES => [{NAME => 'c',
 BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE',
 VERSIONS => '1', TTL => '2147483647', BLOCKSIZE => '16384', IN_MEMORY =>
 'false', BLOCKCACHE => 'true'}, {NAME => 'p', BLOOMFILTER => 'NONE',
 REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '1', TTL =>
 '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE =>
 'true'}, {NAME => 's', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0',
 COMPRESSION => 'NONE', VERSIONS => '1', TTL => '2147483647', BLOCKSIZE =>
 '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}

 {NAME => 'table2', FAMILIES => [{NAME => 'n', BLOOMFILTER => 'NONE',
 REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', TTL =>
 '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE =>
 'true'}]}

 I disabled automatic major compaction by setting a big value, and run it
 manually each day at 3:00 am (the server is least loaded at that time).
 Get time usually starts increasing at around 23:00-24:00.
 Once HBase is restarted, Get time returns to 20ms.
 What can it be? What options should I set to avoid this issue?

 Also, I have installed Ganglia, but I haven't seen anything strange there.

 Thank you in advance!

 Best regards, Anton.
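
For reference, the region server heap and GC settings J-D is asking about
normally live in hbase-env.sh; a typical CMS-based setup of that era looks
roughly like the lines below. All values are illustrative and need to be sized
against the actual machines; they are not taken from this cluster.

# hbase-env.sh (illustrative values only)
export HBASE_REGIONSERVER_OPTS="-Xms8g -Xmx8g \
  -XX:+UseParNewGC -XX:+UseConcMarkSweepGC \
  -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly \
  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
  -Xloggc:/var/log/hbase/gc-regionserver.log"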


Re: hbase performance issue

2012-03-11 Thread Doug Meil

If you're using Cloudera, you want to be on CDH3u3 because it has several
HDFS performance fixes for low-latency reads.

That still doesn't address your 23:00 perf issue, but it is something
that will help.


