Re: High cpu usage on a region server

2013-09-15 Thread OpenSource Dev
We patched HBase 0.94.6 with HBASE-9428, and now the difference is like
night and day.
Read latency has been very consistent, and we haven't seen any CPU load
issues in the last 24+ hrs.

Thank you all for helping us resolve this issue.

Bikrant




Re: High cpu usage on a region server

2013-09-15 Thread lars hofhansl
Thanks for reporting back, Bikrant. Glad that turned out to be the issue.




Re: High cpu usage on a region server

2013-09-12 Thread lars hofhansl
Yep... Very likely HBASE-9428:

8 threads:
   java.lang.Thread.State: RUNNABLE
    at java.util.Arrays.copyOf(Arrays.java:2786)
    at java.lang.StringCoding.decode(StringCoding.java:178)
    at java.lang.String.<init>(String.java:483)
    at org.apache.hadoop.hbase.filter.RegexStringComparator.compareTo(RegexStringComparator.java:96)
    ...

4 threads:
   java.lang.Thread.State: RUNNABLE
    at sun.nio.cs.ISO_8859_1$Decoder.decodeArrayLoop(ISO_8859_1.java:79)
    at sun.nio.cs.ISO_8859_1$Decoder.decodeLoop(ISO_8859_1.java:106)
    at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:544)
    at java.lang.StringCoding$StringDecoder.decode(StringCoding.java:140)
    at java.lang.StringCoding.decode(StringCoding.java:179)
    at java.lang.String.<init>(String.java:483)
    at org.apache.hadoop.hbase.filter.RegexStringComparator.compareTo(RegexStringComparator.java:96)

It's also consistent with what you see: lots of garbage (hence tweaking your GC
options had a significant effect).
The fix is in 0.94.12, which is in RC right now, probably to be released early
next week.

-- Lars
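
Editor's note: both groups of stack frames above end in a String constructor
called from RegexStringComparator.compareTo, i.e. a String is decoded from
bytes on every comparison. The sketch below illustrates that general pattern
only. It is not the HBase source; the class name RegexSliceComparator, its
constructor, and the sample data are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical illustration only; this is NOT the HBase implementation.
class RegexSliceComparator {
    private final Pattern pattern;
    private final Charset charset;

    RegexSliceComparator(String regex, Charset charset) {
        this.pattern = Pattern.compile(regex);
        this.charset = charset;
    }

    // Returns 0 when the regex is found in value[offset, offset + length), 1 otherwise.
    // A new String is decoded and allocated on every call, so a scan that tests
    // a very large number of cells creates a very large number of short-lived objects.
    int compareTo(byte[] value, int offset, int length) {
        String decoded = new String(value, offset, length, charset);
        return pattern.matcher(decoded).find() ? 0 : 1;
    }

    public static void main(String[] args) {
        // Sample bytes are invented for the example.
        byte[] cell = "row-001|sys.cpu.user|host=web01".getBytes(StandardCharsets.ISO_8859_1);
        RegexSliceComparator cmp =
            new RegexSliceComparator("host=web\\d+", StandardCharsets.ISO_8859_1);
        System.out.println(cmp.compareTo(cell, 0, cell.length)); // whole cell
        System.out.println(cmp.compareTo(cell, 0, 7));           // only "row-001"
    }
}
```

Per-call decoding like this would be consistent with the "lots of garbage"
observation; the actual fix is whatever HBASE-9428 changed, which this sketch
does not attempt to reproduce.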






Re: High cpu usage on a region server

2013-09-12 Thread Jean-Daniel Cryans
Or roll back to CDH 4.2's HBase. They are fully compatible.

J-D


 



Re: High cpu usage on a region server

2013-09-12 Thread lars hofhansl
Not that I am aware of. Reducing the HFile block size will lessen this problem
(but then cause other issues).

It's just a fix to the RegexStringComparator. You can just recompile that and
deploy it to the RegionServers (you need to make sure it's in the class path
before the HBase jars).
Probably easier to roll a new release. It's a shame we did not see this earlier.


-- Lars






Re: High cpu usage on a region server

2013-09-12 Thread OpenSource Dev
Thanks Lars.

Are there any other workarounds for this issue until we get the fix?
If not, we might have to apply the patch and roll out a custom package.




Re: High cpu usage on a region server

2013-09-12 Thread OpenSource Dev
A server started getting busy last night, but this time it took ~5 hrs
to get from 15% busy to 75% busy. It is not running flat-out at 80% yet,
but this is still very high compared to the other servers, which are running
under ~25% CPU usage. The only change I made yesterday was adding
-XX:+UseParNewGC to the HBase startup command.

http://pastebin.com/VRmujgyH




Re: High cpu usage on a region server

2013-09-11 Thread lars hofhansl
You might have run into HBASE-9428

-- Lars





Re: High cpu usage on a region server

2013-09-11 Thread Ted Yu
Have you turned on short-circuit read?

Cheers





Re: High cpu usage on a region server

2013-09-11 Thread Stack
Can you thread dump the busy server and pastebin it?
Thanks,
St.Ack
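
Editor's note: a thread dump of a running RegionServer is normally taken from
outside the process, for example with the JDK's jstack tool. As a rough sketch
of the kind of per-frame tally that appears elsewhere in this thread
("8 threads", "4 threads"), the Java below dumps the threads of its own JVM
through the standard ThreadMXBean API and counts RUNNABLE threads by their top
stack frame. The class name RunnableFrameTally is hypothetical, and the code
inspects only the JVM it runs in, not a remote server.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: groups RUNNABLE threads by their top stack frame.
class RunnableFrameTally {

    static Map<String, Integer> tally(ThreadInfo[] threads) {
        Map<String, Integer> counts = new TreeMap<>();
        for (ThreadInfo info : threads) {
            // Skip missing entries and threads that are not currently runnable.
            if (info == null || info.getThreadState() != Thread.State.RUNNABLE) {
                continue;
            }
            StackTraceElement[] stack = info.getStackTrace();
            if (stack.length == 0) {
                continue;
            }
            String top = stack[0].getClassName() + "." + stack[0].getMethodName();
            Integer seen = counts.get(top);
            counts.put(top, seen == null ? 1 : seen + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Dump all live threads of this JVM, without lock details.
        ThreadInfo[] dump = ManagementFactory.getThreadMXBean().dumpAllThreads(false, false);
        for (Map.Entry<String, Integer> e : tally(dump).entrySet()) {
            System.out.println(e.getValue() + " threads: " + e.getKey());
        }
    }
}
```

Many runnable threads sharing the same top frames is the sort of signal that
makes a hot code path stand out in a dump.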





Re: High cpu usage on a region server

2013-09-11 Thread OpenSource Dev
No, dfs.client.read.shortcircuit is left at its default of false in our cluster.

It looks like a good performance-improvement parameter. Are there
any side effects of turning it on?

Thx




High cpu usage on a region server

2013-09-11 Thread OpenSource Dev
Hi,

I'm using HBase 0.94.6 (CDH 4.3) for OpenTSDB. So far I have had no
issues with writes/puts. The system handles up to 800k puts per second
without issue. On average we do 250k puts per second.

I am having a problem with reads. I've isolated where the
problem is, but have not been able to find the root cause.

I have 16 machines running the HBase region server, each with ~35 regions.
Once in a while, CPU goes flat-out at 80% on 1 region server. These are the
things I've noticed in Ganglia:

hbase.regionserver.request - evenly distributed; no spikes on the busy server
hbase.regionserver.blockCacheSize - between 500MB and 1000MB
hbase.regionserver.compactionQueueSize - avg 2 or less
hbase.regionserver.blockCacheHitRatio - 30% on busy node, 60% on other nodes


JVM heap size is set to 16GB and I'm using -XX:+UseParNewGC
-XX:+UseConcMarkSweepGC

I've noticed the system load moves to a different region, sometimes
within a minute, if the busy region is restarted.

Any suggestions on what could be causing the load and/or what other
metrics I should check?


Thank you!
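
Editor's note: since GC behaviour comes up in the replies, collector activity
may be one more thing worth watching alongside the Ganglia metrics. A minimal
sketch, assuming you can run code inside the JVM of interest (attaching to a
remote JVM over JMX is not shown): it reads the standard
GarbageCollectorMXBean counters. The class name GcSnapshot is hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: reports cumulative GC counts and times for this JVM.
class GcSnapshot {

    // Sums the accumulated collection time (ms) over all collectors.
    // Collectors that report -1 (undefined) are ignored.
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                + ": collections=" + gc.getCollectionCount()
                + ", timeMs=" + gc.getCollectionTime());
        }
        System.out.println("total GC time (ms): " + totalCollectionMillis());
    }
}
```

Sampling these counters periodically and comparing the busy server with a
quiet one would show whether the busy node is spending noticeably more time
collecting garbage.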


Re: High cpu usage on a region server

2013-09-11 Thread OpenSource Dev
Load has not gone up in the last 5 hrs :)
Will get the dump if it goes up again.

thx




Re: High cpu usage on a region server

2013-09-11 Thread OpenSource Dev
Hi Lars,

All the read & write requests are equally distributed across all region servers.

If it is caused by the HBASE-9428 bug, any idea why it would impact
only 1 region server at a given time?

Thx


On Wed, Sep 11, 2013 at 1:55 PM, lars hofhansl la...@apache.org wrote:
 You might have run into HBASE-9428

 -- Lars



 
Re: High cpu usage on a region server

2013-09-11 Thread lars hofhansl
It might be a larger scan (maybe gathering many data points for a metric) 
hitting many regions. In that case you'd see only a single region server being 
busy at a given time, since HBase scans only one region at a time for a single 
client scan.


A thread dump would give us a better idea. J-D specifically mentions OpenTSDB 
in that jira.
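
One way to read such a dump, as a minimal sketch: assuming it is plain
jstack output in which threads are separated by blank lines, the helper
below counts RUNNABLE threads whose stack contains a given frame. The
class and method names (ThreadDumpScan, countRunnableWithFrame) are
hypothetical, and the frame string is whatever you suspect. Many handler
threads sitting in the same frame usually point at the hot code path.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper, not part of HBase or the JDK tooling.
final class ThreadDumpScan {

    /**
     * Counts threads that are RUNNABLE and have at least one stack frame
     * ("at ..." line) containing frameSubstring.
     * Assumption: thread entries in the dump are separated by blank lines,
     * as in default jstack output.
     */
    static int countRunnableWithFrame(String dump, String frameSubstring) {
        int count = 0;
        for (String block : dump.split("\\r?\\n\\s*\\r?\\n")) {
            boolean runnable = block.contains("java.lang.Thread.State: RUNNABLE");
            boolean hasFrame = Arrays.stream(block.split("\\r?\\n"))
                    .map(String::trim)
                    .anyMatch(line -> line.startsWith("at ") && line.contains(frameSubstring));
            if (runnable && hasFrame) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: ThreadDumpScan <dump-file> <frame-substring>");
            return;
        }
        // Read the saved dump as text and report how many threads match.
        String dump = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        System.out.println(countRunnableWithFrame(dump, args[1])
                + " RUNNABLE thread(s) with a frame containing " + args[1]);
    }
}
```

With Java 11 or later it can be run directly against a saved dump, e.g.
java ThreadDumpScan.java dump.txt RegexStringComparator.compareTo
where the frame name is only an example of something to look for.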


-- Lars




 From: OpenSource Dev dev.opensou...@gmail.com
To: user@hbase.apache.org; lars hofhansl la...@apache.org 
Sent: Wednesday, September 11, 2013 8:59 PM
Subject: Re: High cpu usage on a region server
 

Hi Lars,

All the read & write requests are equally distributed across all region servers.

If it is caused by the HBASE-9428 bug, any idea why it would impact
only 1 region server at a given time?

Thx


On Wed, Sep 11, 2013 at 1:55 PM, lars hofhansl la...@apache.org wrote:
 You might have run into HBASE-9428

 -- Lars



 