Re: High cpu usage on a region server
We patched HBase 0.94.6 with HBASE-9428, and the difference is like night and day. Read latency has been very consistent, and we haven't seen any CPU load issues in the last 24+ hrs.

Thank you all for helping us resolve this issue.

Bikrant

On Thu, Sep 12, 2013 at 10:25 AM, lars hofhansl la...@apache.org wrote:

Not that I am aware of. Reducing the HFile block size will lessen this problem (but then cause other issues).

It's just a fix to the RegexStringFilter. You can just recompile that and deploy it to the RegionServers (you need to make sure it's in the classpath before the HBase jars). Probably easier to roll a new release.

It's a shame we did not see this earlier.

-- Lars
Re: High cpu usage on a region server
Thanks for reporting back, Bikrant. Glad that turned out to be the issue.

From: OpenSource Dev dev.opensou...@gmail.com
To: user@hbase.apache.org; lars hofhansl la...@apache.org
Sent: Saturday, September 14, 2013 11:21 PM
Subject: Re: High cpu usage on a region server

We patched HBase 0.94.6 with HBASE-9428, and the difference is like night and day. Read latency has been very consistent, and we haven't seen any CPU load issues in the last 24+ hrs.

Thank you all for helping us resolve this issue.

Bikrant
Re: High cpu usage on a region server
Yep... Very likely HBASE-9428:

8 threads:
java.lang.Thread.State: RUNNABLE
    at java.util.Arrays.copyOf(Arrays.java:2786)
    at java.lang.StringCoding.decode(StringCoding.java:178)
    at java.lang.String.<init>(String.java:483)
    at org.apache.hadoop.hbase.filter.RegexStringComparator.compareTo(RegexStringComparator.java:96)
    ...

4 threads:
java.lang.Thread.State: RUNNABLE
    at sun.nio.cs.ISO_8859_1$Decoder.decodeArrayLoop(ISO_8859_1.java:79)
    at sun.nio.cs.ISO_8859_1$Decoder.decodeLoop(ISO_8859_1.java:106)
    at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:544)
    at java.lang.StringCoding$StringDecoder.decode(StringCoding.java:140)
    at java.lang.StringCoding.decode(StringCoding.java:179)
    at java.lang.String.<init>(String.java:483)
    at org.apache.hadoop.hbase.filter.RegexStringComparator.compareTo(RegexStringComparator.java:96)

It's also consistent with what you see: lots of garbage (hence tweaking your GC options had a significant effect).

The fix is in 0.94.12, which is in RC right now, probably to be released early next week.

-- Lars

From: OpenSource Dev dev.opensou...@gmail.com
To: user@hbase.apache.org
Sent: Thursday, September 12, 2013 8:15 AM
Subject: Re: High cpu usage on a region server

A server started getting busy last night, but this time it took ~5 hrs to get from 15% busy to 75% busy. It is not running flat-out at 80% yet, but this is still very high compared to the other servers, which are running under ~25% CPU usage.

The only change I made yesterday was the addition of -XX:+UseParNewGC to the HBase startup command.

http://pastebin.com/VRmujgyH
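The grouping in Lars's reply ("8 threads", "4 threads") can be reproduced mechanically from a thread dump. The following is a minimal sketch, assuming a jstack-style text dump in which threads are separated by blank lines; the thread names in `SAMPLE_DUMP` are made up for illustration and the helper `group_by_top_frame` is hypothetical, not part of any HBase or JDK tool.

```python
import re
from collections import Counter

# Matches a stack frame line such as "at java.util.Arrays.copyOf(Arrays.java:2786)".
FRAME = re.compile(r"^\s*at\s+(\S+?)\(")


def group_by_top_frame(dump_text, must_contain):
    """Count RUNNABLE threads whose stack contains `must_contain`, keyed by top frame."""
    counts = Counter()
    # Assumption: thread entries are separated by blank lines.
    for block in re.split(r"\n\s*\n", dump_text):
        if "java.lang.Thread.State: RUNNABLE" not in block:
            continue
        frames = []
        for line in block.splitlines():
            match = FRAME.match(line)
            if match:
                frames.append(match.group(1))
        if frames and any(must_contain in frame for frame in frames):
            counts[frames[0]] += 1
    return counts


# Hypothetical, shortened dump used only to exercise the function.
SAMPLE_DUMP = """
"handler-1" daemon prio=10
   java.lang.Thread.State: RUNNABLE
    at java.util.Arrays.copyOf(Arrays.java:2786)
    at java.lang.StringCoding.decode(StringCoding.java:178)
    at java.lang.String.<init>(String.java:483)
    at org.apache.hadoop.hbase.filter.RegexStringComparator.compareTo(RegexStringComparator.java:96)

"handler-2" daemon prio=10
   java.lang.Thread.State: RUNNABLE
    at java.util.Arrays.copyOf(Arrays.java:2786)
    at java.lang.StringCoding.decode(StringCoding.java:178)
    at java.lang.String.<init>(String.java:483)
    at org.apache.hadoop.hbase.filter.RegexStringComparator.compareTo(RegexStringComparator.java:96)

"handler-3" daemon prio=10
   java.lang.Thread.State: RUNNABLE
    at sun.nio.cs.ISO_8859_1$Decoder.decodeArrayLoop(ISO_8859_1.java:79)
    at java.lang.String.<init>(String.java:483)
    at org.apache.hadoop.hbase.filter.RegexStringComparator.compareTo(RegexStringComparator.java:96)

"idle-1" daemon prio=10
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
"""

counts = group_by_top_frame(SAMPLE_DUMP, "RegexStringComparator.compareTo")
for frame, n in counts.most_common():
    print(n, "threads:", frame)
```

Keying on the top frame while filtering on a frame deeper in the stack mirrors how the two groups in the reply differ at the top but share the same `RegexStringComparator.compareTo` caller.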
Re: High cpu usage on a region server
Or roll back to CDH 4.2's HBase. They are fully compatible.

J-D

On Thu, Sep 12, 2013 at 10:25 AM, lars hofhansl la...@apache.org wrote:

Not that I am aware of. Reducing the HFile block size will lessen this problem (but then cause other issues).

It's just a fix to the RegexStringFilter. You can just recompile that and deploy it to the RegionServers (you need to make sure it's in the classpath before the HBase jars). Probably easier to roll a new release.

It's a shame we did not see this earlier.

-- Lars
Re: High cpu usage on a region server
Not that I am aware of. Reducing the HFile block size will lessen this problem (but then cause other issues).

It's just a fix to the RegexStringFilter. You can just recompile that and deploy it to the RegionServers (you need to make sure it's in the classpath before the HBase jars). Probably easier to roll a new release.

It's a shame we did not see this earlier.

-- Lars

From: OpenSource Dev dev.opensou...@gmail.com
To: user@hbase.apache.org; lars hofhansl la...@apache.org
Sent: Thursday, September 12, 2013 9:52 AM
Subject: Re: High cpu usage on a region server

Thanks Lars.

Are there any other workarounds for this issue until we get the fix? If not, we might have to apply the patch and roll out a custom package.
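The classpath-ordering remark in Lars's reply matters because a JVM class loader that searches a classpath uses the first entry that provides a given class. Below is a toy model of that lookup, written in Python rather than Java; the jar names `patched-filter.jar` and `hbase-0.94.6.jar` and the helper `resolve_class` are hypothetical and do not describe the actual deployment layout.

```python
def resolve_class(classpath, class_name):
    """Return the first classpath entry that provides class_name, or None.

    classpath is an ordered list of (entry_name, set_of_class_names) pairs,
    a simplified stand-in for jars that are searched in order.
    """
    for entry_name, classes in classpath:
        if class_name in classes:
            return entry_name
    return None


COMPARATOR = "org.apache.hadoop.hbase.filter.RegexStringComparator"

# Hypothetical jar names and contents, for illustration only.
patched_jar = ("patched-filter.jar", {COMPARATOR})
hbase_jar = ("hbase-0.94.6.jar", {COMPARATOR, "org.apache.hadoop.hbase.client.Scan"})

patched_first = resolve_class([patched_jar, hbase_jar], COMPARATOR)
patched_last = resolve_class([hbase_jar, patched_jar], COMPARATOR)
print(patched_first, patched_last)
```

In this model, listing the patched jar after the stock jar leaves the unpatched class in effect, which is the failure mode the ordering advice guards against.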
Re: High cpu usage on a region server
Thanks Lars.

Are there any other workarounds for this issue until we get the fix? If not, we might have to apply the patch and roll out a custom package.

On Thu, Sep 12, 2013 at 8:36 AM, lars hofhansl la...@apache.org wrote:

Yep... Very likely HBASE-9428:

8 threads:
java.lang.Thread.State: RUNNABLE
    at java.util.Arrays.copyOf(Arrays.java:2786)
    at java.lang.StringCoding.decode(StringCoding.java:178)
    at java.lang.String.<init>(String.java:483)
    at org.apache.hadoop.hbase.filter.RegexStringComparator.compareTo(RegexStringComparator.java:96)
    ...

4 threads:
java.lang.Thread.State: RUNNABLE
    at sun.nio.cs.ISO_8859_1$Decoder.decodeArrayLoop(ISO_8859_1.java:79)
    at sun.nio.cs.ISO_8859_1$Decoder.decodeLoop(ISO_8859_1.java:106)
    at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:544)
    at java.lang.StringCoding$StringDecoder.decode(StringCoding.java:140)
    at java.lang.StringCoding.decode(StringCoding.java:179)
    at java.lang.String.<init>(String.java:483)
    at org.apache.hadoop.hbase.filter.RegexStringComparator.compareTo(RegexStringComparator.java:96)

It's also consistent with what you see: lots of garbage (hence tweaking your GC options had a significant effect).

The fix is in 0.94.12, which is in RC right now, probably to be released early next week.

-- Lars
Re: High cpu usage on a region server
A server started getting busy last night, but this time it took ~5 hrs to get from 15% busy to 75% busy. It is not running flat-out at 80% yet, but this is still very high compared to the other servers, which are running under ~25% CPU usage.

The only change I made yesterday was the addition of -XX:+UseParNewGC to the HBase startup command.

http://pastebin.com/VRmujgyH

On Wed, Sep 11, 2013 at 2:28 PM, Stack st...@duboce.net wrote:

Can you thread dump the busy server and pastebin it?

Thanks,
St.Ack
Re: High cpu usage on a region server
You might have run into HBASE-9428.

-- Lars

From: OpenSource Dev dev.opensou...@gmail.com
To: user@hbase.apache.org
Sent: Wednesday, September 11, 2013 1:49 PM
Subject: High cpu usage on a region server

Hi,

I'm using HBase 0.94.6 (CDH 4.3) for OpenTSDB. So far I have had no issues with writes/puts. The system handles up to 800k puts per second without issue; on average we do 250k puts per second. The problem is with reads: I've isolated where the problem is, but have not been able to find the root cause.

I have 16 machines running the HBase region server, each with ~35 regions. Once in a while, CPU goes flat-out at 80% on one region server. These are the things I've noticed in Ganglia:

hbase.regionserver.request - evenly distributed; not seeing any spikes on the busy server
hbase.regionserver.blockCacheSize - between 500MB and 1000MB
hbase.regionserver.compactionQueueSize - avg 2 or less
hbase.regionserver.blockCacheHitRatio - 30% on busy node, 60% on other nodes

JVM heap size is set to 16GB and I'm using -XX:+UseParNewGC -XX:+UseConcMarkSweepGC.

I've noticed the system load moves to a different region server, sometimes within a minute, if the busy region server is restarted.

Any suggestions on what could be causing the load, and/or what other metrics I should check?

Thank you!
Re: High cpu usage on a region server
Have you turned on short-circuit read?

Cheers

On Wed, Sep 11, 2013 at 1:49 PM, OpenSource Dev dev.opensou...@gmail.com wrote:

Hi,

I'm using HBase 0.94.6 (CDH 4.3) for OpenTSDB. So far I have had no issues with writes/puts. The system handles up to 800k puts per second without issue; on average we do 250k puts per second. The problem is with reads: I've isolated where the problem is, but have not been able to find the root cause.

I have 16 machines running the HBase region server, each with ~35 regions. Once in a while, CPU goes flat-out at 80% on one region server. These are the things I've noticed in Ganglia:

hbase.regionserver.request - evenly distributed; not seeing any spikes on the busy server
hbase.regionserver.blockCacheSize - between 500MB and 1000MB
hbase.regionserver.compactionQueueSize - avg 2 or less
hbase.regionserver.blockCacheHitRatio - 30% on busy node, 60% on other nodes

JVM heap size is set to 16GB and I'm using -XX:+UseParNewGC -XX:+UseConcMarkSweepGC.

I've noticed the system load moves to a different region server, sometimes within a minute, if the busy region server is restarted.

Any suggestions on what could be causing the load, and/or what other metrics I should check?

Thank you!
Re: High cpu usage on a region server
Can you thread dump the busy server and pastebin it?

Thanks,
St.Ack

On Wed, Sep 11, 2013 at 1:49 PM, OpenSource Dev dev.opensou...@gmail.com wrote:

Hi,

I'm using HBase 0.94.6 (CDH 4.3) for OpenTSDB. So far I have had no issues with writes/puts. The system handles up to 800k puts per second without issue; on average we do 250k puts per second. The problem is with reads: I've isolated where the problem is, but have not been able to find the root cause.

I have 16 machines running the HBase region server, each with ~35 regions. Once in a while, CPU goes flat-out at 80% on one region server. These are the things I've noticed in Ganglia:

hbase.regionserver.request - evenly distributed; not seeing any spikes on the busy server
hbase.regionserver.blockCacheSize - between 500MB and 1000MB
hbase.regionserver.compactionQueueSize - avg 2 or less
hbase.regionserver.blockCacheHitRatio - 30% on busy node, 60% on other nodes

JVM heap size is set to 16GB and I'm using -XX:+UseParNewGC -XX:+UseConcMarkSweepGC.

I've noticed the system load moves to a different region server, sometimes within a minute, if the busy region server is restarted.

Any suggestions on what could be causing the load, and/or what other metrics I should check?

Thank you!
Re: High cpu usage on a region server
No, dfs.client.read.shortcircuit is set to false by default in our cluster. It looks like a good parameter for improving performance. Are there any side effects of turning it on?

Thx

On Wed, Sep 11, 2013 at 1:57 PM, Ted Yu yuzhih...@gmail.com wrote:

Have you turned on short-circuit read?

Cheers
High cpu usage on a region server
Hi,

I'm using HBase 0.94.6 (CDH 4.3) for OpenTSDB. So far I have had no issues with writes/puts. The system handles up to 800k puts per second without issue; on average we do 250k puts per second. The problem is with reads: I've isolated where the problem is, but have not been able to find the root cause.

I have 16 machines running the HBase region server, each with ~35 regions. Once in a while, CPU goes flat-out at 80% on one region server. These are the things I've noticed in Ganglia:

hbase.regionserver.request - evenly distributed; not seeing any spikes on the busy server
hbase.regionserver.blockCacheSize - between 500MB and 1000MB
hbase.regionserver.compactionQueueSize - avg 2 or less
hbase.regionserver.blockCacheHitRatio - 30% on busy node, 60% on other nodes

JVM heap size is set to 16GB and I'm using -XX:+UseParNewGC -XX:+UseConcMarkSweepGC.

I've noticed the system load moves to a different region server, sometimes within a minute, if the busy region server is restarted.

Any suggestions on what could be causing the load, and/or what other metrics I should check?

Thank you!
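The per-host comparison described in this post (one busy node against fifteen normal ones) can be automated with a simple median check. This is a rough sketch only: the host names `rs01` to `rs16`, the per-host request figure, the 25% tolerance, and the helper `find_outliers` are all assumptions made for illustration, while the 30% and 60% hit ratios are the values quoted in the post.

```python
from statistics import median


def find_outliers(metric_by_host, tolerance=0.25):
    """Return hosts whose value differs from the cluster median by more than
    `tolerance`, expressed as a fraction of the median."""
    mid = median(metric_by_host.values())
    if mid == 0:
        return []
    return sorted(
        host
        for host, value in metric_by_host.items()
        if abs(value - mid) / mid > tolerance
    )


# Hypothetical host names; rs07 plays the role of the busy region server.
hosts = ["rs%02d" % i for i in range(1, 17)]

# blockCacheHitRatio in percent: 30 on the busy node, 60 elsewhere (from the post).
hit_ratio = {host: 60 for host in hosts}
hit_ratio["rs07"] = 30

# Request counts are reported as evenly distributed; the figure is made up.
requests = {host: 15000 for host in hosts}

print("hit ratio outliers:", find_outliers(hit_ratio))
print("request outliers:", find_outliers(requests))
```

A check like this flags the busy host on hit ratio while finding nothing unusual in request counts, which matches the pattern reported above: the load is uneven even though the request distribution is not.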
Re: High cpu usage on a region server
Load has not gone up in the last 5 hrs :) Will get the dump if it goes up again.

Thx

On Wed, Sep 11, 2013 at 2:28 PM, Stack st...@duboce.net wrote:

Can you thread dump the busy server and pastebin it?

Thanks,
St.Ack
Re: High cpu usage on a region server
Hi Lars,

All the read/write requests are equally distributed across all region servers. If it is caused by the HBASE-9428 bug, any idea why it would impact only one region server at a given time?

Thx

On Wed, Sep 11, 2013 at 1:55 PM, lars hofhansl la...@apache.org wrote:

You might have run into HBASE-9428.

-- Lars
Re: High cpu usage on a region server
It might be a larger scan (maybe gathering many data points for a metric) hitting many regions. In that case you'd see only a single region server being busy at a given time, since HBase scans only one region at a time for a single client scan.

A thread dump would give us a better idea. J-D specifically mentions OpenTSDB in that JIRA.

-- Lars

From: OpenSource Dev dev.opensou...@gmail.com
To: user@hbase.apache.org; lars hofhansl la...@apache.org
Sent: Wednesday, September 11, 2013 8:59 PM
Subject: Re: High cpu usage on a region server

Hi Lars,

All the read/write requests are equally distributed across all region servers. If it is caused by the HBASE-9428 bug, any idea why it would impact only one region server at a given time?

Thx
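Lars's explanation, that a single client scan visits one region at a time, can be illustrated with a toy model. The sketch below is not HBase client code: the region start keys, the region-to-server assignment, and the helper `regions_for_scan` are hypothetical, and the only behavior taken from the thread is that regions are visited sequentially in key order.

```python
from bisect import bisect_right


def regions_for_scan(region_start_keys, start_row, stop_row):
    """Indices of the regions overlapping [start_row, stop_row), in key order.

    region_start_keys must be sorted; region i covers the key range from
    region_start_keys[i] up to (not including) region_start_keys[i + 1].
    """
    first = max(bisect_right(region_start_keys, start_row) - 1, 0)
    visited = []
    for i in range(first, len(region_start_keys)):
        # Stop once a later region starts at or beyond the end of the scan.
        if i != first and region_start_keys[i] >= stop_row:
            break
        visited.append(i)
    return visited


# Hypothetical table layout: five regions spread over three servers.
start_keys = ["", "d", "h", "m", "t"]
region_server = ["rs1", "rs2", "rs3", "rs1", "rs2"]

visited = regions_for_scan(start_keys, "e", "p")
timeline = [region_server[i] for i in visited]
for step, server in enumerate(timeline):
    print("step", step, "scan is served by", server)
```

At every step exactly one server does the work, and the busy server changes as the scan crosses region boundaries, which is consistent with the load appearing on one region server at a time.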