[jira] [Commented] (HBASE-16616) Rpc handlers stuck on ThreadLocalMap.expungeStaleEntry

2017-06-21 Thread deepankar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057283#comment-16057283
 ] 

deepankar commented on HBASE-16616:
---

Ah, got it. Thanks [~tomu.tsuruhara] for pointing me to that patch; that will 
definitely help.

> Rpc handlers stuck on ThreadLocalMap.expungeStaleEntry
> --
>
> Key: HBASE-16616
> URL: https://issues.apache.org/jira/browse/HBASE-16616
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance
>Affects Versions: 1.2.2
>Reporter: Tomu Tsuruhara
>Assignee: Tomu Tsuruhara
> Fix For: 2.0.0, 1.4.0
>
> Attachments: 16616.branch-1.v2.txt, HBASE-16616.master.001.patch, 
> HBASE-16616.master.002.patch, ScreenShot 2016-09-09 14.17.53.png
>
>
> In our HBase 1.2.2 cluster, some regionservers showed a very bad 
> "QueueCallTime_99th_percentile", exceeding 10 seconds.
> Most RPC handler threads were stuck in the ThreadLocalMap.expungeStaleEntry call at 
> that time.
> {noformat}
> "PriorityRpcServer.handler=18,queue=0,port=16020" #322 daemon prio=5 
> os_prio=0 tid=0x7fd422062800 nid=0x19b89 runnable [0x7fcb8a821000]
>java.lang.Thread.State: RUNNABLE
> at 
> java.lang.ThreadLocal$ThreadLocalMap.expungeStaleEntry(ThreadLocal.java:617)
> at java.lang.ThreadLocal$ThreadLocalMap.remove(ThreadLocal.java:499)
> at 
> java.lang.ThreadLocal$ThreadLocalMap.access$200(ThreadLocal.java:298)
> at java.lang.ThreadLocal.remove(ThreadLocal.java:222)
> at 
> java.util.concurrent.locks.ReentrantReadWriteLock$Sync.tryReleaseShared(ReentrantReadWriteLock.java:426)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.releaseShared(AbstractQueuedSynchronizer.java:1341)
> at 
> java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.unlock(ReentrantReadWriteLock.java:881)
> at 
> com.yammer.metrics.stats.ExponentiallyDecayingSample.unlockForRegularUsage(ExponentiallyDecayingSample.java:196)
> at 
> com.yammer.metrics.stats.ExponentiallyDecayingSample.update(ExponentiallyDecayingSample.java:113)
> at 
> com.yammer.metrics.stats.ExponentiallyDecayingSample.update(ExponentiallyDecayingSample.java:81)
> at 
> org.apache.hadoop.metrics2.lib.MutableHistogram.add(MutableHistogram.java:81)
> at 
> org.apache.hadoop.metrics2.lib.MutableRangeHistogram.add(MutableRangeHistogram.java:59)
> at 
> org.apache.hadoop.hbase.ipc.MetricsHBaseServerSourceImpl.dequeuedCall(MetricsHBaseServerSourceImpl.java:194)
> at 
> org.apache.hadoop.hbase.ipc.MetricsHBaseServer.dequeuedCall(MetricsHBaseServer.java:76)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2192)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
> at 
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> We were using JDK 1.8.0_92, and here is a snippet from ThreadLocal.java.
> {code}
> 616:while (tab[h] != null)
> 617:h = nextIndex(h, len);
> {code}
> So I hypothesized that there were too many consecutive entries in the {{tab}} array, 
> and I actually found them in the heap dump.
> !ScreenShot 2016-09-09 14.17.53.png|width=50%!
> Most of these entries pointed at instances of 
> {{org.apache.hadoop.hbase.util.Counter$1}}, 
> which corresponds to the {{indexHolderThreadLocal}} instance variable in the 
> {{Counter}} class.
> Because the {{RpcServer$Connection}} class creates a {{Counter}} instance 
> {{rpcCount}} for every connection, 
> it is possible to have lots of {{Counter#indexHolderThreadLocal}} instances 
> in a RegionServer process 
> when clients repeatedly connect and close. As a result, a ThreadLocalMap 
> can have lots of consecutive entries.
> Usually, since each entry is a {{WeakReference}}, these entries are collected 
> and removed by the garbage collector soon after the connection is closed.
> But if a connection's lifetime was long enough to survive young GC, the entry 
> wouldn't be collected until the old-gen collector runs.
> Furthermore, under a G1GC deployment, it may not be collected even by old-gen 
> (mixed) GC if the entries sit in a region that doesn't hold much garbage.
> We were in fact using G1GC when we encountered this problem.
> We should remove the entry from the ThreadLocalMap by calling ThreadLocal#remove 
> explicitly.
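
For illustration, a minimal sketch (not the actual HBase {{Counter}} code) of the explicit 
cleanup the description proposes: a per-connection counter whose destroy() calls 
ThreadLocal#remove, so the calling thread drops its ThreadLocalMap entry eagerly instead of 
leaving a stale slot for expungeStaleEntry to crawl over later.

{code}
// Hypothetical per-connection counter, shown only to illustrate explicit cleanup.
public class ConnectionScopedCounter {
  // Each handler thread that touches this counter gets its own cell.
  private final ThreadLocal<long[]> cell = ThreadLocal.withInitial(() -> new long[1]);

  public void increment() {
    cell.get()[0]++;
  }

  public long currentThreadValue() {
    return cell.get()[0];
  }

  // Called when the owning connection closes: removes the *calling* thread's
  // ThreadLocalMap entry right away rather than waiting for GC to stale it out.
  public void destroy() {
    cell.remove();
  }
}
{code}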



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-16616) Rpc handlers stuck on ThreadLocalMap.expungeStaleEntry

2017-06-20 Thread Tomu Tsuruhara (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056756#comment-16056756
 ] 

Tomu Tsuruhara commented on HBASE-16616:


[~dvdreddy] You're right. The patch here is not enough. We needed the patch in 
HBASE-16146 to resolve the issue.




[jira] [Commented] (HBASE-16616) Rpc handlers stuck on ThreadLocalMap.expungeStaleEntry

2017-06-20 Thread deepankar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056554#comment-16056554
 ] 

deepankar commented on HBASE-16616:
---

Hi, we recently encountered this issue in production, and I was going 
through the patch. I am not sure it fully solves the issue: the doubt 
is mainly that Counter#destroy() will only remove the 
ThreadLocal from the ThreadLocalMap of the thread closing the connection, while 
the Counter's ThreadLocal entries stay alive in the maps of the other handler 
threads. Am I missing something in this logic?

cc [~stack]  [~tomu.tsuruhara]
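
A standalone sketch of the concern above (illustrative only, not HBase code): 
ThreadLocal#remove clears the entry in the calling thread's ThreadLocalMap only, so entries 
created by other handler threads for the same ThreadLocal instance are not touched by it.

{code}
public class RemoveIsPerThread {
  public static void main(String[] args) throws InterruptedException {
    ThreadLocal<Object> perConnectionState = ThreadLocal.withInitial(Object::new);

    Thread handler = new Thread(() -> {
      // Creates an entry in this handler thread's ThreadLocalMap.
      perConnectionState.get();
    });
    handler.start();
    handler.join();

    // Creates an entry in the main thread's map, then removes it again.
    perConnectionState.get();
    perConnectionState.remove();

    // The remove() above did not touch the entry the handler thread created;
    // remove() only affects the map of the thread that calls it.
  }
}
{code}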


[jira] [Commented] (HBASE-16616) Rpc handlers stuck on ThreadLocalMap.expungeStaleEntry

2016-12-01 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15714003#comment-15714003
 ] 

stack commented on HBASE-16616:
---

Opened HBASE-17229 to backport purge of threadlocals. Thanks.



[jira] [Commented] (HBASE-16616) Rpc handlers stuck on ThreadLocalMap.expungeStaleEntry

2016-12-01 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15713995#comment-15713995
 ] 

stack commented on HBASE-16616:
---

Thank you for the clarification, [~tomu.tsuruhara]. Let me then resolve the issue 
I filed as invalid and open another to backport HBASE-16146 to 1.2. Thanks.



[jira] [Commented] (HBASE-16616) Rpc handlers stuck on ThreadLocalMap.expungeStaleEntry

2016-12-01 Thread Tomu Tsuruhara (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15713580#comment-15713580
 ] 

Tomu Tsuruhara commented on HBASE-16616:


Actually, the patch here was not enough to resolve the issue, for the same 
reason mentioned here: 
https://issues.apache.org/jira/browse/HBASE-17072?focusedCommentId=15675394&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15675394

HBASE-16146 seems more essential. It removes {{ThreadLocal}} usage from 
{{Counter}} entirely.
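
A rough sketch of that direction (one reading of it, not the actual HBASE-16146 patch): drop 
the per-instance ThreadLocal entirely and lean on a striped counter such as 
java.util.concurrent.atomic.LongAdder, which keeps its contention-avoiding probe in a field 
of the Thread object itself, so no ThreadLocalMap entries accumulate per connection. The 
class and method names below are invented for illustration.

{code}
import java.util.concurrent.atomic.LongAdder;

// Hypothetical replacement for a per-connection rpcCount, with no ThreadLocal involved.
public class RpcCountSketch {
  private final LongAdder rpcCount = new LongAdder();

  public void callQueued()   { rpcCount.increment(); }
  public void callFinished() { rpcCount.decrement(); }

  public long outstanding()  { return rpcCount.sum(); }
}
{code}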


[jira] [Commented] (HBASE-16616) Rpc handlers stuck on ThreadLocalMap.expungeStaleEntry

2016-12-01 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15712989#comment-15712989
 ] 

stack commented on HBASE-16616:
---

[~mantonov] Duplicate of what issue?

This should go into 1.3 and 1.2. I'll open an issue.



[jira] [Commented] (HBASE-16616) Rpc handlers stuck on ThreadLocalMap.expungeStaleEntry

2016-10-05 Thread Mikhail Antonov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15549717#comment-15549717
 ] 

Mikhail Antonov commented on HBASE-16616:
-

Seems related to (or possibly a duplicate of) the investigation there.

[~stack] [~ghelmling]



[jira] [Commented] (HBASE-16616) Rpc handlers stuck on ThreadLocalMap.expungeStaleEntry

2016-09-13 Thread Hiroshi Ikeda (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15486513#comment-15486513
 ] 

Hiroshi Ikeda commented on HBASE-16616:
---

It might be good to make indexHolderThreadLocal static. Which CPU executes which 
thread is only temporary anyway, so the per-thread index used to avoid cache-line 
contention will keep changing regardless. LongAdder works in a similar manner; 
LongAdder uses a field newly added to Thread, shared between all instances per 
thread. Also, adding the new method requires that it be called for every counter.
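
A rough sketch of that idea (one reading of the suggestion, not a patch): share a single 
static ThreadLocal across all counter instances, so each handler thread carries at most one 
ThreadLocalMap entry no matter how many per-connection counters exist. The per-thread value 
is only an index hint used to pick a striped cell.

{code}
public class StripedCounterSketch {
  // One shared holder for the whole process instead of one per counter instance.
  private static final ThreadLocal<int[]> INDEX_HOLDER =
      ThreadLocal.withInitial(() -> new int[] { (int) Thread.currentThread().getId() });

  private final java.util.concurrent.atomic.AtomicLongArray cells =
      new java.util.concurrent.atomic.AtomicLongArray(16);

  public void increment() {
    // Use the per-thread hint to spread updates across cells (16 is a power of two).
    int idx = INDEX_HOLDER.get()[0] & (cells.length() - 1);
    cells.incrementAndGet(idx);
  }

  public long get() {
    long sum = 0;
    for (int i = 0; i < cells.length(); i++) {
      sum += cells.get(i);
    }
    return sum;
  }
}
{code}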



[jira] [Commented] (HBASE-16616) Rpc handlers stuck on ThreadLocalMap.expungeStaleEntry

2016-09-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15486125#comment-15486125
 ] 

Hudson commented on HBASE-16616:


FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #1591 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/1591/])
HBASE-16616 Rpc handlers stuck on ThreadLocalMap.expungeStaleEntry (Tomu Tsuruhara)
(tedyu: rev 8855670cd701fdf9c2ab41907f9525d122608e6d)
* (edit) hbase-common/src/main/java/org/apache/hadoop/hbase/util/Counter.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java



[jira] [Commented] (HBASE-16616) Rpc handlers stuck on ThreadLocalMap.expungeStaleEntry

2016-09-12 Thread Tomu Tsuruhara (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15485892#comment-15485892
 ] 

Tomu Tsuruhara commented on HBASE-16616:


Then, what's next for this issue?



[jira] [Commented] (HBASE-16616) Rpc handlers stuck on ThreadLocalMap.expungeStaleEntry

2016-09-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15485715#comment-15485715
 ] 

Hudson commented on HBASE-16616:


FAILURE: Integrated in Jenkins build HBase-1.4 #408 (See 
[https://builds.apache.org/job/HBase-1.4/408/])
HBASE-16616 Rpc handlers stuck on ThreadLocalMap.expungeStaleEntry (Tomu Tsuruhara)
(tedyu: rev 8ad14bac6728792bee2b0deab0d65c8e083f4f19)
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java
* (edit) hbase-common/src/main/java/org/apache/hadoop/hbase/util/Counter.java



[jira] [Commented] (HBASE-16616) Rpc handlers stuck on ThreadLocalMap.expungeStaleEntry

2016-09-12 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15485621#comment-15485621
 ] 

Ted Yu commented on HBASE-16616:


TestClusterId failure is tracked by HBASE-16349

> Rpc handlers stuck on ThreadLocalMap.expungeStaleEntry
> --
>
> Key: HBASE-16616
> URL: https://issues.apache.org/jira/browse/HBASE-16616
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance
>Affects Versions: 1.2.2
>Reporter: Tomu Tsuruhara
>Assignee: Tomu Tsuruhara
> Fix For: 2.0.0, 1.4.0
>
> Attachments: 16616.branch-1.v2.txt, HBASE-16616.master.001.patch, 
> HBASE-16616.master.002.patch, ScreenShot 2016-09-09 14.17.53.png
>
>
> In our HBase 1.2.2 cluster, some regionserver showed too bad 
> "QueueCallTime_99th_percentile" exceeding 10 seconds.
> Most rpc handler threads stuck on ThreadLocalMap.expungeStaleEntry call at 
> that time.
> {noformat}
> "PriorityRpcServer.handler=18,queue=0,port=16020" #322 daemon prio=5 
> os_prio=0 tid=0x7fd422062800 nid=0x19b89 runnable [0x7fcb8a821000]
>java.lang.Thread.State: RUNNABLE
> at 
> java.lang.ThreadLocal$ThreadLocalMap.expungeStaleEntry(ThreadLocal.java:617)
> at java.lang.ThreadLocal$ThreadLocalMap.remove(ThreadLocal.java:499)
> at 
> java.lang.ThreadLocal$ThreadLocalMap.access$200(ThreadLocal.java:298)
> at java.lang.ThreadLocal.remove(ThreadLocal.java:222)
> at 
> java.util.concurrent.locks.ReentrantReadWriteLock$Sync.tryReleaseShared(ReentrantReadWriteLock.java:426)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.releaseShared(AbstractQueuedSynchronizer.java:1341)
> at 
> java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.unlock(ReentrantReadWriteLock.java:881)
> at 
> com.yammer.metrics.stats.ExponentiallyDecayingSample.unlockForRegularUsage(ExponentiallyDecayingSample.java:196)
> at 
> com.yammer.metrics.stats.ExponentiallyDecayingSample.update(ExponentiallyDecayingSample.java:113)
> at 
> com.yammer.metrics.stats.ExponentiallyDecayingSample.update(ExponentiallyDecayingSample.java:81)
> at 
> org.apache.hadoop.metrics2.lib.MutableHistogram.add(MutableHistogram.java:81)
> at 
> org.apache.hadoop.metrics2.lib.MutableRangeHistogram.add(MutableRangeHistogram.java:59)
> at 
> org.apache.hadoop.hbase.ipc.MetricsHBaseServerSourceImpl.dequeuedCall(MetricsHBaseServerSourceImpl.java:194)
> at 
> org.apache.hadoop.hbase.ipc.MetricsHBaseServer.dequeuedCall(MetricsHBaseServer.java:76)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2192)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
> at 
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> We were using JDK 1.8.0_92, and here is a snippet from ThreadLocal.java.
> {code}
> 616:while (tab[h] != null)
> 617:h = nextIndex(h, len);
> {code}
> So I hypothesized that there were too many consecutive entries in the 
> {{tab}} array, and I did indeed find them in the heap dump.
> !ScreenShot 2016-09-09 14.17.53.png|width=50%!
> Most of these entries pointed at instances of 
> {{org.apache.hadoop.hbase.util.Counter$1}},
> which corresponds to the {{indexHolderThreadLocal}} instance variable in the 
> {{Counter}} class.
> Because the {{RpcServer$Connection}} class creates a {{Counter}} instance 
> {{rpcCount}} for every connection,
> a RegionServer process can accumulate many {{Counter#indexHolderThreadLocal}} 
> instances when clients repeatedly connect and disconnect. As a result, a 
> ThreadLocalMap can end up with many consecutive entries.
> Usually, since each entry is a {{WeakReference}}, these entries are collected 
> and removed by the garbage collector soon after the connection is closed.
> But if a connection lives long enough to survive a young GC, its entries 
> won't be collected until the old-generation collector runs.
> Furthermore, under G1GC they may not be collected even by an old-gen (mixed) 
> GC if the entries sit in a region that doesn't contain much garbage.
> We were indeed using G1GC when we encountered this problem.
> We should remove the entry from the ThreadLocalMap by calling 
> ThreadLocal#remove explicitly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16616) Rpc handlers stuck on ThreadLocalMap.expungeStaleEntry

2016-09-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15485599#comment-15485599
 ] 

Hadoop QA commented on HBASE-16616:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue} 0m 2s 
{color} | {color:blue} The patch file was not named according to hbase's naming 
conventions. Please see 
https://yetus.apache.org/documentation/0.3.0/precommit-patchnames for 
instructions. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
48s {color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s 
{color} | {color:green} branch-1 passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 53s 
{color} | {color:green} branch-1 passed with JDK v1.7.0_111 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
45s {color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
29s {color} | {color:green} branch-1 passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 1s 
{color} | {color:red} hbase-server in branch-1 has 1 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 45s 
{color} | {color:green} branch-1 passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 54s 
{color} | {color:green} branch-1 passed with JDK v1.7.0_111 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
8s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 53s 
{color} | {color:green} the patch passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 53s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 56s 
{color} | {color:green} the patch passed with JDK v1.7.0_111 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 56s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
47s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
30s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
18m 3s {color} | {color:green} The patch does not cause any errors with Hadoop 
2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. {color} |
| {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 0m 
31s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
35s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 50s 
{color} | {color:green} the patch passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 56s 
{color} | {color:green} the patch passed with JDK v1.7.0_111 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 48s 
{color} | {color:green} hbase-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 86m 21s {color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
31s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |

[jira] [Commented] (HBASE-16616) Rpc handlers stuck on ThreadLocalMap.expungeStaleEntry

2016-09-12 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15485519#comment-15485519
 ] 

Duo Zhang commented on HBASE-16616:
---

[~enis] Agreed. It will be a big patch, so let's finish this issue first and do 
the replacement in HBASE-7612.

> Rpc handlers stuck on ThreadLocalMap.expungeStaleEntry
> --
>
> Key: HBASE-16616
> URL: https://issues.apache.org/jira/browse/HBASE-16616
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance
>Affects Versions: 1.2.2
>Reporter: Tomu Tsuruhara
> Attachments: 16616.branch-1.v2.txt, HBASE-16616.master.001.patch, 
> HBASE-16616.master.002.patch, ScreenShot 2016-09-09 14.17.53.png
>
>
> In our HBase 1.2.2 cluster, some regionservers showed very bad 
> "QueueCallTime_99th_percentile" values, exceeding 10 seconds.
> Most RPC handler threads were stuck in the ThreadLocalMap.expungeStaleEntry 
> call at that time.
> {noformat}
> "PriorityRpcServer.handler=18,queue=0,port=16020" #322 daemon prio=5 
> os_prio=0 tid=0x7fd422062800 nid=0x19b89 runnable [0x7fcb8a821000]
>java.lang.Thread.State: RUNNABLE
> at 
> java.lang.ThreadLocal$ThreadLocalMap.expungeStaleEntry(ThreadLocal.java:617)
> at java.lang.ThreadLocal$ThreadLocalMap.remove(ThreadLocal.java:499)
> at 
> java.lang.ThreadLocal$ThreadLocalMap.access$200(ThreadLocal.java:298)
> at java.lang.ThreadLocal.remove(ThreadLocal.java:222)
> at 
> java.util.concurrent.locks.ReentrantReadWriteLock$Sync.tryReleaseShared(ReentrantReadWriteLock.java:426)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.releaseShared(AbstractQueuedSynchronizer.java:1341)
> at 
> java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.unlock(ReentrantReadWriteLock.java:881)
> at 
> com.yammer.metrics.stats.ExponentiallyDecayingSample.unlockForRegularUsage(ExponentiallyDecayingSample.java:196)
> at 
> com.yammer.metrics.stats.ExponentiallyDecayingSample.update(ExponentiallyDecayingSample.java:113)
> at 
> com.yammer.metrics.stats.ExponentiallyDecayingSample.update(ExponentiallyDecayingSample.java:81)
> at 
> org.apache.hadoop.metrics2.lib.MutableHistogram.add(MutableHistogram.java:81)
> at 
> org.apache.hadoop.metrics2.lib.MutableRangeHistogram.add(MutableRangeHistogram.java:59)
> at 
> org.apache.hadoop.hbase.ipc.MetricsHBaseServerSourceImpl.dequeuedCall(MetricsHBaseServerSourceImpl.java:194)
> at 
> org.apache.hadoop.hbase.ipc.MetricsHBaseServer.dequeuedCall(MetricsHBaseServer.java:76)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2192)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
> at 
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> We were using JDK 1.8.0_92, and here is a snippet from ThreadLocal.java.
> {code}
> 616:while (tab[h] != null)
> 617:h = nextIndex(h, len);
> {code}
> So I hypothesized that there were too many consecutive entries in the 
> {{tab}} array, and I did indeed find them in the heap dump.
> !ScreenShot 2016-09-09 14.17.53.png|width=50%!
> Most of these entries pointed at instances of 
> {{org.apache.hadoop.hbase.util.Counter$1}},
> which corresponds to the {{indexHolderThreadLocal}} instance variable in the 
> {{Counter}} class.
> Because the {{RpcServer$Connection}} class creates a {{Counter}} instance 
> {{rpcCount}} for every connection,
> a RegionServer process can accumulate many {{Counter#indexHolderThreadLocal}} 
> instances when clients repeatedly connect and disconnect. As a result, a 
> ThreadLocalMap can end up with many consecutive entries.
> Usually, since each entry is a {{WeakReference}}, these entries are collected 
> and removed by the garbage collector soon after the connection is closed.
> But if a connection lives long enough to survive a young GC, its entries 
> won't be collected until the old-generation collector runs.
> Furthermore, under G1GC they may not be collected even by an old-gen (mixed) 
> GC if the entries sit in a region that doesn't contain much garbage.
> We were indeed using G1GC when we encountered this problem.
> We should remove the entry from the ThreadLocalMap by calling 
> ThreadLocal#remove explicitly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16616) Rpc handlers stuck on ThreadLocalMap.expungeStaleEntry

2016-09-12 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15485508#comment-15485508
 ] 

Enis Soztutar commented on HBASE-16616:
---

bq. I think on master we could just use LongAdder to replace the Counter since 
we are now jdk8 only?
+1. See HBASE-7612. However, we can do it in the patch for HBASE-7612. 

> Rpc handlers stuck on ThreadLocalMap.expungeStaleEntry
> --
>
> Key: HBASE-16616
> URL: https://issues.apache.org/jira/browse/HBASE-16616
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance
>Affects Versions: 1.2.2
>Reporter: Tomu Tsuruhara
> Attachments: 16616.branch-1.v2.txt, HBASE-16616.master.001.patch, 
> HBASE-16616.master.002.patch, ScreenShot 2016-09-09 14.17.53.png
>
>
> In our HBase 1.2.2 cluster, some regionservers showed very bad 
> "QueueCallTime_99th_percentile" values, exceeding 10 seconds.
> Most RPC handler threads were stuck in the ThreadLocalMap.expungeStaleEntry 
> call at that time.
> {noformat}
> "PriorityRpcServer.handler=18,queue=0,port=16020" #322 daemon prio=5 
> os_prio=0 tid=0x7fd422062800 nid=0x19b89 runnable [0x7fcb8a821000]
>java.lang.Thread.State: RUNNABLE
> at 
> java.lang.ThreadLocal$ThreadLocalMap.expungeStaleEntry(ThreadLocal.java:617)
> at java.lang.ThreadLocal$ThreadLocalMap.remove(ThreadLocal.java:499)
> at 
> java.lang.ThreadLocal$ThreadLocalMap.access$200(ThreadLocal.java:298)
> at java.lang.ThreadLocal.remove(ThreadLocal.java:222)
> at 
> java.util.concurrent.locks.ReentrantReadWriteLock$Sync.tryReleaseShared(ReentrantReadWriteLock.java:426)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.releaseShared(AbstractQueuedSynchronizer.java:1341)
> at 
> java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.unlock(ReentrantReadWriteLock.java:881)
> at 
> com.yammer.metrics.stats.ExponentiallyDecayingSample.unlockForRegularUsage(ExponentiallyDecayingSample.java:196)
> at 
> com.yammer.metrics.stats.ExponentiallyDecayingSample.update(ExponentiallyDecayingSample.java:113)
> at 
> com.yammer.metrics.stats.ExponentiallyDecayingSample.update(ExponentiallyDecayingSample.java:81)
> at 
> org.apache.hadoop.metrics2.lib.MutableHistogram.add(MutableHistogram.java:81)
> at 
> org.apache.hadoop.metrics2.lib.MutableRangeHistogram.add(MutableRangeHistogram.java:59)
> at 
> org.apache.hadoop.hbase.ipc.MetricsHBaseServerSourceImpl.dequeuedCall(MetricsHBaseServerSourceImpl.java:194)
> at 
> org.apache.hadoop.hbase.ipc.MetricsHBaseServer.dequeuedCall(MetricsHBaseServer.java:76)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2192)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
> at 
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> We were using JDK 1.8.0_92, and here is a snippet from ThreadLocal.java.
> {code}
> 616:while (tab[h] != null)
> 617:h = nextIndex(h, len);
> {code}
> So I hypothesized that there were too many consecutive entries in the 
> {{tab}} array, and I did indeed find them in the heap dump.
> !ScreenShot 2016-09-09 14.17.53.png|width=50%!
> Most of these entries pointed at instances of 
> {{org.apache.hadoop.hbase.util.Counter$1}},
> which corresponds to the {{indexHolderThreadLocal}} instance variable in the 
> {{Counter}} class.
> Because the {{RpcServer$Connection}} class creates a {{Counter}} instance 
> {{rpcCount}} for every connection,
> a RegionServer process can accumulate many {{Counter#indexHolderThreadLocal}} 
> instances when clients repeatedly connect and disconnect. As a result, a 
> ThreadLocalMap can end up with many consecutive entries.
> Usually, since each entry is a {{WeakReference}}, these entries are collected 
> and removed by the garbage collector soon after the connection is closed.
> But if a connection lives long enough to survive a young GC, its entries 
> won't be collected until the old-generation collector runs.
> Furthermore, under G1GC they may not be collected even by an old-gen (mixed) 
> GC if the entries sit in a region that doesn't contain much garbage.
> We were indeed using G1GC when we encountered this problem.
> We should remove the entry from the ThreadLocalMap by calling 
> ThreadLocal#remove explicitly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16616) Rpc handlers stuck on ThreadLocalMap.expungeStaleEntry

2016-09-12 Thread Tomu Tsuruhara (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15485310#comment-15485310
 ] 

Tomu Tsuruhara commented on HBASE-16616:


Ted:
Sure, I will

> Rpc handlers stuck on ThreadLocalMap.expungeStaleEntry
> --
>
> Key: HBASE-16616
> URL: https://issues.apache.org/jira/browse/HBASE-16616
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance
>Affects Versions: 1.2.2
>Reporter: Tomu Tsuruhara
> Attachments: 16616.branch-1.v2.txt, HBASE-16616.master.001.patch, 
> HBASE-16616.master.002.patch, ScreenShot 2016-09-09 14.17.53.png
>
>
> In our HBase 1.2.2 cluster, some regionservers showed very bad 
> "QueueCallTime_99th_percentile" values, exceeding 10 seconds.
> Most RPC handler threads were stuck in the ThreadLocalMap.expungeStaleEntry 
> call at that time.
> {noformat}
> "PriorityRpcServer.handler=18,queue=0,port=16020" #322 daemon prio=5 
> os_prio=0 tid=0x7fd422062800 nid=0x19b89 runnable [0x7fcb8a821000]
>java.lang.Thread.State: RUNNABLE
> at 
> java.lang.ThreadLocal$ThreadLocalMap.expungeStaleEntry(ThreadLocal.java:617)
> at java.lang.ThreadLocal$ThreadLocalMap.remove(ThreadLocal.java:499)
> at 
> java.lang.ThreadLocal$ThreadLocalMap.access$200(ThreadLocal.java:298)
> at java.lang.ThreadLocal.remove(ThreadLocal.java:222)
> at 
> java.util.concurrent.locks.ReentrantReadWriteLock$Sync.tryReleaseShared(ReentrantReadWriteLock.java:426)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.releaseShared(AbstractQueuedSynchronizer.java:1341)
> at 
> java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.unlock(ReentrantReadWriteLock.java:881)
> at 
> com.yammer.metrics.stats.ExponentiallyDecayingSample.unlockForRegularUsage(ExponentiallyDecayingSample.java:196)
> at 
> com.yammer.metrics.stats.ExponentiallyDecayingSample.update(ExponentiallyDecayingSample.java:113)
> at 
> com.yammer.metrics.stats.ExponentiallyDecayingSample.update(ExponentiallyDecayingSample.java:81)
> at 
> org.apache.hadoop.metrics2.lib.MutableHistogram.add(MutableHistogram.java:81)
> at 
> org.apache.hadoop.metrics2.lib.MutableRangeHistogram.add(MutableRangeHistogram.java:59)
> at 
> org.apache.hadoop.hbase.ipc.MetricsHBaseServerSourceImpl.dequeuedCall(MetricsHBaseServerSourceImpl.java:194)
> at 
> org.apache.hadoop.hbase.ipc.MetricsHBaseServer.dequeuedCall(MetricsHBaseServer.java:76)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2192)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
> at 
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> We were using JDK 1.8.0_92, and here is a snippet from ThreadLocal.java.
> {code}
> 616:while (tab[h] != null)
> 617:h = nextIndex(h, len);
> {code}
> So I hypothesized that there were too many consecutive entries in the 
> {{tab}} array, and I did indeed find them in the heap dump.
> !ScreenShot 2016-09-09 14.17.53.png|width=50%!
> Most of these entries pointed at instances of 
> {{org.apache.hadoop.hbase.util.Counter$1}},
> which corresponds to the {{indexHolderThreadLocal}} instance variable in the 
> {{Counter}} class.
> Because the {{RpcServer$Connection}} class creates a {{Counter}} instance 
> {{rpcCount}} for every connection,
> a RegionServer process can accumulate many {{Counter#indexHolderThreadLocal}} 
> instances when clients repeatedly connect and disconnect. As a result, a 
> ThreadLocalMap can end up with many consecutive entries.
> Usually, since each entry is a {{WeakReference}}, these entries are collected 
> and removed by the garbage collector soon after the connection is closed.
> But if a connection lives long enough to survive a young GC, its entries 
> won't be collected until the old-generation collector runs.
> Furthermore, under G1GC they may not be collected even by an old-gen (mixed) 
> GC if the entries sit in a region that doesn't contain much garbage.
> We were indeed using G1GC when we encountered this problem.
> We should remove the entry from the ThreadLocalMap by calling 
> ThreadLocal#remove explicitly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16616) Rpc handlers stuck on ThreadLocalMap.expungeStaleEntry

2016-09-12 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15485282#comment-15485282
 ] 

Ted Yu commented on HBASE-16616:


Tomu:
Can you attach a new patch for master (see Duo's comment above)?

> Rpc handlers stuck on ThreadLocalMap.expungeStaleEntry
> --
>
> Key: HBASE-16616
> URL: https://issues.apache.org/jira/browse/HBASE-16616
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance
>Affects Versions: 1.2.2
>Reporter: Tomu Tsuruhara
> Attachments: 16616.branch-1.v2.txt, HBASE-16616.master.001.patch, 
> HBASE-16616.master.002.patch, ScreenShot 2016-09-09 14.17.53.png
>
>
> In our HBase 1.2.2 cluster, some regionservers showed very bad 
> "QueueCallTime_99th_percentile" values, exceeding 10 seconds.
> Most RPC handler threads were stuck in the ThreadLocalMap.expungeStaleEntry 
> call at that time.
> {noformat}
> "PriorityRpcServer.handler=18,queue=0,port=16020" #322 daemon prio=5 
> os_prio=0 tid=0x7fd422062800 nid=0x19b89 runnable [0x7fcb8a821000]
>java.lang.Thread.State: RUNNABLE
> at 
> java.lang.ThreadLocal$ThreadLocalMap.expungeStaleEntry(ThreadLocal.java:617)
> at java.lang.ThreadLocal$ThreadLocalMap.remove(ThreadLocal.java:499)
> at 
> java.lang.ThreadLocal$ThreadLocalMap.access$200(ThreadLocal.java:298)
> at java.lang.ThreadLocal.remove(ThreadLocal.java:222)
> at 
> java.util.concurrent.locks.ReentrantReadWriteLock$Sync.tryReleaseShared(ReentrantReadWriteLock.java:426)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.releaseShared(AbstractQueuedSynchronizer.java:1341)
> at 
> java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.unlock(ReentrantReadWriteLock.java:881)
> at 
> com.yammer.metrics.stats.ExponentiallyDecayingSample.unlockForRegularUsage(ExponentiallyDecayingSample.java:196)
> at 
> com.yammer.metrics.stats.ExponentiallyDecayingSample.update(ExponentiallyDecayingSample.java:113)
> at 
> com.yammer.metrics.stats.ExponentiallyDecayingSample.update(ExponentiallyDecayingSample.java:81)
> at 
> org.apache.hadoop.metrics2.lib.MutableHistogram.add(MutableHistogram.java:81)
> at 
> org.apache.hadoop.metrics2.lib.MutableRangeHistogram.add(MutableRangeHistogram.java:59)
> at 
> org.apache.hadoop.hbase.ipc.MetricsHBaseServerSourceImpl.dequeuedCall(MetricsHBaseServerSourceImpl.java:194)
> at 
> org.apache.hadoop.hbase.ipc.MetricsHBaseServer.dequeuedCall(MetricsHBaseServer.java:76)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2192)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
> at 
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> We were using JDK 1.8.0_92, and here is a snippet from ThreadLocal.java.
> {code}
> 616:while (tab[h] != null)
> 617:h = nextIndex(h, len);
> {code}
> So I hypothesized that there were too many consecutive entries in the 
> {{tab}} array, and I did indeed find them in the heap dump.
> !ScreenShot 2016-09-09 14.17.53.png|width=50%!
> Most of these entries pointed at instances of 
> {{org.apache.hadoop.hbase.util.Counter$1}},
> which corresponds to the {{indexHolderThreadLocal}} instance variable in the 
> {{Counter}} class.
> Because the {{RpcServer$Connection}} class creates a {{Counter}} instance 
> {{rpcCount}} for every connection,
> a RegionServer process can accumulate many {{Counter#indexHolderThreadLocal}} 
> instances when clients repeatedly connect and disconnect. As a result, a 
> ThreadLocalMap can end up with many consecutive entries.
> Usually, since each entry is a {{WeakReference}}, these entries are collected 
> and removed by the garbage collector soon after the connection is closed.
> But if a connection lives long enough to survive a young GC, its entries 
> won't be collected until the old-generation collector runs.
> Furthermore, under G1GC they may not be collected even by an old-gen (mixed) 
> GC if the entries sit in a region that doesn't contain much garbage.
> We were indeed using G1GC when we encountered this problem.
> We should remove the entry from the ThreadLocalMap by calling 
> ThreadLocal#remove explicitly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16616) Rpc handlers stuck on ThreadLocalMap.expungeStaleEntry

2016-09-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15484768#comment-15484768
 ] 

Hadoop QA commented on HBASE-16616:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 44m 31s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 29s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
58s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 57s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
42s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
29s {color} | {color:green} master passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 38s 
{color} | {color:red} hbase-common in master has 1 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 50s 
{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 59s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 59s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
47s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
30s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
33m 13s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. {color} |
| {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 0m 
29s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
46s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 51s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 56s 
{color} | {color:green} hbase-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 120m 33s 
{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
41s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 219m 16s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hbase.master.procedure.TestMasterFailoverWithProcedures |
|   | hadoop.hbase.client.TestReplicasClient |
| Timed out junit tests | 
org.apache.hadoop.hbase.security.access.TestAccessController2 |
|   | org.apache.hadoop.hbase.security.access.TestNamespaceCommands |
|   | org.apache.hadoop.hbase.security.access.TestCellACLs |
|   | org.apache.hadoop.hbase.security.access.TestAccessController |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.1 Server=1.12.1 Image:yetus/hbase:7bda515 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12828050/HBASE-16616.master.002.patch
 |
| JIRA Issue | HBASE-16616 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  |
| uname | Linux d1f6ca255fb7 3.13.0-92-generic #139-Ubuntu SMP Tue Jun 28 
20:42:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool 

[jira] [Commented] (HBASE-16616) Rpc handlers stuck on ThreadLocalMap.expungeStaleEntry

2016-09-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15484692#comment-15484692
 ] 

Hadoop QA commented on HBASE-16616:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 39m 38s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 21s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
31s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 52s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
42s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
24s {color} | {color:green} master passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 37s 
{color} | {color:red} hbase-common in master has 1 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 43s 
{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
5s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 53s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 53s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
39s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
24s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
27m 39s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. {color} |
| {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 0m 
20s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
45s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 44s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 52s 
{color} | {color:green} hbase-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 97m 52s {color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
30s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 184m 20s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Timed out junit tests | 
org.apache.hadoop.hbase.zookeeper.TestZooKeeperNodeTracker |
|   | org.apache.hadoop.hbase.wal.TestWALSplitCompressed |
|   | org.apache.hadoop.hbase.snapshot.TestMobSecureExportSnapshot |
|   | org.apache.hadoop.hbase.filter.TestFilterListOrOperatorWithBlkCnt |
|   | org.apache.hadoop.hbase.snapshot.TestRestoreFlushSnapshotFromClient |
|   | org.apache.hadoop.hbase.client.TestHCM |
|   | org.apache.hadoop.hbase.snapshot.TestMobFlushSnapshotFromClient |
|   | org.apache.hadoop.hbase.snapshot.TestFlushSnapshotFromClient |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.11.2 Server=1.11.2 Image:yetus/hbase:7bda515 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12828050/HBASE-16616.master.002.patch
 |
| JIRA Issue | HBASE-16616 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  |
| uname | Linux

[jira] [Commented] (HBASE-16616) Rpc handlers stuck on ThreadLocalMap.expungeStaleEntry

2016-09-12 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15484291#comment-15484291
 ] 

Duo Zhang commented on HBASE-16616:
---

I think on master we could just use {{LongAdder}} to replace the {{Counter}} 
since we are now jdk8 only?

+1 for branch-1 and older branches since we still need to support JDK7.
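As a rough sketch of that suggestion (assuming only the counting behaviour 
matters; this is not the HBASE-7612 change itself), 
{{java.util.concurrent.atomic.LongAdder}} keeps its striping internal, so there 
is no per-connection ThreadLocal left behind for user code to clean up:

{code}
import java.util.concurrent.atomic.LongAdder;

// Sketch: a per-connection RPC counter on JDK 8+. Nothing needs to be
// removed explicitly when the connection goes away; the adder is simply
// garbage-collected together with its owner.
public class RpcCountSketch {

  private final LongAdder rpcCount = new LongAdder();

  public void callReceived() {
    rpcCount.increment();
  }

  public void callCompleted() {
    rpcCount.decrement();
  }

  public long outstandingCalls() {
    return rpcCount.sum();
  }
}
{code}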

> Rpc handlers stuck on ThreadLocalMap.expungeStaleEntry
> --
>
> Key: HBASE-16616
> URL: https://issues.apache.org/jira/browse/HBASE-16616
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance
>Affects Versions: 1.2.2
>Reporter: Tomu Tsuruhara
> Attachments: HBASE-16616.master.001.patch, 
> HBASE-16616.master.002.patch, ScreenShot 2016-09-09 14.17.53.png
>
>
> In our HBase 1.2.2 cluster, some regionservers showed very bad 
> "QueueCallTime_99th_percentile" values, exceeding 10 seconds.
> Most RPC handler threads were stuck in the ThreadLocalMap.expungeStaleEntry 
> call at that time.
> {noformat}
> "PriorityRpcServer.handler=18,queue=0,port=16020" #322 daemon prio=5 
> os_prio=0 tid=0x7fd422062800 nid=0x19b89 runnable [0x7fcb8a821000]
>java.lang.Thread.State: RUNNABLE
> at 
> java.lang.ThreadLocal$ThreadLocalMap.expungeStaleEntry(ThreadLocal.java:617)
> at java.lang.ThreadLocal$ThreadLocalMap.remove(ThreadLocal.java:499)
> at 
> java.lang.ThreadLocal$ThreadLocalMap.access$200(ThreadLocal.java:298)
> at java.lang.ThreadLocal.remove(ThreadLocal.java:222)
> at 
> java.util.concurrent.locks.ReentrantReadWriteLock$Sync.tryReleaseShared(ReentrantReadWriteLock.java:426)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.releaseShared(AbstractQueuedSynchronizer.java:1341)
> at 
> java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.unlock(ReentrantReadWriteLock.java:881)
> at 
> com.yammer.metrics.stats.ExponentiallyDecayingSample.unlockForRegularUsage(ExponentiallyDecayingSample.java:196)
> at 
> com.yammer.metrics.stats.ExponentiallyDecayingSample.update(ExponentiallyDecayingSample.java:113)
> at 
> com.yammer.metrics.stats.ExponentiallyDecayingSample.update(ExponentiallyDecayingSample.java:81)
> at 
> org.apache.hadoop.metrics2.lib.MutableHistogram.add(MutableHistogram.java:81)
> at 
> org.apache.hadoop.metrics2.lib.MutableRangeHistogram.add(MutableRangeHistogram.java:59)
> at 
> org.apache.hadoop.hbase.ipc.MetricsHBaseServerSourceImpl.dequeuedCall(MetricsHBaseServerSourceImpl.java:194)
> at 
> org.apache.hadoop.hbase.ipc.MetricsHBaseServer.dequeuedCall(MetricsHBaseServer.java:76)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2192)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
> at 
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> We were using JDK 1.8.0_92, and here is a snippet from ThreadLocal.java.
> {code}
> 616:while (tab[h] != null)
> 617:h = nextIndex(h, len);
> {code}
> So I hypothesized that there were too many consecutive entries in the 
> {{tab}} array, and I did indeed find them in the heap dump.
> !ScreenShot 2016-09-09 14.17.53.png|width=50%!
> Most of these entries pointed at instances of 
> {{org.apache.hadoop.hbase.util.Counter$1}},
> which corresponds to the {{indexHolderThreadLocal}} instance variable in the 
> {{Counter}} class.
> Because the {{RpcServer$Connection}} class creates a {{Counter}} instance 
> {{rpcCount}} for every connection,
> a RegionServer process can accumulate many {{Counter#indexHolderThreadLocal}} 
> instances when clients repeatedly connect and disconnect. As a result, a 
> ThreadLocalMap can end up with many consecutive entries.
> Usually, since each entry is a {{WeakReference}}, these entries are collected 
> and removed by the garbage collector soon after the connection is closed.
> But if a connection lives long enough to survive a young GC, its entries 
> won't be collected until the old-generation collector runs.
> Furthermore, under G1GC they may not be collected even by an old-gen (mixed) 
> GC if the entries sit in a region that doesn't contain much garbage.
> We were indeed using G1GC when we encountered this problem.
> We should remove the entry from the ThreadLocalMap by calling 
> ThreadLocal#remove explicitly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16616) Rpc handlers stuck on ThreadLocalMap.expungeStaleEntry

2016-09-12 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15484210#comment-15484210
 ] 

Ted Yu commented on HBASE-16616:


+1


> Rpc handlers stuck on ThreadLocalMap.expungeStaleEntry
> --
>
> Key: HBASE-16616
> URL: https://issues.apache.org/jira/browse/HBASE-16616
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance
>Affects Versions: 1.2.2
>Reporter: Tomu Tsuruhara
> Attachments: HBASE-16616.master.001.patch, ScreenShot 2016-09-09 
> 14.17.53.png
>
>
> In our HBase 1.2.2 cluster, some regionservers showed very bad 
> "QueueCallTime_99th_percentile" values, exceeding 10 seconds.
> Most RPC handler threads were stuck in the ThreadLocalMap.expungeStaleEntry 
> call at that time.
> {noformat}
> "PriorityRpcServer.handler=18,queue=0,port=16020" #322 daemon prio=5 
> os_prio=0 tid=0x7fd422062800 nid=0x19b89 runnable [0x7fcb8a821000]
>java.lang.Thread.State: RUNNABLE
> at 
> java.lang.ThreadLocal$ThreadLocalMap.expungeStaleEntry(ThreadLocal.java:617)
> at java.lang.ThreadLocal$ThreadLocalMap.remove(ThreadLocal.java:499)
> at 
> java.lang.ThreadLocal$ThreadLocalMap.access$200(ThreadLocal.java:298)
> at java.lang.ThreadLocal.remove(ThreadLocal.java:222)
> at 
> java.util.concurrent.locks.ReentrantReadWriteLock$Sync.tryReleaseShared(ReentrantReadWriteLock.java:426)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.releaseShared(AbstractQueuedSynchronizer.java:1341)
> at 
> java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.unlock(ReentrantReadWriteLock.java:881)
> at 
> com.yammer.metrics.stats.ExponentiallyDecayingSample.unlockForRegularUsage(ExponentiallyDecayingSample.java:196)
> at 
> com.yammer.metrics.stats.ExponentiallyDecayingSample.update(ExponentiallyDecayingSample.java:113)
> at 
> com.yammer.metrics.stats.ExponentiallyDecayingSample.update(ExponentiallyDecayingSample.java:81)
> at 
> org.apache.hadoop.metrics2.lib.MutableHistogram.add(MutableHistogram.java:81)
> at 
> org.apache.hadoop.metrics2.lib.MutableRangeHistogram.add(MutableRangeHistogram.java:59)
> at 
> org.apache.hadoop.hbase.ipc.MetricsHBaseServerSourceImpl.dequeuedCall(MetricsHBaseServerSourceImpl.java:194)
> at 
> org.apache.hadoop.hbase.ipc.MetricsHBaseServer.dequeuedCall(MetricsHBaseServer.java:76)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2192)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
> at 
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> We were using JDK 1.8.0_92, and here is a snippet from ThreadLocal.java.
> {code}
> 616:while (tab[h] != null)
> 617:h = nextIndex(h, len);
> {code}
> So I hypothesized that there were too many consecutive entries in the 
> {{tab}} array, and I did indeed find them in the heap dump.
> !ScreenShot 2016-09-09 14.17.53.png|width=50%!
> Most of these entries pointed at instances of 
> {{org.apache.hadoop.hbase.util.Counter$1}},
> which corresponds to the {{indexHolderThreadLocal}} instance variable in the 
> {{Counter}} class.
> Because the {{RpcServer$Connection}} class creates a {{Counter}} instance 
> {{rpcCount}} for every connection,
> a RegionServer process can accumulate many {{Counter#indexHolderThreadLocal}} 
> instances when clients repeatedly connect and disconnect. As a result, a 
> ThreadLocalMap can end up with many consecutive entries.
> Usually, since each entry is a {{WeakReference}}, these entries are collected 
> and removed by the garbage collector soon after the connection is closed.
> But if a connection lives long enough to survive a young GC, its entries 
> won't be collected until the old-generation collector runs.
> Furthermore, under G1GC they may not be collected even by an old-gen (mixed) 
> GC if the entries sit in a region that doesn't contain much garbage.
> We were indeed using G1GC when we encountered this problem.
> We should remove the entry from the ThreadLocalMap by calling 
> ThreadLocal#remove explicitly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)