[jira] [Commented] (HBASE-14274) Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs MetricsRegionAggregateSourceImpl
[ https://issues.apache.org/jira/browse/HBASE-14274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791630#comment-14791630 ]

Hudson commented on HBASE-14274:

FAILURE: Integrated in HBase-TRUNK #6817 (See [https://builds.apache.org/job/HBase-TRUNK/6817/])
HBASE-14278 Fix NPE that is showing up since HBASE-14274 went in (eclark: rev c1ac4bb8601f88eb3fe246eb62c3f40e95faf93d)
* hbase-hadoop2-compat/src/test/java/org/apache/hadoop/hbase/regionserver/TestMetricsRegionSourceImpl.java
* hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionSourceImpl.java
* hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionAggregateSourceImpl.java
* hbase-hadoop2-compat/src/main/java/org/apache/hadoop/metrics2/impl/JmxCacheBuster.java
* hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionServerSourceImpl.java

> Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs MetricsRegionAggregateSourceImpl
> ---
>
>                 Key: HBASE-14274
>                 URL: https://issues.apache.org/jira/browse/HBASE-14274
>             Project: HBase
>          Issue Type: Sub-task
>          Components: test
>            Reporter: stack
>            Assignee: Elliott Clark
>             Fix For: 2.0.0, 1.2.0, 1.3.0
>
>         Attachments: 14274-addendum.txt, 23612.stack, HBASE-14274-v1.patch, HBASE-14274.patch
>
>
> Looking into parent issue, got a hang locally of TestDistributedLogReplay.
> We have region closes here:
> {code}
> "RS_CLOSE_META-localhost:59610-0" prio=5 tid=0x7ff65c03f800 nid=0x54347 waiting on condition [0x00011f7ac000]
>    java.lang.Thread.State: WAITING (parking)
>       at sun.misc.Unsafe.park(Native Method)
>       - parking to wait for <0x00075636d8c0> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>       at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
>       at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:945)
>       at org.apache.hadoop.hbase.regionserver.MetricsRegionAggregateSourceImpl.deregister(MetricsRegionAggregateSourceImpl.java:78)
>       at org.apache.hadoop.hbase.regionserver.MetricsRegionSourceImpl.close(MetricsRegionSourceImpl.java:120)
>       at org.apache.hadoop.hbase.regionserver.MetricsRegion.close(MetricsRegion.java:41)
>       at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1500)
>       at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1344)
>       - locked <0x0007ff878190> (a java.lang.Object)
>       at org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:102)
>       at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:103)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>       at java.lang.Thread.run(Thread.java:744)
> {code}
> They are trying to call MetricsRegionAggregateSourceImpl.deregister. They want to
> get a write lock on this class's local ReentrantReadWriteLock while holding
> MetricsRegionSourceImpl's readWriteLock write lock.
> Then, elsewhere, the JmxCacheBuster is running, trying to get metrics with the
> above locks held in reverse:
> {code}
> "HBase-Metrics2-1" daemon prio=5 tid=0x7ff65e14b000 nid=0x59a03 waiting on condition [0x000140ea5000]
>    java.lang.Thread.State: WAITING (parking)
>       at sun.misc.Unsafe.park(Native Method)
>       - parking to wait for <0x0007cade1480> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>       at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:964)
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282)
>       at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:731)
>       at org.apache.hadoop.hbase.regionserver.MetricsRegionSourceImpl.snapshot(MetricsRegionSourceImpl.java:193)
>       at
[jira] [Commented] (HBASE-14274) Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs MetricsRegionAggregateSourceImpl
[ https://issues.apache.org/jira/browse/HBASE-14274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791449#comment-14791449 ]

Hudson commented on HBASE-14274:

SUCCESS: Integrated in HBase-1.3-IT #162 (See [https://builds.apache.org/job/HBase-1.3-IT/162/])
HBASE-14278 Fix NPE that is showing up since HBASE-14274 went in (eclark: rev 2029e851827fa1bf59436c7baa1971b52ac5833e)
* hbase-hadoop2-compat/src/test/java/org/apache/hadoop/hbase/regionserver/TestMetricsRegionSourceImpl.java
* hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionSourceImpl.java
* hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionServerSourceImpl.java
* hbase-hadoop2-compat/src/main/java/org/apache/hadoop/metrics2/impl/JmxCacheBuster.java
* hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionAggregateSourceImpl.java
[jira] [Commented] (HBASE-14274) Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs MetricsRegionAggregateSourceImpl
[ https://issues.apache.org/jira/browse/HBASE-14274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791454#comment-14791454 ]

Hudson commented on HBASE-14274:

FAILURE: Integrated in HBase-1.2-IT #152 (See [https://builds.apache.org/job/HBase-1.2-IT/152/])
HBASE-14278 Fix NPE that is showing up since HBASE-14274 went in (eclark: rev a229ac91fbab2608ae89bbe44b1dd05e5aef1183)
* hbase-hadoop2-compat/src/main/java/org/apache/hadoop/metrics2/impl/JmxCacheBuster.java
* hbase-hadoop2-compat/src/test/java/org/apache/hadoop/hbase/regionserver/TestMetricsRegionSourceImpl.java
* hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionAggregateSourceImpl.java
* hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionSourceImpl.java
* hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionServerSourceImpl.java
[jira] [Commented] (HBASE-14274) Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs MetricsRegionAggregateSourceImpl
[ https://issues.apache.org/jira/browse/HBASE-14274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791556#comment-14791556 ]

Hudson commented on HBASE-14274:

FAILURE: Integrated in HBase-1.3 #182 (See [https://builds.apache.org/job/HBase-1.3/182/])
HBASE-14278 Fix NPE that is showing up since HBASE-14274 went in (eclark: rev 2029e851827fa1bf59436c7baa1971b52ac5833e)
* hbase-hadoop2-compat/src/main/java/org/apache/hadoop/metrics2/impl/JmxCacheBuster.java
* hbase-hadoop2-compat/src/test/java/org/apache/hadoop/hbase/regionserver/TestMetricsRegionSourceImpl.java
* hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionSourceImpl.java
* hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionServerSourceImpl.java
* hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionAggregateSourceImpl.java
[jira] [Commented] (HBASE-14274) Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs MetricsRegionAggregateSourceImpl
[ https://issues.apache.org/jira/browse/HBASE-14274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791563#comment-14791563 ]

Hudson commented on HBASE-14274:

FAILURE: Integrated in HBase-1.2 #180 (See [https://builds.apache.org/job/HBase-1.2/180/])
HBASE-14278 Fix NPE that is showing up since HBASE-14274 went in (eclark: rev a229ac91fbab2608ae89bbe44b1dd05e5aef1183)
* hbase-hadoop2-compat/src/main/java/org/apache/hadoop/metrics2/impl/JmxCacheBuster.java
* hbase-hadoop2-compat/src/test/java/org/apache/hadoop/hbase/regionserver/TestMetricsRegionSourceImpl.java
* hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionSourceImpl.java
* hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionServerSourceImpl.java
* hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionAggregateSourceImpl.java
[jira] [Commented] (HBASE-14274) Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs MetricsRegionAggregateSourceImpl
[ https://issues.apache.org/jira/browse/HBASE-14274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707825#comment-14707825 ]

Hudson commented on HBASE-14274:

FAILURE: Integrated in HBase-TRUNK #6746 (See [https://builds.apache.org/job/HBase-TRUNK/6746/])
HBASE-14274 Addendum sets closed to true when closing (tedyu: rev 9b2325e16fac7f2f37ac3539aee4faf6cdb8d6a6)
* hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionSourceImpl.java

> Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs MetricsRegionAggregateSourceImpl
> ---
>
>                 Key: HBASE-14274
>                 URL: https://issues.apache.org/jira/browse/HBASE-14274
>             Project: HBase
>          Issue Type: Sub-task
>          Components: test
>            Reporter: stack
>            Assignee: Elliott Clark
>             Fix For: 2.0.0, 1.2.0, 1.3.0
>
>         Attachments: 14274-addendum.txt, 23612.stack, HBASE-14274-v1.patch, HBASE-14274.patch
>
>
> Looking into parent issue, got a hang locally of TestDistributedLogReplay.
> We have region closes here:
> {code}
> "RS_CLOSE_META-localhost:59610-0" prio=5 tid=0x7ff65c03f800 nid=0x54347 waiting on condition [0x00011f7ac000]
>    java.lang.Thread.State: WAITING (parking)
>       at sun.misc.Unsafe.park(Native Method)
>       - parking to wait for <0x00075636d8c0> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>       at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
>       at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:945)
>       at org.apache.hadoop.hbase.regionserver.MetricsRegionAggregateSourceImpl.deregister(MetricsRegionAggregateSourceImpl.java:78)
>       at org.apache.hadoop.hbase.regionserver.MetricsRegionSourceImpl.close(MetricsRegionSourceImpl.java:120)
>       at org.apache.hadoop.hbase.regionserver.MetricsRegion.close(MetricsRegion.java:41)
>       at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1500)
>       at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1344)
>       - locked <0x0007ff878190> (a java.lang.Object)
>       at org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:102)
>       at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:103)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>       at java.lang.Thread.run(Thread.java:744)
> {code}
> They are trying to call MetricsRegionAggregateSourceImpl.deregister. They want to
> get a write lock on this class's local ReentrantReadWriteLock while holding
> MetricsRegionSourceImpl's readWriteLock write lock.
> Then, elsewhere, the JmxCacheBuster is running, trying to get metrics with the
> above locks held in reverse:
> {code}
> "HBase-Metrics2-1" daemon prio=5 tid=0x7ff65e14b000 nid=0x59a03 waiting on condition [0x000140ea5000]
>    java.lang.Thread.State: WAITING (parking)
>       at sun.misc.Unsafe.park(Native Method)
>       - parking to wait for <0x0007cade1480> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>       at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:964)
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282)
>       at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:731)
>       at org.apache.hadoop.hbase.regionserver.MetricsRegionSourceImpl.snapshot(MetricsRegionSourceImpl.java:193)
>       at org.apache.hadoop.hbase.regionserver.MetricsRegionAggregateSourceImpl.getMetrics(MetricsRegionAggregateSourceImpl.java:115)
>       at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:195)
>       at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateJmxCache(MetricsSourceAdapter.java:172)
>       at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMBeanInfo(MetricsSourceAdapter.java:151)
>       at
[jira] [Commented] (HBASE-14274) Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs MetricsRegionAggregateSourceImpl
[ https://issues.apache.org/jira/browse/HBASE-14274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707804#comment-14707804 ]

Hudson commented on HBASE-14274:

FAILURE: Integrated in HBase-1.2 #130 (See [https://builds.apache.org/job/HBase-1.2/130/])
HBASE-14274 Addendum sets closed to true when closing (tedyu: rev 1484aecc2635fdaecfeeeb368eafa2204041a8a9)
* hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionSourceImpl.java
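
The addendum commit message says only that it "sets closed to true when closing"; the patch itself is not shown in this thread. As a rough illustration of why such a flag matters, here is a minimal sketch of a closable metrics source. Everything in it is an assumption made for the example (the class name ClosableSourceSketch, the counter standing in for deregistration, the -1 return value) and none of it is taken from MetricsRegionSourceImpl.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch, not HBase code: a source whose close() is idempotent
// and whose snapshot() becomes a no-op once the source is closed.
final class ClosableSourceSketch {
    private final AtomicBoolean closed = new AtomicBoolean(false);
    // Stand-in for "deregister from the aggregate"; counts how often it ran.
    private final AtomicLong deregisterCalls = new AtomicLong();

    /** Only the first caller flips the flag and performs the deregistration. */
    void close() {
        if (!closed.compareAndSet(false, true)) {
            return; // already closed, nothing left to do
        }
        deregisterCalls.incrementAndGet();
    }

    /** Returns -1 once closed so a late reader does no work on a closed source. */
    long snapshot(long currentValue) {
        return closed.get() ? -1L : currentValue;
    }

    boolean isClosed() {
        return closed.get();
    }

    long deregisterCount() {
        return deregisterCalls.get();
    }

    public static void main(String[] args) {
        ClosableSourceSketch source = new ClosableSourceSketch();
        System.out.println("snapshot before close: " + source.snapshot(7L));
        source.close();
        source.close(); // second call is ignored
        System.out.println("deregistrations: " + source.deregisterCount());
        System.out.println("snapshot after close: " + source.snapshot(7L));
    }
}
```

If the flag were never set during close, a reader arriving afterwards could not tell the source had gone away, which is presumably the gap the addendum closes.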
[jira] [Commented] (HBASE-14274) Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs MetricsRegionAggregateSourceImpl
[ https://issues.apache.org/jira/browse/HBASE-14274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706539#comment-14706539 ]

Hudson commented on HBASE-14274:

FAILURE: Integrated in HBase-TRUNK #6744 (See [https://builds.apache.org/job/HBase-TRUNK/6744/])
HBASE-14274 Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs MetricsRegionAggregateSourceImpl (stack: rev bcef28eefaf192b0ad48c8011f98b8e944340da5)
* hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionSourceImpl.java
* hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionAggregateSourceImpl.java
[jira] [Commented] (HBASE-14274) Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs MetricsRegionAggregateSourceImpl
[ https://issues.apache.org/jira/browse/HBASE-14274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707458#comment-14707458 ]

Hudson commented on HBASE-14274:

SUCCESS: Integrated in HBase-1.2-IT #106 (See [https://builds.apache.org/job/HBase-1.2-IT/106/])
HBASE-14274 Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs MetricsRegionAggregateSourceImpl (busbey: rev 909e2fe504169f978989e4c1c3778291b3da2418)
* hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionSourceImpl.java
* hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionAggregateSourceImpl.java
[jira] [Commented] (HBASE-14274) Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs MetricsRegionAggregateSourceImpl
[ https://issues.apache.org/jira/browse/HBASE-14274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707575#comment-14707575 ]

Elliott Clark commented on HBASE-14274:

Yep [~tedyu] that's what I meant. Want to commit that to every branch?
[jira] [Commented] (HBASE-14274) Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs MetricsRegionAggregateSourceImpl
[ https://issues.apache.org/jira/browse/HBASE-14274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707613#comment-14707613 ]

Ted Yu commented on HBASE-14274:

Pushed addendum to the 3 branches.
[jira] [Commented] (HBASE-14274) Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs MetricsRegionAggregateSourceImpl
[ https://issues.apache.org/jira/browse/HBASE-14274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707753#comment-14707753 ]

Hudson commented on HBASE-14274:

FAILURE: Integrated in HBase-1.3 #126 (See [https://builds.apache.org/job/HBase-1.3/126/])
HBASE-14274 Addendum sets closed to true when closing (tedyu: rev f05770b96f6246ffe851e6878dc07c4b7c02906f)
* hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionSourceImpl.java
[jira] [Commented] (HBASE-14274) Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs MetricsRegionAggregateSourceImpl
[ https://issues.apache.org/jira/browse/HBASE-14274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707756#comment-14707756 ]

Hudson commented on HBASE-14274:

SUCCESS: Integrated in HBase-1.3-IT #109 (See [https://builds.apache.org/job/HBase-1.3-IT/109/])
HBASE-14274 Addendum sets closed to true when closing (tedyu: rev f05770b96f6246ffe851e6878dc07c4b7c02906f)
* hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionSourceImpl.java
[jira] [Commented] (HBASE-14274) Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs MetricsRegionAggregateSourceImpl
[ https://issues.apache.org/jira/browse/HBASE-14274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707743#comment-14707743 ]

Hudson commented on HBASE-14274:

SUCCESS: Integrated in HBase-1.2-IT #107 (See [https://builds.apache.org/job/HBase-1.2-IT/107/])
HBASE-14274 Addendum sets closed to true when closing (tedyu: rev 1484aecc2635fdaecfeeeb368eafa2204041a8a9)
* hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionSourceImpl.java
[jira] [Commented] (HBASE-14274) Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs MetricsRegionAggregateSourceImpl
[ https://issues.apache.org/jira/browse/HBASE-14274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706275#comment-14706275 ]

Sean Busbey commented on HBASE-14274:

cherry-picked to 1.2

Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs MetricsRegionAggregateSourceImpl

Key: HBASE-14274
URL: https://issues.apache.org/jira/browse/HBASE-14274
Project: HBase
Issue Type: Sub-task
Components: test
Reporter: stack
Assignee: Elliott Clark
Fix For: 2.0.0, 1.2.0, 1.3.0
Attachments: 23612.stack, HBASE-14274-v1.patch, HBASE-14274.patch

Looking into parent issue, got a hang locally of TestDistributedLogReplay.
We have region closes here:
{code}
"RS_CLOSE_META-localhost:59610-0" prio=5 tid=0x7ff65c03f800 nid=0x54347 waiting on condition [0x00011f7ac000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00075636d8c0> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
    at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:945)
    at org.apache.hadoop.hbase.regionserver.MetricsRegionAggregateSourceImpl.deregister(MetricsRegionAggregateSourceImpl.java:78)
    at org.apache.hadoop.hbase.regionserver.MetricsRegionSourceImpl.close(MetricsRegionSourceImpl.java:120)
    at org.apache.hadoop.hbase.regionserver.MetricsRegion.close(MetricsRegion.java:41)
    at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1500)
    at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1344)
    - locked <0x0007ff878190> (a java.lang.Object)
    at org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:102)
    at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:103)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
{code}
They are trying to call MetricsRegionAggregateSourceImpl.deregister. They want to get a write lock on this class's local ReentrantReadWriteLock while holding MetricsRegionSourceImpl's readWriteLock write lock.
Then, elsewhere, the JmxCacheBuster is running, trying to get metrics with the above locks held in reverse:
{code}
"HBase-Metrics2-1" daemon prio=5 tid=0x7ff65e14b000 nid=0x59a03 waiting on condition [0x000140ea5000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x0007cade1480> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:964)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282)
    at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:731)
    at org.apache.hadoop.hbase.regionserver.MetricsRegionSourceImpl.snapshot(MetricsRegionSourceImpl.java:193)
    at org.apache.hadoop.hbase.regionserver.MetricsRegionAggregateSourceImpl.getMetrics(MetricsRegionAggregateSourceImpl.java:115)
    at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:195)
    at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateJmxCache(MetricsSourceAdapter.java:172)
    at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMBeanInfo(MetricsSourceAdapter.java:151)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getNewMBeanClassName(DefaultMBeanServerInterceptor.java:333)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:319)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:522)
    at
{code}
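The lock-order inversion described in the issue text can be reproduced in miniature. The following is only a sketch, not HBase code: the class name LockOrderInversionDemo and the two lock variables are made up for illustration, the aggregate-side lock held by the metrics thread is assumed to be a read lock, and timed tryLock calls stand in for the blocking lock() calls in the dumps so that the demo terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Illustration only: two plain locks stand in for the ones named in the thread dumps. */
class LockOrderInversionDemo {

  /** Takes {@code first}, then makes a timed attempt on {@code second}; returns whether it got it. */
  static boolean holdThenTry(Lock first, Lock second,
      CountDownLatch bothHold, CountDownLatch bothTried) {
    first.lock();
    try {
      bothHold.countDown();
      bothHold.await();                 // both threads now hold their first lock
      boolean acquired = second.tryLock(200, TimeUnit.MILLISECONDS);
      if (acquired) {
        second.unlock();
      }
      bothTried.countDown();
      bothTried.await();                // keep holding until the other attempt has finished
      return acquired;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } finally {
      first.unlock();
    }
  }

  /** Returns {closePathAcquired, metricsPathAcquired}. */
  static boolean[] run() {
    ReentrantReadWriteLock regionLock = new ReentrantReadWriteLock();     // hypothetical stand-in
    ReentrantReadWriteLock aggregateLock = new ReentrantReadWriteLock();  // hypothetical stand-in
    CountDownLatch bothHold = new CountDownLatch(2);
    CountDownLatch bothTried = new CountDownLatch(2);
    boolean[] acquired = new boolean[2];

    // Close path: region write lock first, then aggregate write lock.
    Thread closer = new Thread(() -> acquired[0] =
        holdThenTry(regionLock.writeLock(), aggregateLock.writeLock(), bothHold, bothTried));
    // Metrics path: aggregate read lock first (assumed), then region read lock.
    Thread metrics = new Thread(() -> acquired[1] =
        holdThenTry(aggregateLock.readLock(), regionLock.readLock(), bothHold, bothTried));

    closer.start();
    metrics.start();
    try {
      closer.join();
      metrics.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return acquired;
  }

  public static void main(String[] args) {
    boolean[] r = run();
    System.out.println("close path got aggregate write lock: " + r[0]);
    System.out.println("metrics path got region read lock:   " + r[1]);
  }
}
```

Both attempts time out, so both lines report false. With untimed lock() calls in place of tryLock, the same two threads would park forever, which is the state the two dumps show.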
[jira] [Commented] (HBASE-14274) Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs MetricsRegionAggregateSourceImpl
[ https://issues.apache.org/jira/browse/HBASE-14274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706358#comment-14706358 ]

Hudson commented on HBASE-14274:

FAILURE: Integrated in HBase-1.3-IT #108 (See [https://builds.apache.org/job/HBase-1.3-IT/108/])
HBASE-14274 Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs MetricsRegionAggregateSourceImpl (stack: rev f4ad31c8f91782e51e0bffdcd11c587a6c1dd3e9)
* hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionAggregateSourceImpl.java
* hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionSourceImpl.java

Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs MetricsRegionAggregateSourceImpl

Key: HBASE-14274
URL: https://issues.apache.org/jira/browse/HBASE-14274
Project: HBase
Issue Type: Sub-task
Components: test
Reporter: stack
Assignee: Elliott Clark
Fix For: 2.0.0, 1.2.0, 1.3.0
Attachments: 23612.stack, HBASE-14274-v1.patch, HBASE-14274.patch
[jira] [Commented] (HBASE-14274) Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs MetricsRegionAggregateSourceImpl
[ https://issues.apache.org/jira/browse/HBASE-14274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706210#comment-14706210 ]

stack commented on HBASE-14274:

[~busbey] Looks like this is needed for 1.2 too. I applied to branch-1.

Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs MetricsRegionAggregateSourceImpl

Key: HBASE-14274
URL: https://issues.apache.org/jira/browse/HBASE-14274
Project: HBase
Issue Type: Sub-task
Components: test
Reporter: stack
Assignee: Elliott Clark
Fix For: 2.0.0, 1.3.0
Attachments: 23612.stack, HBASE-14274-v1.patch, HBASE-14274.patch
[jira] [Commented] (HBASE-14274) Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs MetricsRegionAggregateSourceImpl
[ https://issues.apache.org/jira/browse/HBASE-14274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706009#comment-14706009 ]

Elliott Clark commented on HBASE-14274:

OK, so I can't get that log message to go away without lots of interesting work in other places. So let's get this in, and we can work on the log message more in a follow-on JIRA.

Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs MetricsRegionAggregateSourceImpl

Key: HBASE-14274
URL: https://issues.apache.org/jira/browse/HBASE-14274
Project: HBase
Issue Type: Sub-task
Components: test
Reporter: stack
Attachments: 23612.stack, HBASE-14274-v1.patch, HBASE-14274.patch
[jira] [Commented] (HBASE-14274) Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs MetricsRegionAggregateSourceImpl
[ https://issues.apache.org/jira/browse/HBASE-14274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14705929#comment-14705929 ]

Elliott Clark commented on HBASE-14274:

For that one it seems like the cache buster ran before the nn was all set up. I don't think that it's caused by the patch; more likely it's caused by the fact that HBase has to stop and start the metrics system.
# That shouldn't be an issue in real life.
# Let me get something up so that the logs are cleaner.

Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs MetricsRegionAggregateSourceImpl

Key: HBASE-14274
URL: https://issues.apache.org/jira/browse/HBASE-14274
Project: HBase
Issue Type: Sub-task
Components: test
Reporter: stack
Attachments: 23612.stack, HBASE-14274-v1.patch, HBASE-14274.patch
[jira] [Commented] (HBASE-14274) Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs MetricsRegionAggregateSourceImpl
[ https://issues.apache.org/jira/browse/HBASE-14274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14705776#comment-14705776 ]

Elliott Clark commented on HBASE-14274:

We can get away without the clearJmxCache. However, I think that we need to have agg.deregister above all the removeMetric things, or we risk a race.

Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs MetricsRegionAggregateSourceImpl

Key: HBASE-14274
URL: https://issues.apache.org/jira/browse/HBASE-14274
Project: HBase
Issue Type: Sub-task
Components: test
Reporter: stack
Attachments: 23612.stack, HBASE-14274.patch
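The ordering concern in the comment above (deregister from the aggregate before removing the per-region metrics) can be sketched without any nested locks. This is a hypothetical illustration and not the committed patch: AggregateSketch, RegionSourceSketch, and the metric names are invented, the concurrent set is just one reading of the "CHM" idea raised elsewhere in the thread, and ConcurrentHashMap.newKeySet() assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for the aggregate source; not an HBase class. */
class AggregateSketch {
  // A concurrent set lets register, deregister and iteration proceed without a read-write lock.
  private final Set<RegionSourceSketch> sources = ConcurrentHashMap.newKeySet();

  void register(RegionSourceSketch source) {
    sources.add(source);
  }

  void deregister(RegionSourceSketch source) {
    sources.remove(source);
  }

  /** Collects metric names from every region source that is still registered and open. */
  List<String> snapshotAll() {
    List<String> out = new ArrayList<>();
    for (RegionSourceSketch source : sources) {
      source.snapshot(out);
    }
    return out;
  }

  public static void main(String[] args) {
    AggregateSketch agg = new AggregateSketch();
    RegionSourceSketch a = new RegionSourceSketch("regionA", agg);
    new RegionSourceSketch("regionB", agg);
    a.close();
    System.out.println(agg.snapshotAll()); // only regionB's metrics remain
  }
}

/** Hypothetical stand-in for a per-region source; not an HBase class. */
class RegionSourceSketch {
  private final AtomicBoolean closed = new AtomicBoolean(false);
  private final AggregateSketch agg;
  private final List<String> metrics = new ArrayList<>();

  RegionSourceSketch(String name, AggregateSketch agg) {
    this.agg = agg;
    metrics.add(name + ".getCount");       // made-up metric names
    metrics.add(name + ".scanNextCount");
    agg.register(this);
  }

  void close() {
    if (closed.getAndSet(true)) {
      return;                              // already closed: close() is idempotent
    }
    agg.deregister(this);                  // first, so later snapshots cannot reach this source
    synchronized (this) {
      metrics.clear();                     // then remove the metrics
    }
  }

  void snapshot(List<String> out) {
    synchronized (this) {
      if (closed.get()) {
        return;                            // never report a half-removed region
      }
      out.addAll(metrics);
    }
  }
}
```

In this sketch close() holds no lock while it calls into the aggregate, and the aggregate holds no lock while it calls snapshot(), so the inverted acquisition order from the thread dumps cannot arise.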
[jira] [Commented] (HBASE-14274) Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs MetricsRegionAggregateSourceImpl
[ https://issues.apache.org/jira/browse/HBASE-14274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14705723#comment-14705723 ]

stack commented on HBASE-14274:

I was going to ask if CHM would do.

What about MetricsRegionSourceImpl#close? It calls agg.deregister, which will run the cache buster... then, still inside the lock, we'll again call clearJmxCache. Move the agg.deregister in place of the call to clearJmxCache?

How do we know this stuff is doing the metrics clearing you want, [~eclark]?

Thanks.

Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs MetricsRegionAggregateSourceImpl

Key: HBASE-14274
URL: https://issues.apache.org/jira/browse/HBASE-14274
Project: HBase
Issue Type: Sub-task
Components: test
Reporter: stack
Attachments: 23612.stack, HBASE-14274.patch
[jira] [Commented] (HBASE-14274) Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs MetricsRegionAggregateSourceImpl
[ https://issues.apache.org/jira/browse/HBASE-14274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14705727#comment-14705727 ]

stack commented on HBASE-14274:
---

Was thinking of doing this:
{code}
diff --git a/hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionSourceImpl.java b/hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionSourceImpl.java
index 7290c56..fab6861 100644
--- a/hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionSourceImpl.java
+++ b/hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionSourceImpl.java
@@ -117,7 +117,6 @@ public class MetricsRegionSourceImpl implements MetricsRegionSource {
       }
       closed = true;
-      agg.deregister(this);
       if (LOG.isTraceEnabled()) {
         LOG.trace("Removing region Metrics: " + regionWrapper.getRegionName());
@@ -131,10 +130,8 @@ public class MetricsRegionSourceImpl implements MetricsRegionSource {
       registry.removeMetric(regionScanNextKey);
       registry.removeHistogramMetrics(regionGetKey);
       registry.removeHistogramMetrics(regionScanNextKey);
-
       regionWrapper = null;
-
-      JmxCacheBuster.clearJmxCache();
+      agg.deregister(this);
     } finally {
       lock.unlock();
     }
{code}
... but not sure how registry plays with aggregating bean. We need this extra run of clearJmxCache in here?

Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs MetricsRegionAggregateSourceImpl
---

Key: HBASE-14274
URL: https://issues.apache.org/jira/browse/HBASE-14274
Project: HBase
Issue Type: Sub-task
Components: test
Reporter: stack
Attachments: 23612.stack, HBASE-14274.patch

Looking into parent issue, got a hang locally of TestDistributedLogReplay.
We have region closes here:
{code}
"RS_CLOSE_META-localhost:59610-0" prio=5 tid=0x7ff65c03f800 nid=0x54347 waiting on condition [0x00011f7ac000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00075636d8c0> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
    at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:945)
    at org.apache.hadoop.hbase.regionserver.MetricsRegionAggregateSourceImpl.deregister(MetricsRegionAggregateSourceImpl.java:78)
    at org.apache.hadoop.hbase.regionserver.MetricsRegionSourceImpl.close(MetricsRegionSourceImpl.java:120)
    at org.apache.hadoop.hbase.regionserver.MetricsRegion.close(MetricsRegion.java:41)
    at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1500)
    at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1344)
    - locked <0x0007ff878190> (a java.lang.Object)
    at org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:102)
    at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:103)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
{code}
They are trying to MetricsRegionAggregateSourceImpl.deregister. They want to get a write lock on this class's local ReentrantReadWriteLock while holding MetricsRegionSourceImpl's readWriteLock write lock.
Then, elsewhere the JmxCacheBuster is running trying to get metrics with above locks held in reverse:
{code}
"HBase-Metrics2-1" daemon prio=5 tid=0x7ff65e14b000 nid=0x59a03 waiting on condition [0x000140ea5000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x0007cade1480> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:964)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282)
    at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:731)
    at org.apache.hadoop.hbase.regionserver.MetricsRegionSourceImpl.snapshot(MetricsRegionSourceImpl.java:193)
    at org.apache.hadoop.hbase.regionserver.MetricsRegionAggregateSourceImpl.getMetrics(MetricsRegionAggregateSourceImpl.java:115)
    at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:195)
    at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateJmxCache(MetricsSourceAdapter.java:172)
    at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMBeanInfo(MetricsSourceAdapter.java:151)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getNewMBeanClassName(DefaultMBeanServerInterceptor.java:333)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:319)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:522)
    at org.apache.hadoop.metrics2.util.MBeans.register(MBeans.java:57)
    at
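The lock-order inversion described in the issue (close path: region write lock, then aggregate write lock; metrics path: aggregate read lock, then region read lock) can be reproduced in miniature. The following is only a sketch under stated assumptions: the class name `LockOrderInversionDemo`, the two static locks, and the helper methods are hypothetical stand-ins and are not HBase code. It uses timed `tryLock` so the demonstration reports the inversion instead of hanging the way the real threads did.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class LockOrderInversionDemo {
  // Hypothetical stand-ins for the per-region lock and the aggregate's lock.
  private static final ReentrantReadWriteLock REGION = new ReentrantReadWriteLock();
  private static final ReentrantReadWriteLock AGGREGATE = new ReentrantReadWriteLock();

  /**
   * Takes {@code first}, waits until the peer thread holds its own first lock,
   * then tries {@code second} with a timeout. Returns true if the second lock
   * could not be acquired, which is where an untimed lock() would park forever.
   */
  private static boolean blockedOnSecond(Lock first, Lock second,
      CountDownLatch bothHoldFirst, CountDownLatch bothTried) {
    first.lock();
    try {
      bothHoldFirst.countDown();
      bothHoldFirst.await();
      boolean acquired = second.tryLock(200, TimeUnit.MILLISECONDS);
      if (acquired) {
        second.unlock();
      }
      // Keep holding the first lock until the peer has finished its attempt.
      bothTried.countDown();
      bothTried.await();
      return !acquired;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } finally {
      first.unlock();
    }
  }

  /** Returns true when both threads time out, i.e. this ordering can deadlock. */
  static boolean wouldDeadlock() {
    CountDownLatch bothHoldFirst = new CountDownLatch(2);
    CountDownLatch bothTried = new CountDownLatch(2);
    boolean[] blocked = new boolean[2];
    // Close path: region write lock, then aggregate write lock.
    Thread closer = new Thread(() -> blocked[0] = blockedOnSecond(
        REGION.writeLock(), AGGREGATE.writeLock(), bothHoldFirst, bothTried));
    // Metrics path: aggregate read lock, then region read lock.
    Thread reader = new Thread(() -> blocked[1] = blockedOnSecond(
        AGGREGATE.readLock(), REGION.readLock(), bothHoldFirst, bothTried));
    closer.start();
    reader.start();
    try {
      closer.join();
      reader.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    return blocked[0] && blocked[1];
  }

  public static void main(String[] args) {
    System.out.println("would deadlock: " + wouldDeadlock());
  }
}
```

Each thread keeps its first lock until the other has finished its attempt, so neither timed acquisition can succeed; with plain `lock()` calls both threads would park indefinitely, as in the two traces.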
[jira] [Commented] (HBASE-14274) Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs MetricsRegionAggregateSourceImpl
[ https://issues.apache.org/jira/browse/HBASE-14274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14705781#comment-14705781 ]

stack commented on HBASE-14274:
---

It will run the cachebuster... It could run before stuff is removed here in MetricsRegionSourceImpl? Would that leave dangling metrics? [~eclark]
[jira] [Commented] (HBASE-14274) Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs MetricsRegionAggregateSourceImpl
[ https://issues.apache.org/jira/browse/HBASE-14274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14705672#comment-14705672 ]

stack commented on HBASE-14274:
---

Hmm... not enough. I see the cache buster runs every 5 mins anyways, not just on region close.
[jira] [Commented] (HBASE-14274) Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs MetricsRegionAggregateSourceImpl
[ https://issues.apache.org/jira/browse/HBASE-14274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14705671#comment-14705671 ]

stack commented on HBASE-14274:
---

Thanks [~apurtell]

[~eclark] How about this?

diff --git a/hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionSourceImpl.java b/hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionSourceImpl.java
index 7290c56..87272c4 100644
--- a/hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionSourceImpl.java
+++ b/hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionSourceImpl.java
@@ -133,11 +133,10 @@ public class MetricsRegionSourceImpl implements MetricsRegionSource {
       registry.removeHistogramMetrics(regionScanNextKey);
       regionWrapper = null;
-
-      JmxCacheBuster.clearJmxCache();
     } finally {
       lock.unlock();
     }
+JmxCacheBuster.clearJmxCache();
   }

We don't need the cache buster to run under the lock, right? And we can be stale for a few millis? Let me try and make a test.
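The suggestion in the comment above, calling the cache buster only after the write lock has been released, follows a general pattern: do the state change under the lock, then invoke outside code once the lock is no longer held. The sketch below illustrates that pattern only. `CloseOutsideLockSketch`, its fields, and the `Runnable` standing in for `JmxCacheBuster.clearJmxCache()` are all hypothetical and do not reproduce `MetricsRegionSourceImpl`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class CloseOutsideLockSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final List<String> metrics = new ArrayList<>();
  // Hypothetical stand-in for the cache-busting call made on close.
  private final Runnable cacheBuster;
  private boolean closed;

  CloseOutsideLockSketch(Runnable cacheBuster) {
    this.cacheBuster = cacheBuster;
    metrics.add("getCount");
    metrics.add("scanNextCount");
  }

  /** True if the calling thread currently holds this object's write lock. */
  boolean holdsWriteLock() {
    return lock.isWriteLockedByCurrentThread();
  }

  int metricCount() {
    lock.readLock().lock();
    try {
      return metrics.size();
    } finally {
      lock.readLock().unlock();
    }
  }

  void close() {
    lock.writeLock().lock();
    try {
      if (closed) {
        return; // already closed: nothing to remove, no callback
      }
      closed = true;
      metrics.clear();
    } finally {
      lock.writeLock().unlock();
    }
    // Runs after the lock is released, so the callback cannot be part of a
    // lock cycle involving this object's lock.
    cacheBuster.run();
  }
}
```

The cost is the one the comment already names: between the unlock and the callback, readers may briefly see state that the cache has not caught up with.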
[jira] [Commented] (HBASE-14274) Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs MetricsRegionAggregateSourceImpl
[ https://issues.apache.org/jira/browse/HBASE-14274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14705605#comment-14705605 ]

Andrew Purtell commented on HBASE-14274:
---

JMX cache buster came in on HBASE-14166
[jira] [Commented] (HBASE-14274) Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs MetricsRegionAggregateSourceImpl
[ https://issues.apache.org/jira/browse/HBASE-14274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14705687#comment-14705687 ]

Elliott Clark commented on HBASE-14274:
---

Yeah, the cache buster runs in the background, so it will be better to put it outside of the lock, but it shouldn't change much.
[jira] [Commented] (HBASE-14274) Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs MetricsRegionAggregateSourceImpl
[ https://issues.apache.org/jira/browse/HBASE-14274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14705701#comment-14705701 ]

Elliott Clark commented on HBASE-14274:
---

I can just remove the lock in metrics region source impl.
[jira] [Commented] (HBASE-14274) Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs MetricsRegionAggregateSourceImpl
[ https://issues.apache.org/jira/browse/HBASE-14274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14705898#comment-14705898 ]

stack commented on HBASE-14274:
---

+1

Shouldn't deadlock anymore given you've removed both locks (smile). I tried it local and it is good. I was able to deadlock pretty easily locally but not w/ this patch applied.
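The thread says only that the patch "removed both locks"; the patch itself is not shown here. As an illustration of how a close/snapshot pair can work without either read-write lock, here is a minimal sketch that assumes an `AtomicBoolean` closed flag and a concurrent set of registered regions. Every name in it (`LockFreeCloseSketch`, `Aggregate`, `Region`, `open`, `snapshotAll`) is hypothetical, and it should not be read as the actual HBASE-14274 change.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

final class LockFreeCloseSketch {

  /** Hypothetical aggregate that tracks live regions in a concurrent set. */
  static final class Aggregate {
    private final Set<Region> regions = ConcurrentHashMap.newKeySet();

    void register(Region region) {
      regions.add(region);
    }

    void deregister(Region region) {
      regions.remove(region);
    }

    /** Counts regions that still report metrics; takes no lock while iterating. */
    int snapshotAll() {
      int live = 0;
      for (Region region : regions) {
        if (region.snapshot()) {
          live++;
        }
      }
      return live;
    }
  }

  /** Hypothetical per-region source whose close() is idempotent without a lock. */
  static final class Region {
    private final AtomicBoolean closed = new AtomicBoolean(false);
    private final Aggregate aggregate;

    private Region(Aggregate aggregate) {
      this.aggregate = aggregate;
    }

    static Region open(Aggregate aggregate) {
      Region region = new Region(aggregate);
      aggregate.register(region);
      return region;
    }

    /** Returns false once the region is closed, so late snapshots are skipped. */
    boolean snapshot() {
      return !closed.get();
    }

    /** Only the first caller wins the compare-and-set and deregisters. */
    boolean close() {
      if (!closed.compareAndSet(false, true)) {
        return false;
      }
      aggregate.deregister(this);
      return true;
    }
  }
}
```

With no lock held on either side, the close path and the metrics path cannot wait on each other. The trade-off is that a snapshot racing with a close has to tolerate a region that is closed but not yet deregistered, which the `closed` check handles in this sketch.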
[jira] [Commented] (HBASE-14274) Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs MetricsRegionAggregateSourceImpl
[ https://issues.apache.org/jira/browse/HBASE-14274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14705903#comment-14705903 ]

stack commented on HBASE-14274:
---
This related to your change? Should protect against it?
{code}
2015-08-20 15:31:10,704 WARN [HBase-Metrics2-1] impl.MetricsConfig(124): Cannot locate configuration: tried hadoop-metrics2-hbase.properties,hadoop-metrics2.properties
2015-08-20 15:31:10,710 ERROR [HBase-Metrics2-1] lib.MethodMetric$2(118): Error invoking method getBlocksTotal
java.lang.reflect.InvocationTargetException
	at sun.reflect.GeneratedMethodAccessor72.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.metrics2.lib.MethodMetric$2.snapshot(MethodMetric.java:111)
	at org.apache.hadoop.metrics2.lib.MethodMetric.snapshot(MethodMetric.java:144)
	at org.apache.hadoop.metrics2.lib.MetricsRegistry.snapshot(MetricsRegistry.java:387)
	at org.apache.hadoop.metrics2.lib.MetricsSourceBuilder$1.getMetrics(MetricsSourceBuilder.java:79)
	at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:195)
	at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateJmxCache(MetricsSourceAdapter.java:172)
	at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMBeanInfo(MetricsSourceAdapter.java:151)
	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getNewMBeanClassName(DefaultMBeanServerInterceptor.java:333)
	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:319)
	at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:522)
	at org.apache.hadoop.metrics2.util.MBeans.register(MBeans.java:57)
	at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.startMBeans(MetricsSourceAdapter.java:221)
	at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.start(MetricsSourceAdapter.java:96)
	at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.registerSource(MetricsSystemImpl.java:245)
	at org.apache.hadoop.metrics2.impl.MetricsSystemImpl$1.postStart(MetricsSystemImpl.java:229)
	at sun.reflect.GeneratedMethodAccessor50.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.metrics2.impl.MetricsSystemImpl$3.invoke(MetricsSystemImpl.java:290)
	at com.sun.proxy.$Proxy13.postStart(Unknown Source)
	at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.start(MetricsSystemImpl.java:185)
	at org.apache.hadoop.metrics2.impl.JmxCacheBuster$JmxCacheBusterRunnable.run(JmxCacheBuster.java:81)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.NullPointerException
	at org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.size(BlocksMap.java:198)
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.getTotalBlocks(BlockManager.java:3158)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlocksTotal(FSNamesystem.java:5652)
	... 32 more
{code}

Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs MetricsRegionAggregateSourceImpl
---
Key: HBASE-14274
URL: https://issues.apache.org/jira/browse/HBASE-14274
Project: HBase
Issue Type: Sub-task
Components: test
Reporter: stack
Attachments: 23612.stack, HBASE-14274-v1.patch, HBASE-14274.patch

Looking into parent issue, got a hang locally of TestDistributedLogReplay. We have region closes here:
{code}
"RS_CLOSE_META-localhost:59610-0" prio=5 tid=0x7ff65c03f800 nid=0x54347 waiting on condition [0x00011f7ac000]
   java.lang.Thread.State: WAITING (parking)
 at sun.misc.Unsafe.park(Native Method)
 - parking to wait for