[jira] [Updated] (HBASE-4724) TestAdmin hangs randomly in trunk
[ https://issues.apache.org/jira/browse/HBASE-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-4724:
-------------------------
    Fix Version/s:     (was: 0.92.0)

This hasn't happened in a while. Moving it out of 0.92.

TestAdmin hangs randomly in trunk
---------------------------------

                 Key: HBASE-4724
                 URL: https://issues.apache.org/jira/browse/HBASE-4724
             Project: HBase
          Issue Type: Bug
          Components: test
    Affects Versions: 0.94.0
            Reporter: nkeywal
            Assignee: nkeywal
         Attachments: 2002_4724_TestAdmin.patch, 2002_4724_TestAdmin.v2.patch

From the logs in my environment:

{noformat}
2011-11-01 15:48:40,744 WARN  [Master:0;localhost,39664,1320187706355] master.AssignmentManager(1471): Failed assignment of -ROOT-,,0.70236052 to localhost,44046,1320187706849, trying to assign elsewhere instead; retry=1
org.apache.hadoop.hbase.ipc.HBaseRPC$VersionMismatch: Protocol org.apache.hadoop.hbase.ipc.HRegionInterface version mismatch. (client = 28, server = 29)
	at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:185)
	at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:300)
{noformat}

Anyway, after this the log ends with:

{noformat}
2011-11-01 15:54:35,132 INFO  [Master:0;localhost,39664,1320187706355.oldLogCleaner] hbase.Chore(80): Master:0;localhost,39664,1320187706355.oldLogCleaner exiting
Process Thread Dump: Automatic Stack Trace every 60 seconds waiting on Master:0;localhost,39664,1320187706355
{noformat}

It is stuck in:

{noformat}
sun.management.ThreadImpl.getThreadInfo1(Native Method)
sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:156)
sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:121)
org.apache.hadoop.util.ReflectionUtils.printThreadInfo(ReflectionUtils.java:149)
org.apache.hadoop.hbase.util.Threads.threadDumpingIsAlive(Threads.java:113)
org.apache.hadoop.hbase.LocalHBaseCluster.join(LocalHBaseCluster.java:405)
org.apache.hadoop.hbase.MiniHBaseCluster.join(MiniHBaseCluster.java:408)
org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniHBaseCluster(HBaseTestingUtility.java:616)
org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniCluster(HBaseTestingUtility.java:590)
org.apache.hadoop.hbase.client.TestAdmin.tearDownAfterClass(TestAdmin.java:89)
{noformat}

So that is at least why adding a timeout won't help, and may be why the test does not end at all. Adding a maximum retry count to Threads#threadDumpingIsAlive could help. I also wonder whether the root cause of the hang is my modification of the WAL, with some threads surprised to find updates that were not written to the WAL.

Here is the full stack dump:

{noformat}
Thread 354 (IPC Client (47) connection to localhost/127.0.0.1:52227 from nkeywal):
  State: TIMED_WAITING
  Blocked count: 360
  Waited count: 359
  Stack:
    java.lang.Object.wait(Native Method)
    org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:702)
    org.apache.hadoop.ipc.Client$Connection.run(Client.java:744)
Thread 272 (Master:0;localhost,39664,1320187706355-EventThread):
  State: WAITING
  Blocked count: 0
  Waited count: 4
  Waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@107b954b
  Stack:
    sun.misc.Unsafe.park(Native Method)
    java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
    java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
    java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
    org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:502)
Thread 271 (Master:0;localhost,39664,1320187706355-SendThread(localhost:21819)):
  State: RUNNABLE
  Blocked count: 2
  Waited count: 0
  Stack:
    sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
    sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
    sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
    sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1107)
Thread 152 (Master:0;localhost,39664,1320187706355):
  State: WAITING
  Blocked count: 217
  Waited count: 174
  Waiting on org.apache.hadoop.hbase.zookeeper.RootRegionTracker@6621477c
  Stack:
    java.lang.Object.wait(Native Method)
    java.lang.Object.wait(Object.java:485)
    org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:131)
    org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:104)
    org.apache.hadoop.hbase.catalog.CatalogTracker.waitForRoot(CatalogTracker.java:277)
{noformat}
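The "maximum retry" idea above could look roughly like the following. This is a hypothetical sketch, not the actual HBase Threads class: the real Threads.threadDumpingIsAlive joins the thread and periodically prints a dump via ReflectionUtils.printThreadInfo; here a bounded variant gives up after a fixed number of dumps instead of waiting forever.

```java
// Hypothetical sketch of a bounded threadDumpingIsAlive. Names (BoundedJoin,
// dumpIntervalMs, maxDumps) are illustrative, not from the HBase code base.
public class BoundedJoin {

    /**
     * Joins thread t, printing its stack every dumpIntervalMs milliseconds
     * while it is still alive, and gives up after maxDumps dumps.
     *
     * @return true if the thread terminated in time, false if we gave up.
     */
    public static boolean threadDumpingIsAlive(Thread t, long dumpIntervalMs,
            int maxDumps) throws InterruptedException {
        if (t == null) {
            return true;
        }
        for (int dumps = 0; dumps < maxDumps && t.isAlive(); dumps++) {
            t.join(dumpIntervalMs);  // bounded wait instead of join()
            if (t.isAlive()) {
                System.err.println("Thread " + t.getName() + " still alive after "
                        + ((dumps + 1) * dumpIntervalMs) + " ms; stack:");
                for (StackTraceElement frame : t.getStackTrace()) {
                    System.err.println("    " + frame);
                }
            }
        }
        return !t.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        // A thread that finishes immediately: the join succeeds.
        Thread quick = new Thread(() -> { });
        quick.start();
        System.out.println("quick terminated: "
                + threadDumpingIsAlive(quick, 200, 3));

        // A thread that never finishes in time: we stop after 3 dumps
        // rather than blocking the shutdown forever, as in the hang above.
        Thread stuck = new Thread(() -> {
            try { Thread.sleep(60_000); } catch (InterruptedException ignored) { }
        });
        stuck.setDaemon(true);
        stuck.start();
        System.out.println("stuck terminated: "
                + threadDumpingIsAlive(stuck, 100, 3));
    }
}
```

With such a cap, the teardown in HBaseTestingUtility.shutdownMiniCluster would eventually return even if a master thread is wedged waiting on the root region, instead of dumping stacks every 60 seconds indefinitely.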
[jira] [Updated] (HBASE-4724) TestAdmin hangs randomly in trunk
[ https://issues.apache.org/jira/browse/HBASE-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-4724:
-------------------------
    Priority: Major (was: Critical)

Downgrading to major. We've not seen this happen since commit. Will leave open another while, but it no longer seems critical.
[jira] [Updated] (HBASE-4724) TestAdmin hangs randomly in trunk
[ https://issues.apache.org/jira/browse/HBASE-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-4724:
---------------------------
    Attachment: 2002_4724_TestAdmin.patch
[jira] [Updated] (HBASE-4724) TestAdmin hangs randomly in trunk
[ https://issues.apache.org/jira/browse/HBASE-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-4724:
---------------------------
    Assignee: nkeywal
      Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-4724) TestAdmin hangs randomly in trunk
[ https://issues.apache.org/jira/browse/HBASE-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-4724:
---------------------------
    Status: Open (was: Patch Available)
[jira] [Updated] (HBASE-4724) TestAdmin hangs randomly in trunk
[ https://issues.apache.org/jira/browse/HBASE-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-4724:
---------------------------
    Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-4724) TestAdmin hangs randomly in trunk
[ https://issues.apache.org/jira/browse/HBASE-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-4724:
---------------------------
    Attachment: 2002_4724_TestAdmin.v2.patch