[jira] [Updated] (HBASE-4724) TestAdmin hangs randomly in trunk

2011-11-11 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4724:
-

Fix Version/s: (was: 0.92.0)

This hasn't happened in a while.  Moving out of 0.92.

 TestAdmin hangs randomly in trunk
 -

 Key: HBASE-4724
 URL: https://issues.apache.org/jira/browse/HBASE-4724
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
 Attachments: 2002_4724_TestAdmin.patch, 
 2002_4724_TestAdmin.v2.patch


[jira] [Updated] (HBASE-4724) TestAdmin hangs randomly in trunk

2011-11-05 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4724:
-

Priority: Major  (was: Critical)

Downgrading to major. We've not seen this happen since the commit. Will leave it 
open a while longer, but it no longer seems critical.

 TestAdmin hangs randomly in trunk
 -

 Key: HBASE-4724
 URL: https://issues.apache.org/jira/browse/HBASE-4724
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
 Fix For: 0.92.0

 Attachments: 2002_4724_TestAdmin.patch, 
 2002_4724_TestAdmin.v2.patch


[jira] [Updated] (HBASE-4724) TestAdmin hangs randomly in trunk

2011-11-02 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-4724:
---

Attachment: 2002_4724_TestAdmin.patch

 TestAdmin hangs randomly in trunk
 -

 Key: HBASE-4724
 URL: https://issues.apache.org/jira/browse/HBASE-4724
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Priority: Critical
 Attachments: 2002_4724_TestAdmin.patch


 From the logs in my environment:
 {noformat}
 2011-11-01 15:48:40,744 WARN  [Master:0;localhost,39664,1320187706355] master.AssignmentManager(1471): Failed assignment of -ROOT-,,0.70236052 to localhost,44046,1320187706849, trying to assign elsewhere instead; retry=1
 org.apache.hadoop.hbase.ipc.HBaseRPC$VersionMismatch: Protocol org.apache.hadoop.hbase.ipc.HRegionInterface version mismatch. (client = 28, server = 29)
   at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:185)
   at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:300)
 {noformat}
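The VersionMismatch above is raised when the RPC client and server report different versions of the same protocol interface (here HRegionInterface: client 28, server 29), and the master treats it as a failed assignment of -ROOT- and retries elsewhere. To illustrate the kind of check involved, here is a minimal sketch of a version handshake. ProtocolVersionCheck and VersionMismatchException are hypothetical names; this is not the code in WritableRpcEngine.

```java
/**
 * Hypothetical sketch of a client/server protocol version handshake.
 * Illustrative only: this is not HBase's WritableRpcEngine code.
 */
public class ProtocolVersionCheck {

    /** Raised when the client's compiled-in protocol version differs from the server's. */
    public static class VersionMismatchException extends RuntimeException {
        private static final long serialVersionUID = 1L;
        public final long clientVersion;
        public final long serverVersion;

        public VersionMismatchException(String protocol, long clientVersion, long serverVersion) {
            // Same wording as the message in the log above.
            super("Protocol " + protocol + " version mismatch. (client = "
                    + clientVersion + ", server = " + serverVersion + ")");
            this.clientVersion = clientVersion;
            this.serverVersion = serverVersion;
        }
    }

    /** Fails fast when the two sides disagree; matching versions return normally. */
    public static void checkVersion(String protocol, long clientVersion, long serverVersion) {
        if (clientVersion != serverVersion) {
            throw new VersionMismatchException(protocol, clientVersion, serverVersion);
        }
    }

    public static void main(String[] args) {
        try {
            checkVersion("org.apache.hadoop.hbase.ipc.HRegionInterface", 28, 29);
        } catch (VersionMismatchException e) {
            // A caller such as the master could log this and try another server.
            System.out.println(e.getMessage());
        }
    }
}
```

Running main prints `Protocol org.apache.hadoop.hbase.ipc.HRegionInterface version mismatch. (client = 28, server = 29)`, the same text as the log.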
 Anyway, after this the log ends with:
 {noformat}
 2011-11-01 15:54:35,132 INFO  [Master:0;localhost,39664,1320187706355.oldLogCleaner] hbase.Chore(80): Master:0;localhost,39664,1320187706355.oldLogCleaner exiting
 Process Thread Dump: Automatic Stack Trace every 60 seconds waiting on Master:0;localhost,39664,1320187706355
 {noformat}
 It's stuck in:
 {noformat}
 sun.management.ThreadImpl.getThreadInfo1(Native Method)
 sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:156)
 sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:121)
 org.apache.hadoop.util.ReflectionUtils.printThreadInfo(ReflectionUtils.java:149)
 org.apache.hadoop.hbase.util.Threads.threadDumpingIsAlive(Threads.java:113)
 org.apache.hadoop.hbase.LocalHBaseCluster.join(LocalHBaseCluster.java:405)
 org.apache.hadoop.hbase.MiniHBaseCluster.join(MiniHBaseCluster.java:408)
 org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniHBaseCluster(HBaseTestingUtility.java:616)
 org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniCluster(HBaseTestingUtility.java:590)
 org.apache.hadoop.hbase.client.TestAdmin.tearDownAfterClass(TestAdmin.java:89)
 {noformat}
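The stack above shows the shutdown path parked in Threads#threadDumpingIsAlive, called from LocalHBaseCluster.join, and the log line before it suggests that this method prints a thread dump every 60 seconds while it waits. A minimal sketch of such a loop follows, assuming it simply re-joins the thread with a timeout until the thread exits. DumpingJoin and its methods are hypothetical names, not HBase's code; the point is that the loop's only exit condition is the thread dying.

```java
import java.io.PrintStream;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/**
 * Hypothetical sketch of an unbounded "join and dump threads" loop, modelled on
 * the log message above. Not the actual Threads#threadDumpingIsAlive code.
 */
public class DumpingJoin {

    /** Prints the state and (possibly truncated) stack of every live thread. */
    public static void printThreadInfo(PrintStream out, String title) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        out.println("Process Thread Dump: " + title);
        for (ThreadInfo info : bean.dumpAllThreads(false, false)) {
            out.print(info);
        }
    }

    /**
     * Waits for t to die, dumping all threads after every interval.
     * There is no upper bound: if t never exits, this never returns.
     */
    public static void dumpingJoin(Thread t, long intervalMillis, PrintStream out)
            throws InterruptedException {
        while (t.isAlive()) {
            t.join(intervalMillis);
            if (t.isAlive()) {
                printThreadInfo(out, "Automatic Stack Trace every " + intervalMillis
                        + " ms waiting on " + t.getName());
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // This worker exits on its own, so the loop eventually returns.
        // A worker that never exits would keep the loop (and the test) going forever.
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(2500);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "worker");
        worker.start();
        dumpingJoin(worker, 1000, System.out);
        System.out.println("worker finished");
    }
}
```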
 So that's at least why adding a timeout won't help, and it may be why the test 
 never ends at all. Adding a maximum number of retries to Threads#threadDumpingIsAlive could help.
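The maximum-retry idea can be sketched by bounding the same kind of join loop: after maxRetries unsuccessful waits it throws instead of waiting forever, so a stuck master would surface as a test failure rather than a hang. BoundedDumpingJoin and its parameters are hypothetical names, and a real version would print a full thread dump where this sketch prints a single line.

```java
import java.util.concurrent.TimeoutException;

/**
 * Hypothetical sketch: a join loop with a maximum number of retries,
 * so a thread that never exits produces an error instead of a hang.
 */
public class BoundedDumpingJoin {

    /**
     * Waits for t to exit, for roughly maxRetries * intervalMillis at most.
     *
     * @throws TimeoutException if t is still alive after maxRetries waits
     */
    public static void join(Thread t, long intervalMillis, int maxRetries)
            throws InterruptedException, TimeoutException {
        int attempts = 0;
        while (t.isAlive()) {
            if (attempts >= maxRetries) {
                throw new TimeoutException(t.getName() + " still alive after "
                        + maxRetries + " waits of " + intervalMillis + " ms");
            }
            t.join(intervalMillis);
            attempts++;
            if (t.isAlive()) {
                // A real version would dump all threads here.
                System.err.println("Still waiting on " + t.getName()
                        + " (" + attempts + "/" + maxRetries + ")");
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        final Object lock = new Object();
        // Simulates a thread parked in a wait() that nobody will ever notify.
        Thread stuck = new Thread(() -> {
            synchronized (lock) {
                try {
                    while (true) {
                        lock.wait();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "stuck-master");
        stuck.setDaemon(true); // lets the JVM exit even though the thread never finishes
        stuck.start();
        try {
            join(stuck, 200, 3);
        } catch (TimeoutException e) {
            System.out.println("Gave up: " + e.getMessage());
        }
    }
}
```

Throwing TimeoutException leaves the decision (fail the tear-down, or log and continue) to the caller. Note that the stuck thread is not killed, which is why the demo marks it as a daemon thread.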
 I also wonder whether the root cause of the hang is my change to the WAL, 
 with some threads surprised to find updates that were not written to the WAL. 
 Here is the full stack dump:
 {noformat}
 Thread 354 (IPC Client (47) connection to localhost/127.0.0.1:52227 from nkeywal):
   State: TIMED_WAITING
   Blocked count: 360
   Waited count: 359
   Stack:
 java.lang.Object.wait(Native Method)
 org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:702)
 org.apache.hadoop.ipc.Client$Connection.run(Client.java:744)
 Thread 272 (Master:0;localhost,39664,1320187706355-EventThread):
   State: WAITING
   Blocked count: 0
   Waited count: 4
   Waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@107b954b
   Stack:
 sun.misc.Unsafe.park(Native Method)
 java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
 java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
 org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:502)
 Thread 271 (Master:0;localhost,39664,1320187706355-SendThread(localhost:21819)):
   State: RUNNABLE
   Blocked count: 2
   Waited count: 0
   Stack:
 sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
 sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
 sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
 sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
 sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
 org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1107)
 Thread 152 (Master:0;localhost,39664,1320187706355):
   State: WAITING
   Blocked count: 217
   Waited count: 174
   Waiting on org.apache.hadoop.hbase.zookeeper.RootRegionTracker@6621477c
   Stack:
 java.lang.Object.wait(Native Method)
 java.lang.Object.wait(Object.java:485)
 org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:131)
 org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:104)
 org.apache.hadoop.hbase.catalog.CatalogTracker.waitForRoot(CatalogTracker.java:277)
 org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:523)
 
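In the dump, the master thread (Thread 152) is in Object.wait() inside ZooKeeperNodeTracker.blockUntilAvailable, reached from CatalogTracker.waitForRoot and HMaster.assignRootAndMeta, so it appears to be still waiting for the -ROOT- location. A wait with no deadline and no stop signal never returns if that never arrives, and the join loop above then never ends either. As an illustration only, here is a minimal sketch of a tracker whose waiters can time out or be released on shutdown. NodeTrackerSketch, nodeAvailable and stop are hypothetical names and do not reflect ZooKeeperNodeTracker's actual implementation.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch, not HBase's ZooKeeperNodeTracker: waiters can give up
 * after a deadline, and stop() releases them during shutdown.
 */
public class NodeTrackerSketch {
    private byte[] data;      // null until the tracked node is available; guarded by this
    private boolean stopped;  // set on shutdown; guarded by this

    /** Called, e.g. from a watcher callback, when the node appears. */
    public synchronized void nodeAvailable(byte[] newData) {
        data = newData;
        notifyAll(); // wake every thread blocked in blockUntilAvailable
    }

    /** Releases all current and future waiters, e.g. when the master is stopping. */
    public synchronized void stop() {
        stopped = true;
        notifyAll();
    }

    /**
     * Waits for the node. timeoutMillis == 0 means no deadline, the pattern that
     * can hang shutdown if nothing ever calls nodeAvailable or stop.
     *
     * @return the node data, or null on timeout or stop
     */
    public synchronized byte[] blockUntilAvailable(long timeoutMillis) throws InterruptedException {
        final long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (data == null && !stopped) {
            if (timeoutMillis == 0) {
                wait();
            } else {
                long remaining = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (remaining <= 0) {
                    return null;
                }
                wait(remaining); // the loop re-checks the condition after every wake-up
            }
        }
        return data;
    }

    public static void main(String[] args) throws InterruptedException {
        NodeTrackerSketch tracker = new NodeTrackerSketch();
        // Nobody publishes the node: the bounded wait gives up instead of hanging.
        System.out.println("bounded wait returned: " + tracker.blockUntilAvailable(100));
        // Another thread publishes it while we wait.
        new Thread(() -> tracker.nodeAvailable(new byte[] {1})).start();
        byte[] got = tracker.blockUntilAvailable(5000);
        System.out.println(got == null ? "timed out" : "got " + got.length + " byte(s)");
    }
}
```

The stop() path matters for a mini cluster: even a waiter with no deadline is released when the cluster shuts down, so a join on the master thread can complete.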
 

[jira] [Updated] (HBASE-4724) TestAdmin hangs randomly in trunk

2011-11-02 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-4724:
---

Assignee: nkeywal
  Status: Patch Available  (was: Open)

 TestAdmin hangs randomly in trunk
 -

 Key: HBASE-4724
 URL: https://issues.apache.org/jira/browse/HBASE-4724
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Critical
 Attachments: 2002_4724_TestAdmin.patch


[jira] [Updated] (HBASE-4724) TestAdmin hangs randomly in trunk

2011-11-02 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-4724:
---

Status: Open  (was: Patch Available)

 TestAdmin hangs randomly in trunk
 -

 Key: HBASE-4724
 URL: https://issues.apache.org/jira/browse/HBASE-4724
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Critical
 Attachments: 2002_4724_TestAdmin.patch, 
 2002_4724_TestAdmin.v2.patch


[jira] [Updated] (HBASE-4724) TestAdmin hangs randomly in trunk

2011-11-02 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-4724:
---

Status: Patch Available  (was: Open)

 TestAdmin hangs randomly in trunk
 -

 Key: HBASE-4724
 URL: https://issues.apache.org/jira/browse/HBASE-4724
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Critical
 Attachments: 2002_4724_TestAdmin.patch, 
 2002_4724_TestAdmin.v2.patch


[jira] [Updated] (HBASE-4724) TestAdmin hangs randomly in trunk

2011-11-02 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-4724:
---

Attachment: 2002_4724_TestAdmin.v2.patch

 TestAdmin hangs randomly in trunk
 -

 Key: HBASE-4724
 URL: https://issues.apache.org/jira/browse/HBASE-4724
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Critical
 Attachments: 2002_4724_TestAdmin.patch, 
 2002_4724_TestAdmin.v2.patch

