Could it be due to OPERATIONTIMEOUT? What version of HBase are you using? Do you let HBase manage the ZooKeeper ensemble?
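
For reference, here is a minimal standalone sketch of the retry pattern in the getData code you quoted. It may help show why either CONNECTIONLOSS or OPERATIONTIMEOUT would leave a thread in TIMED_WAITING (sleeping) in the dump. It is only an illustration under stated assumptions: RetrySketch, TransientException, and callWithRetry are hypothetical names, not HBase or ZooKeeper classes, and the retry count and sleep interval are arbitrary.

```java
import java.util.concurrent.Callable;

// Hypothetical stand-in for a retryable error such as CONNECTIONLOSS
// or OPERATIONTIMEOUT. Not a ZooKeeper class.
class TransientException extends Exception {
    TransientException(String message) {
        super(message);
    }
}

// Hypothetical sketch of a bounded retry loop. Not the HBase implementation.
class RetrySketch {

    // Calls op until it succeeds or maxRetries retries have been used up.
    // The sleep between attempts is where a waiting thread would appear
    // as TIMED_WAITING (sleeping) in a thread dump.
    static <T> T callWithRetry(Callable<T> op, int maxRetries, long sleepMillis)
            throws Exception {
        int retriesLeft = maxRetries;
        while (true) {
            try {
                return op.call();
            } catch (TransientException e) {
                if (retriesLeft == 0) {
                    throw e; // out of retries: surface the failure to the caller
                }
            }
            // Any other exception type propagates immediately, without a retry.
            Thread.sleep(sleepMillis);
            retriesLeft--;
        }
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // Simulated operation: fails twice, then succeeds.
        String value = callWithRetry(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new TransientException("simulated connection loss");
            }
            return "root-region-location";
        }, 3, 10);
        System.out.println(value + " after " + calls[0] + " calls");
    }
}
```

If the caller holds a monitor while it is inside such a loop, as ZooKeeperNodeTracker.getData does in your dump, every other thread that needs the same monitor shows up as BLOCKED until the loop returns or gives up.
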

Cheers

On Tue, Dec 18, 2012 at 7:19 PM, 唐 颖 <[email protected]> wrote:
> We have a multi-threaded program that puts data into HBase. Each thread
> creates its own HTable instance, because the threads put data into
> different tables.
>
> But today we found that this program was stuck. After taking a stack dump
> of the Java process, we found that one thread is stuck in
>
> "pool-1-thread-9" prio=10 tid=0x00007fbb14036800 nid=0x4f7a waiting on condition [0x00007fbb5d411000]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>         at java.lang.Thread.sleep(Native Method)
>         at java.lang.Thread.sleep(Thread.java:302)
>         at java.util.concurrent.TimeUnit.sleep(TimeUnit.java:328)
>         at org.apache.hadoop.hbase.util.RetryCounter.sleepUntilNextRetry(RetryCounter.java:54)
>         at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:277)
>         at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataInternal(ZKUtil.java:522)
>         at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:498)
>         at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.getData(ZooKeeperNodeTracker.java:156)
>         - locked <0x000000067bc07738> (a org.apache.hadoop.hbase.zookeeper.RootRegionTracker)
>         at org.apache.hadoop.hbase.zookeeper.RootRegionTracker.getRootRegionLocation(RootRegionTracker.java:62)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:821)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:801)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:933)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:832)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:801)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:933)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:836)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:801)
>         at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:238)
>         at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:178)
>         at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:137)
>         at com.xingcloud.server.task.EventTask.run(EventTask.java:65)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
>
> And the other threads are waiting for this lock.
>
> "pool-1-thread-7" prio=10 tid=0x00007fbb14032800 nid=0x4f76 waiting for monitor entry [0x00007fbb5d493000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.getData(ZooKeeperNodeTracker.java:154)
>         - waiting to lock <0x000000067bc07738> (a org.apache.hadoop.hbase.zookeeper.RootRegionTracker)
>         at org.apache.hadoop.hbase.zookeeper.RootRegionTracker.getRootRegionLocation(RootRegionTracker.java:62)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:821)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:801)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:933)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:832)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:801)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:933)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:836)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:801)
>         at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:238)
>         at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:178)
>         at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:137)
>         at com.xingcloud.server.task.EventTask.run(EventTask.java:65)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
>
> I checked the HBase code of
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:277)
>
>   public byte[] getData(String path, Watcher watcher, Stat stat)
>       throws KeeperException, InterruptedException {
>     RetryCounter retryCounter = retryCounterFactory.create();
>     while (true) {
>       try {
>         byte[] revData = zk.getData(path, watcher, stat);
>         return this.removeMetaData(revData);
>       } catch (KeeperException e) {
>         switch (e.code()) {
>           case CONNECTIONLOSS:
>           case OPERATIONTIMEOUT:
>             retryOrThrow(retryCounter, e, "getData");
>             break;
>
>           default:
>             throw e;
>         }
>       }
>       retryCounter.sleepUntilNextRetry();
>       retryCounter.useRetry();
>     }
>   }
>
> I guess the KeeperException code is CONNECTIONLOSS, and that this error
> code is what causes the program to get stuck.
>
> Why would the error code be CONNECTIONLOSS?
>
> I restarted the client program, but the situation still happens. Must I
> restart HBase to solve this?
>
> Thanks!
