If it is OPERATIONTIMEOUT, then

    case OPERATIONTIMEOUT:
        retryOrThrow(retryCounter, e, "getData");
        break;

it will break out of the while(true) loop.
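For reference: in Java, an unlabeled `break` inside a `switch` ends only the `switch` statement. In the `getData()` code quoted in this thread, the enclosing `while (true)` loop is left only by `return` or by a thrown exception. After the `break`, control continues to `sleepUntilNextRetry()` and `useRetry()`, and the loop tries again.

The sketch below reproduces that control flow with hypothetical names. `SwitchBreakDemo`, `FakeKeeperException`, and the local `Code` enum are all invented for the example, and a plain counter stands in for `RetryCounter`. It also assumes that `retryOrThrow()` rethrows only once retries are exhausted. Its name suggests this, but its body is not shown in this thread.

```java
import java.util.ArrayList;
import java.util.List;

public class SwitchBreakDemo {

    // Hypothetical stand-in for ZooKeeper's error codes (not the real KeeperException.Code).
    enum Code { CONNECTIONLOSS, OPERATIONTIMEOUT, NONODE }

    static class FakeKeeperException extends Exception {
        final Code code;

        FakeKeeperException(Code code) {
            super(code.name());
            this.code = code;
        }
    }

    // Records each step so the control flow can be inspected afterwards.
    static final List<String> trace = new ArrayList<>();

    // Same shape as the quoted getData() loop: the first `failures` calls fail with `code`,
    // later calls succeed. `maxRetries` stands in for the RetryCounter (an assumption).
    static String getData(Code code, int failures, int maxRetries) throws FakeKeeperException {
        int attempts = 0;
        int retriesLeft = maxRetries;
        while (true) {
            try {
                attempts++;
                if (attempts <= failures) {
                    throw new FakeKeeperException(code);
                }
                trace.add("return");
                return "data";
            } catch (FakeKeeperException e) {
                switch (e.code) {
                    case CONNECTIONLOSS:
                    case OPERATIONTIMEOUT:
                        // Assumed retryOrThrow() behaviour: rethrow only when no retries are left.
                        if (retriesLeft == 0) {
                            trace.add("throw");
                            throw e;
                        }
                        break; // exits the switch only, not the while (true) loop
                    default:
                        throw e;
                }
            }
            // Reached after that `break`: where sleepUntilNextRetry()/useRetry() would run.
            trace.add("retry");
            retriesLeft--;
        }
    }

    public static void main(String[] args) {
        try {
            System.out.println(getData(Code.OPERATIONTIMEOUT, 2, 5));
        } catch (FakeKeeperException e) {
            System.out.println("gave up: " + e.getMessage());
        }
        System.out.println(trace);
    }
}
```

Running `main` prints `data` followed by `[retry, retry, return]`. Both OPERATIONTIMEOUT failures pass through `break` and go around the loop again. The loop ends only when a call succeeds (`return`) or when retries run out (`throw`).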
We are using HBase 0.94, and HBase does manage the ZooKeeper ensemble.

On Wed, Dec 19, 2012 at 11:39 AM, Ted Yu <[email protected]> wrote:

> Could it be due to OPERATIONTIMEOUT?
> What version of HBase are you using?
> Do you let HBase manage the ZooKeeper ensemble?
>
> Cheers
>
> On Tue, Dec 18, 2012 at 7:19 PM, 唐 颖 <[email protected]> wrote:
>
> > We have a multi-threaded program that puts data into HBase. Each thread
> > creates its own HTable instance, because each thread puts data into a
> > different table.
> >
> > But today we found that the program was stuck. After dumping the stacks of
> > the Java process, we found that one thread was stuck here:
> >
> > "pool-1-thread-9" prio=10 tid=0x00007fbb14036800 nid=0x4f7a waiting on condition [0x00007fbb5d411000]
> >    java.lang.Thread.State: TIMED_WAITING (sleeping)
> >         at java.lang.Thread.sleep(Native Method)
> >         at java.lang.Thread.sleep(Thread.java:302)
> >         at java.util.concurrent.TimeUnit.sleep(TimeUnit.java:328)
> >         at org.apache.hadoop.hbase.util.RetryCounter.sleepUntilNextRetry(RetryCounter.java:54)
> >         at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:277)
> >         at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataInternal(ZKUtil.java:522)
> >         at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:498)
> >         at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.getData(ZooKeeperNodeTracker.java:156)
> >         - locked <0x000000067bc07738> (a org.apache.hadoop.hbase.zookeeper.RootRegionTracker)
> >         at org.apache.hadoop.hbase.zookeeper.RootRegionTracker.getRootRegionLocation(RootRegionTracker.java:62)
> >         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:821)
> >         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:801)
> >         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:933)
> >         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:832)
> >         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:801)
> >         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:933)
> >         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:836)
> >         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:801)
> >         at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:238)
> >         at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:178)
> >         at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:137)
> >         at com.xingcloud.server.task.EventTask.run(EventTask.java:65)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >         at java.lang.Thread.run(Thread.java:662)
> >
> > The other threads are waiting for this lock.
> >
> > "pool-1-thread-7" prio=10 tid=0x00007fbb14032800 nid=0x4f76 waiting for monitor entry [0x00007fbb5d493000]
> >    java.lang.Thread.State: BLOCKED (on object monitor)
> >         at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.getData(ZooKeeperNodeTracker.java:154)
> >         - waiting to lock <0x000000067bc07738> (a org.apache.hadoop.hbase.zookeeper.RootRegionTracker)
> >         at org.apache.hadoop.hbase.zookeeper.RootRegionTracker.getRootRegionLocation(RootRegionTracker.java:62)
> >         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:821)
> >         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:801)
> >         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:933)
> >         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:832)
> >         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:801)
> >         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:933)
> >         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:836)
> >         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:801)
> >         at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:238)
> >         at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:178)
> >         at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:137)
> >         at com.xingcloud.server.task.EventTask.run(EventTask.java:65)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >         at java.lang.Thread.run(Thread.java:662)
> >
> > I checked the HBase code at
> > org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:277):
> >
> >   public byte[] getData(String path, Watcher watcher, Stat stat)
> >       throws KeeperException, InterruptedException {
> >     RetryCounter retryCounter = retryCounterFactory.create();
> >     while (true) {
> >       try {
> >         byte[] revData = zk.getData(path, watcher, stat);
> >         return this.removeMetaData(revData);
> >       } catch (KeeperException e) {
> >         switch (e.code()) {
> >           case CONNECTIONLOSS:
> >           case OPERATIONTIMEOUT:
> >             retryOrThrow(retryCounter, e, "getData");
> >             break;
> >
> >           default:
> >             throw e;
> >         }
> >       }
> >       retryCounter.sleepUntilNextRetry();
> >       retryCounter.useRetry();
> >     }
> >   }
> >
> > My guess is that the KeeperException code is CONNECTIONLOSS, and that this
> > error code is what makes the thread hang.
> >
> > Why would the error code be CONNECTIONLOSS?
> >
> > I restarted the client program, and the same thing still happens. To solve
> > this, must I restart HBase?
> >
> > Thanks!

--
Best regards,
Ivy Tang
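The two thread dumps show a single pattern. "pool-1-thread-9" holds the monitor of a RootRegionTracker (`- locked <0x000000067bc07738>`) while it sleeps in `RetryCounter.sleepUntilNextRetry()`. "pool-1-thread-7" is BLOCKED waiting to lock that same object.

The following is a minimal sketch of that pattern in plain Java. It relies only on standard behaviour: `Thread.sleep()` does not release any monitor the sleeping thread owns. As a result, a second thread entering the same `synchronized` code stays BLOCKED until the first one leaves. `MonitorSleepDemo`, `Tracker`, and `Tracker.getData` are hypothetical stand-ins, not HBase's `ZooKeeperNodeTracker`. The thread names are reused only for illustration.

```java
import java.util.concurrent.CountDownLatch;

public class MonitorSleepDemo {

    // Hypothetical stand-in for the tracker object whose monitor appears in both dumps.
    static class Tracker {
        synchronized String getData(CountDownLatch entered, long sleepMillis)
                throws InterruptedException {
            entered.countDown();
            // Sleeping keeps this object's monitor, like a retry back-off inside synchronized code.
            Thread.sleep(sleepMillis);
            return "data";
        }
    }

    // Returns the state of a second caller while the first one sleeps inside the monitor.
    static Thread.State stateOfWaitingCaller() throws InterruptedException {
        Tracker tracker = new Tracker();
        CountDownLatch entered = new CountDownLatch(1);

        Thread sleeper = new Thread(() -> {
            try {
                tracker.getData(entered, 10_000);
            } catch (InterruptedException ignored) {
                Thread.currentThread().interrupt();
            }
        }, "pool-1-thread-9");
        sleeper.start();
        entered.await(); // from here on, the sleeper owns tracker's monitor

        Thread waiter = new Thread(() -> {
            try {
                tracker.getData(new CountDownLatch(1), 0);
            } catch (InterruptedException ignored) {
                Thread.currentThread().interrupt();
            }
        }, "pool-1-thread-7");
        waiter.start();

        // Poll until the waiter is parked on the monitor it cannot acquire.
        Thread.State state = waiter.getState();
        for (int i = 0; i < 500 && state != Thread.State.BLOCKED; i++) {
            Thread.sleep(10);
            state = waiter.getState();
        }

        sleeper.interrupt(); // wake the sleeper so it releases the monitor and both threads finish
        sleeper.join();
        waiter.join();
        return state;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(stateOfWaitingCaller()); // prints BLOCKED
    }
}
```

The latch ensures the second thread starts only after the first is inside the monitor. Without it, the observed state would depend on scheduling. Interrupting the sleeper is just a way to end the demo quickly. In the dumps above, the sleeping thread keeps the lock for as long as its retry loop keeps sleeping and retrying.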
