If the code is OPERATIONTIMEOUT, this case runs:
  case OPERATIONTIMEOUT:
            retryOrThrow(retryCounter, e, "getData");
            break;
The break only exits the switch, not the while(true) loop, so the thread sleeps and retries until retryOrThrow throws.

We are using hbase-0.94, and HBase does manage the ZooKeeper ensemble.
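
A note on the `break` in that case block: in Java, a `break` inside a `switch` exits only the switch, not an enclosing loop. The sketch below illustrates this with a simplified retry loop. The class `SwitchBreakDemo`, the `Code` enum, and the `retriesLeft` counter are hypothetical stand-ins for illustration, not HBase's RetryCounter or RecoverableZooKeeper.

```java
class SwitchBreakDemo {
    // Hypothetical error codes, loosely mirroring the names in the thread.
    enum Code { CONNECTIONLOSS, OPERATIONTIMEOUT, NONODE }

    // Counts how many attempts are made before the retry budget runs out.
    static int attemptsBeforeGivingUp(int maxRetries) {
        int attempts = 0;
        int retriesLeft = maxRetries;
        try {
            while (true) {
                attempts++;
                // Simulate a call that fails with a timeout every time.
                Code code = Code.OPERATIONTIMEOUT;
                switch (code) {
                    case CONNECTIONLOSS:
                    case OPERATIONTIMEOUT:
                        if (retriesLeft == 0) {
                            // Stand-in for the "throw" half of retry-or-throw.
                            throw new IllegalStateException("retries exhausted");
                        }
                        break; // leaves the switch only; the while loop goes on
                    default:
                        throw new IllegalStateException("unexpected code: " + code);
                }
                // Reached after the break above, so another iteration follows.
                retriesLeft--;
            }
        } catch (IllegalStateException e) {
            return attempts;
        }
    }

    public static void main(String[] args) {
        System.out.println("attempts: " + attemptsBeforeGivingUp(3));
    }
}
```

In this sketch the loop ends only when the exception is thrown. If the same holds for the real client, a large retry budget combined with long sleeps between retries could make a thread look stuck in sleepUntilNextRetry for a long time.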


On Wed, Dec 19, 2012 at 11:39 AM, Ted Yu <[email protected]> wrote:

> Could it be due to OPERATIONTIMEOUT?
> What version of HBase are you using?
> Do you let HBase manage the ZooKeeper ensemble?
>
> Cheers
>
> On Tue, Dec 18, 2012 at 7:19 PM, 唐 颖 <[email protected]> wrote:
>
> > We have a multi-threaded program that puts data into HBase. Each thread creates
> > its own HTable instance, because the threads write to different tables.
> >
> > Today we found that this program was stuck. After taking a thread dump of the
> > Java process, we found that one thread was stuck in
> >
> > "pool-1-thread-9" prio=10 tid=0x00007fbb14036800 nid=0x4f7a waiting on condition [0x00007fbb5d411000]
> >    java.lang.Thread.State: TIMED_WAITING (sleeping)
> >         at java.lang.Thread.sleep(Native Method)
> >         at java.lang.Thread.sleep(Thread.java:302)
> >         at java.util.concurrent.TimeUnit.sleep(TimeUnit.java:328)
> >         at org.apache.hadoop.hbase.util.RetryCounter.sleepUntilNextRetry(RetryCounter.java:54)
> >         at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:277)
> >         at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataInternal(ZKUtil.java:522)
> >         at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:498)
> >         at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.getData(ZooKeeperNodeTracker.java:156)
> >         - locked <0x000000067bc07738> (a org.apache.hadoop.hbase.zookeeper.RootRegionTracker)
> >         at org.apache.hadoop.hbase.zookeeper.RootRegionTracker.getRootRegionLocation(RootRegionTracker.java:62)
> >         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:821)
> >         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:801)
> >         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:933)
> >         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:832)
> >         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:801)
> >         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:933)
> >         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:836)
> >         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:801)
> >         at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:238)
> >         at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:178)
> >         at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:137)
> >         at com.xingcloud.server.task.EventTask.run(EventTask.java:65)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >         at java.lang.Thread.run(Thread.java:662)
> >
> >
> > The other threads are waiting for this lock.
> >
> > "pool-1-thread-7" prio=10 tid=0x00007fbb14032800 nid=0x4f76 waiting for monitor entry [0x00007fbb5d493000]
> >    java.lang.Thread.State: BLOCKED (on object monitor)
> >         at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.getData(ZooKeeperNodeTracker.java:154)
> >         - waiting to lock <0x000000067bc07738> (a org.apache.hadoop.hbase.zookeeper.RootRegionTracker)
> >         at org.apache.hadoop.hbase.zookeeper.RootRegionTracker.getRootRegionLocation(RootRegionTracker.java:62)
> >         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:821)
> >         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:801)
> >         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:933)
> >         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:832)
> >         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:801)
> >         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:933)
> >         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:836)
> >         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:801)
> >         at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:238)
> >         at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:178)
> >         at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:137)
> >         at com.xingcloud.server.task.EventTask.run(EventTask.java:65)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >         at java.lang.Thread.run(Thread.java:662)
> >
> >
> >
> > I checked the HBase source code for
> > org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:277):
> >
> >   public byte[] getData(String path, Watcher watcher, Stat stat)
> >   throws KeeperException, InterruptedException {
> >     RetryCounter retryCounter = retryCounterFactory.create();
> >     while (true) {
> >       try {
> >         byte[] revData = zk.getData(path, watcher, stat);
> >         return this.removeMetaData(revData);
> >       } catch (KeeperException e) {
> >         switch (e.code()) {
> >           case CONNECTIONLOSS:
> >           case OPERATIONTIMEOUT:
> >             retryOrThrow(retryCounter, e, "getData");
> >             break;
> >
> >           default:
> >             throw e;
> >         }
> >       }
> >       retryCounter.sleepUntilNextRetry();
> >       retryCounter.useRetry();
> >     }
> >   }
> >
> > I guess the KeeperException code is CONNECTIONLOSS, and that this error code
> > is what causes the thread to get stuck.
> >
> > Why is the error code CONNECTIONLOSS?
> >
> > I restarted the client program, but the problem still occurs. Do I have to
> > restart HBase to fix it?
> >
> >
> > Thanks!
> >
> >
> >
> >
>



-- 
Best regards,

Ivy Tang
