Do you have NTP running on your cluster? I have seen this manifest due to clock skew.
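A minimal sketch of the kind of check I mean, assuming you have already collected each node's current epoch time in milliseconds at roughly the same moment (the collection itself is not shown). The host names, the readings, and the helper names `max_clock_skew_ms` and `hosts_out_of_sync` are made up for illustration. The 30000 ms limit is what I believe `hbase.master.maxclockskew` defaults to, so please check it against your own configuration.

```python
from typing import Dict, List, Tuple

# Assumed limit in milliseconds. hbase.master.maxclockskew is believed to
# default to this value; verify it in your own hbase-site.xml.
MAX_SKEW_MS = 30_000


def max_clock_skew_ms(readings: Dict[str, int]) -> Tuple[int, str, str]:
    """Return (skew_ms, slowest_host, fastest_host) for per-host epoch-ms readings."""
    if not readings:
        raise ValueError("no clock readings supplied")
    slowest = min(readings, key=readings.get)
    fastest = max(readings, key=readings.get)
    return readings[fastest] - readings[slowest], slowest, fastest


def hosts_out_of_sync(readings: Dict[str, int], reference_host: str,
                      limit_ms: int = MAX_SKEW_MS) -> List[str]:
    """List hosts whose clock differs from the reference host by more than limit_ms."""
    reference = readings[reference_host]
    return sorted(host for host, value in readings.items()
                  if abs(value - reference) > limit_ms)


# Hypothetical readings: node11 runs 45 seconds ahead of the master.
readings = {
    "master": 1366115469000,
    "node11": 1366115514000,
    "node12": 1366115469400,
}

skew, slowest, fastest = max_clock_skew_ms(readings)
print(f"largest skew: {skew} ms between {slowest} and {fastest}")
print("beyond the limit relative to master:", hosts_out_of_sync(readings, "master"))
```

If any host shows up in the second list, fixing NTP on that machine would be the first thing I would try.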
Varun

On Tue, May 7, 2013 at 6:05 AM, Fabien Chung <[email protected]> wrote:

> Hi all,
>
> I have a cluster with 8 machines (CDH4). I use an ETL tool (Talend) to insert
> data into HBase. Most of the time that works perfectly, but sometimes rows are
> not inserted, and I don't have any clue about the reason for the failure. I
> have 0 errors in Talend. That usually happens when I delete the table in
> HBase and recreate a new one from Talend.
>
> I think these logs are relevant:
>
> 2013-04-16 14:31:09,610 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 5 on 60020: exiting
> 2013-04-16 14:31:09,610 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server Responder
> 2013-04-16 14:31:09,610 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 4 on 60020: exiting
> 2013-04-16 14:31:09,610 INFO org.apache.hadoop.ipc.HBaseServer: REPL IPC Server handler 0 on 60020: exiting
> 2013-04-16 14:31:09,610 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 6 on 60020: exiting
> 2013-04-16 14:31:09,609 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 7 on 60020: exiting
> 2013-04-16 14:31:09,609 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 3 on 60020: exiting
> 2013-04-16 14:31:09,609 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 8 on 60020: exiting
> 2013-04-16 14:31:09,609 INFO org.apache.hadoop.hbase.regionserver.SplitLogWorker: Sending interrupt to stop the worker thread
> 2013-04-16 14:31:09,610 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Stopping infoServer
> 2013-04-16 14:31:09,610 INFO org.apache.hadoop.hbase.regionserver.SplitLogWorker: SplitLogWorker interrupted while waiting for task, exiting: java.lang.InterruptedException
> 2013-04-16 14:31:09,610 INFO org.apache.hadoop.hbase.regionserver.SplitLogWorker: SplitLogWorker NODE11.ysance.local,60020,1366110719610 exiting
> 2013-04-16 14:31:09,611 INFO org.mortbay.log: Stopped [email protected]:60030
> 2013-04-16 14:31:09,712 INFO org.apache.hadoop.hbase.regionserver.MemStoreFlusher: regionserver60020.cacheFlusher exiting
> 2013-04-16 14:31:09,712 INFO org.apache.hadoop.hbase.regionserver.LogRoller: LogRoller exiting.
> 2013-04-16 14:31:09,712 INFO org.apache.hadoop.hbase.regionserver.HRegionServer$CompactionChecker: regionserver60020.compactionChecker exiting
> 2013-04-16 14:31:09,712 INFO org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager: Stopping RegionServerSnapshotManager gracefully.
> 2013-04-16 14:31:09,727 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
> 2013-04-16 14:31:09,727 INFO org.apache.zookeeper.ZooKeeper: Session: 0x13e128af3010001 closed
> 2013-04-16 14:31:09,727 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server NODE11.ysance.local,60020,1366110719610
> 2013-04-16 14:31:09,728 INFO org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager: Stopping RegionServerSnapshotManager gracefully.
> 2013-04-16 14:31:09,728 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server NODE11.ysance.local,60020,1366110719610; all regions closed.
> 2013-04-16 14:31:09,728 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: regionserver60020.logSyncer exiting
> 2013-04-16 14:31:10,161 INFO org.apache.hadoop.hbase.regionserver.Leases: regionserver60020 closing leases
> 2013-04-16 14:31:10,161 INFO org.apache.hadoop.hbase.regionserver.Leases: regionserver60020 closed leases
> 2013-04-16 14:31:10,163 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/NODE11.ysance.local,60020,1366110719610
> 2013-04-16 14:31:10,163 INFO org.apache.hadoop.hbase.util.RetryCounter: Sleeping 2000ms before retry #1...
> 2013-04-16 14:31:12,163 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/NODE11.ysance.local,60020,1366110719610
> 2013-04-16 14:31:12,163 INFO org.apache.hadoop.hbase.util.RetryCounter: Sleeping 4000ms before retry #2...
> 2013-04-16 14:31:16,163 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/NODE11.ysance.local,60020,1366110719610
> 2013-04-16 14:31:16,163 INFO org.apache.hadoop.hbase.util.RetryCounter: Sleeping 8000ms before retry #3...
> 2013-04-16 14:31:19,389 INFO org.apache.hadoop.hbase.regionserver.Leases: regionserver60020.leaseChecker closing leases
> 2013-04-16 14:31:19,390 INFO org.apache.hadoop.hbase.regionserver.Leases: regionserver60020.leaseChecker closed leases
> 2013-04-16 14:31:24,163 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/NODE11.ysance.local,60020,1366110719610
> 2013-04-16 14:31:24,163 ERROR org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: ZooKeeper delete failed after 3 retries
> 2013-04-16 14:31:24,164 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Failed deleting my ephemeral node
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/NODE11.ysance.local,60020,1366110719610
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
>     at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:137)
>     at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1215)
>     at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1204)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.deleteMyEphemeralNode(HRegionServer.java:1068)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:846)
>     at java.lang.Thread.run(Thread.java:662)
> 2013-04-16 14:31:24,165 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server NODE11.ysance.local,60020,1366110719610; zookeeper connection closed.
> 2013-04-16 14:31:24,165 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver60020 exiting
> 2013-04-16 14:31:24,165 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Starting fs shutdown hook thread.
> 2013-04-16 14:31:24,166 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook finished.
>
> In my mind, the issue comes from ZooKeeper or the regionserver, but I can't
> really identify where exactly the problem is.
>
> Do you have any idea?
>
> Regards
>
> --
> Chung Fabien
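The quoted excerpt begins with the IPC handlers already exiting, so it may be worth scanning the earlier part of the regionserver log for the first ZooKeeper session expiry. Below is a minimal sketch of such a scan, assuming each log entry sits on a single unwrapped line in the `timestamp LEVEL logger: message` layout seen above. The names `parse_entries`, `first_session_expiry` and `retry_sleeps_ms` are hypothetical helpers, and the sample lines are copied from the quoted log.

```python
import re
from datetime import datetime
from typing import Iterable, List, NamedTuple, Optional

# Matches entries such as:
# 2013-04-16 14:31:10,163 WARN some.logger.Name: message text
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) (?P<logger>[\w.$]+): (?P<msg>.*)$"
)
RETRY = re.compile(r"Sleeping (\d+)ms before retry")


class Entry(NamedTuple):
    when: datetime
    level: str
    logger: str
    msg: str


def parse_entries(lines: Iterable[str]) -> List[Entry]:
    """Parse well-formed log lines; stack-trace and other lines are skipped."""
    entries = []
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m:
            when = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S,%f")
            entries.append(Entry(when, m["level"], m["logger"], m["msg"]))
    return entries


def first_session_expiry(entries: List[Entry]) -> Optional[Entry]:
    """Return the earliest entry that mentions a ZooKeeper session expiry."""
    for entry in entries:
        if "SessionExpiredException" in entry.msg:
            return entry
    return None


def retry_sleeps_ms(entries: List[Entry]) -> List[int]:
    """Collect the back-off delays reported by RetryCounter, in order."""
    found = (RETRY.search(entry.msg) for entry in entries)
    return [int(m.group(1)) for m in found if m]


SAMPLE = [
    "2013-04-16 14:31:10,163 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/NODE11.ysance.local,60020,1366110719610",
    "2013-04-16 14:31:10,163 INFO org.apache.hadoop.hbase.util.RetryCounter: Sleeping 2000ms before retry #1...",
    "2013-04-16 14:31:12,163 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/NODE11.ysance.local,60020,1366110719610",
    "2013-04-16 14:31:12,163 INFO org.apache.hadoop.hbase.util.RetryCounter: Sleeping 4000ms before retry #2...",
    "2013-04-16 14:31:24,163 ERROR org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: ZooKeeper delete failed after 3 retries",
]

entries = parse_entries(SAMPLE)
expiry = first_session_expiry(entries)
print("first session expiry at:", expiry.when if expiry else "not found")
print("retry sleeps (ms):", retry_sleeps_ms(entries))
```

Running the same functions over the whole regionserver log, and comparing the first expiry time with the ZooKeeper server log, should help narrow down when the session was actually lost.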
