Hi all,

I have a cluster of 8 machines (CDH4). I use an ETL tool (Talend) to insert data into HBase. Most of the time that works perfectly, but sometimes rows are not inserted, and I have no clue why. Talend reports 0 errors. This usually happens when I delete the table in HBase and create a new one from Talend.
I think these logs are relevant:

2013-04-16 14:31:09,610 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 5 on 60020: exiting
2013-04-16 14:31:09,610 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server Responder
2013-04-16 14:31:09,610 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 4 on 60020: exiting
2013-04-16 14:31:09,610 INFO org.apache.hadoop.ipc.HBaseServer: REPL IPC Server handler 0 on 60020: exiting
2013-04-16 14:31:09,610 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 6 on 60020: exiting
2013-04-16 14:31:09,609 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 7 on 60020: exiting
2013-04-16 14:31:09,609 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 3 on 60020: exiting
2013-04-16 14:31:09,609 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 8 on 60020: exiting
2013-04-16 14:31:09,609 INFO org.apache.hadoop.hbase.regionserver.SplitLogWorker: Sending interrupt to stop the worker thread
2013-04-16 14:31:09,610 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Stopping infoServer
2013-04-16 14:31:09,610 INFO org.apache.hadoop.hbase.regionserver.SplitLogWorker: SplitLogWorker interrupted while waiting for task, exiting: java.lang.InterruptedException
2013-04-16 14:31:09,610 INFO org.apache.hadoop.hbase.regionserver.SplitLogWorker: SplitLogWorker NODE11.ysance.local,60020,1366110719610 exiting
2013-04-16 14:31:09,611 INFO org.mortbay.log: Stopped [email protected]:60030
2013-04-16 14:31:09,712 INFO org.apache.hadoop.hbase.regionserver.MemStoreFlusher: regionserver60020.cacheFlusher exiting
2013-04-16 14:31:09,712 INFO org.apache.hadoop.hbase.regionserver.LogRoller: LogRoller exiting.
2013-04-16 14:31:09,712 INFO org.apache.hadoop.hbase.regionserver.HRegionServer$CompactionChecker: regionserver60020.compactionChecker exiting
2013-04-16 14:31:09,712 INFO org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager: Stopping RegionServerSnapshotManager gracefully.
2013-04-16 14:31:09,727 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2013-04-16 14:31:09,727 INFO org.apache.zookeeper.ZooKeeper: Session: 0x13e128af3010001 closed
2013-04-16 14:31:09,727 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server NODE11.ysance.local,60020,1366110719610
2013-04-16 14:31:09,728 INFO org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager: Stopping RegionServerSnapshotManager gracefully.
2013-04-16 14:31:09,728 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server NODE11.ysance.local,60020,1366110719610; all regions closed.
2013-04-16 14:31:09,728 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: regionserver60020.logSyncer exiting
2013-04-16 14:31:10,161 INFO org.apache.hadoop.hbase.regionserver.Leases: regionserver60020 closing leases
2013-04-16 14:31:10,161 INFO org.apache.hadoop.hbase.regionserver.Leases: regionserver60020 closed leases
2013-04-16 14:31:10,163 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/NODE11.ysance.local,60020,1366110719610
2013-04-16 14:31:10,163 INFO org.apache.hadoop.hbase.util.RetryCounter: Sleeping 2000ms before retry #1...
2013-04-16 14:31:12,163 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/NODE11.ysance.local,60020,1366110719610
2013-04-16 14:31:12,163 INFO org.apache.hadoop.hbase.util.RetryCounter: Sleeping 4000ms before retry #2...
2013-04-16 14:31:16,163 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/NODE11.ysance.local,60020,1366110719610
2013-04-16 14:31:16,163 INFO org.apache.hadoop.hbase.util.RetryCounter: Sleeping 8000ms before retry #3...
2013-04-16 14:31:19,389 INFO org.apache.hadoop.hbase.regionserver.Leases: regionserver60020.leaseChecker closing leases
2013-04-16 14:31:19,390 INFO org.apache.hadoop.hbase.regionserver.Leases: regionserver60020.leaseChecker closed leases
2013-04-16 14:31:24,163 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/NODE11.ysance.local,60020,1366110719610
2013-04-16 14:31:24,163 ERROR org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: ZooKeeper delete failed after 3 retries
2013-04-16 14:31:24,164 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Failed deleting my ephemeral node
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/NODE11.ysance.local,60020,1366110719610
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:137)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1215)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1204)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.deleteMyEphemeralNode(HRegionServer.java:1068)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:846)
    at java.lang.Thread.run(Thread.java:662)
2013-04-16 14:31:24,165 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server NODE11.ysance.local,60020,1366110719610; zookeeper connection closed.
2013-04-16 14:31:24,165 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver60020 exiting
2013-04-16 14:31:24,165 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Starting fs shutdown hook thread.
2013-04-16 14:31:24,166 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook finished.

I suspect the issue comes from ZooKeeper or the region server, but I can't identify exactly where the problem is. Do you have any idea?

Regards
--
Chung Fabien
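
For anyone going through the log above: most of it is INFO-level shutdown chatter, so it may help to filter it down to the WARN and ERROR entries first. Below is a minimal sketch in Python (standard library only). It assumes one entry per line in the "date time,ms LEVEL logger: message" layout shown above; the names parse_entries and problems are made up for illustration and are not part of HBase or Talend, and the sample lines are copied from the log in this post.

```python
import re
from datetime import datetime

# Assumed layout: "<date> <time>,<ms> <LEVEL> <logger>: <message>"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) (\w+) (\S+): (.*)$"
)


def parse_entries(lines):
    """Yield (timestamp, level, logger, message) for each line that looks like a log entry."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # stack-trace frames and blank lines are skipped
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")
        yield ts, m.group(2), m.group(3), m.group(4)


def problems(lines, levels=("WARN", "ERROR")):
    """Return only the entries whose level is in `levels`, in file order."""
    return [entry for entry in parse_entries(lines) if entry[1] in levels]


# A few lines taken from the log in this post.
SAMPLE = [
    "2013-04-16 14:31:10,161 INFO org.apache.hadoop.hbase.regionserver.Leases: regionserver60020 closed leases",
    "2013-04-16 14:31:10,163 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/NODE11.ysance.local,60020,1366110719610",
    "    at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)",
    "2013-04-16 14:31:24,163 ERROR org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: ZooKeeper delete failed after 3 retries",
]

if __name__ == "__main__":
    for ts, level, logger, message in problems(SAMPLE):
        # Print the short class name only, to keep the output readable.
        print(ts.time(), level, logger.rsplit(".", 1)[-1], message)
```

To run it against a real file, pass the open file object instead of SAMPLE, for example problems(open(path)) where path is the region server log. Running the same filter over the part of the log that precedes the shutdown might show what triggered it.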
