Hi, I have PoC HBase cluster running on 3 VM deployment schema is: NODE01 NN, SN, HMaster (HM), RegionServer (RS), Zookeeper server (ZK), DN NODE02 RegionServer, DN NODE03 RegionServer, DN
Suddenly ONLY HBase went offline, all services: HM RS HDFS was working, no alerts were there ZK server was working, no alerts there. VMWare didn't publish any alerts. Only restart of HBase service helped. We are using this: http://www.cloudera.com/content/cloudera/en/downloads/cdh/cdh-4-7-0.html hbase-0.94.15+113 I made a deep dive into logs and found this stuff: 08:15:51.968INFOorg.apache.hadoop.hbase.regionserver.HRegionServer regionserver60020.periodicFlusher requesting flush for region epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457. after a delay of 3026 08:15:55.011INFOorg.apache.hadoop.hbase.regionserver.StoreFile Bloom filter type for hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8e68424066dc4c02a60ca57ec98128fb: ROW, CompoundBloomFilterWriter 08:15:55.012INFOorg.apache.hadoop.hbase.regionserver.StoreFile Delete Family Bloom filter type for hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8e68424066dc4c02a60ca57ec98128fb: CompoundBloomFilterWriter 08:15:55.035INFOorg.apache.hadoop.hbase.regionserver.StoreFile General Bloom and NO DeleteFamily was added to HFile (hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8e68424066dc4c02a60ca57ec98128fb) 08:15:55.035INFOorg.apache.hadoop.hbase.regionserver.Store Flushed , sequenceid=229362, memsize=7.7 K, into tmp file hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8e68424066dc4c02a60ca57ec98128fb 08:15:55.053INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader Loaded ROW (CompoundBloomFilter) metadata for 8e68424066dc4c02a60ca57ec98128fb 08:15:55.072INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader Loaded ROW (CompoundBloomFilter) metadata for 8e68424066dc4c02a60ca57ec98128fb 08:15:55.073INFOorg.apache.hadoop.hbase.regionserver.Store Added hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/CF/8e68424066dc4c02a60ca57ec98128fb, entries=8, sequenceid=229362, filesize=2.7 K 08:15:55.076INFOorg.apache.hadoop.hbase.regionserver.HRegion Finished memstore flush of ~7.7 K/7840, currentsize=0/0 for region epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457. in 80ms, sequenceid=229362, compaction requested=true 08:15:55.077INFOorg.apache.hadoop.hbase.regionserver.HRegion Starting compaction on CF in region epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457. 08:15:55.077INFOorg.apache.hadoop.hbase.regionserver.Store Starting compaction of 4 file(s) in CF of epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457. into tmpdir=hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp, seqid=229362, totalSize=76.6 M 08:15:55.096INFOorg.apache.hadoop.hbase.regionserver.StoreFile Bloom filter type for hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8bf8e92031834676b5d40b352120c5f2: ROW, CompoundBloomFilterWriter 08:15:55.097INFOorg.apache.hadoop.hbase.regionserver.StoreFile Delete Family Bloom filter type for hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8bf8e92031834676b5d40b352120c5f2: CompoundBloomFilterWriter 08:15:59.245INFOorg.apache.hadoop.hbase.regionserver.StoreFile General Bloom and NO DeleteFamily was added to HFile (hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8bf8e92031834676b5d40b352120c5f2) 08:15:59.255INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader Loaded ROW (CompoundBloomFilter) metadata for 8bf8e92031834676b5d40b352120c5f2 08:15:59.255INFOorg.apache.hadoop.hbase.regionserver.Store Renaming compacted file at hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8bf8e92031834676b5d40b352120c5f2 to hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/CF/8bf8e92031834676b5d40b352120c5f2 08:15:59.266INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader Loaded ROW (CompoundBloomFilter) metadata for 8bf8e92031834676b5d40b352120c5f2 08:15:59.282INFOorg.apache.hadoop.hbase.regionserver.Store Completed major compaction of 4 file(s) in CF of epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457. into 8bf8e92031834676b5d40b352120c5f2, size=76.6 M; total size for store is 76.6 M 08:15:59.283INFO org.apache.hadoop.hbase.regionserver.compactions.CompactionRequest completed compaction: regionName=epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457., storeName=CF, fileCount=4, fileSize=76.6 M, priority=3, time=8079772165508216; duration=4sec 08:25:06.274INFOorg.apache.hadoop.hbase.regionserver.wal.HLog Roll /hbase/.logs/etp-hdfs-n1-sg.passport.local,60020,1414102905372/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420860306123, entries=3, filesize=7313. for /hbase/.logs/etp-hdfs-n1-sg.passport.local,60020,1414102905372/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420863906224 08:25:06.274INFOorg.apache.hadoop.hbase.regionserver.wal.HLog moving old hlog file /hbase/.logs/etp-hdfs-n1-sg.passport.local,60020,1414102905372/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420856706020 whose highest sequenceid is 229359 to /hbase/.oldlogs/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420856706020 09:15:52.020INFOorg.apache.hadoop.hbase.regionserver.HRegionServer regionserver60020.periodicFlusher requesting flush for region epd_documents,,1418822716508.af285fa424bad011e58c32a88875b947. after a delay of 14844 09:16:02.021INFOorg.apache.hadoop.hbase.regionserver.HRegionServer regionserver60020.periodicFlusher requesting flush for region epd_documents,,1418822716508.af285fa424bad011e58c32a88875b947. after a delay of 21848 09:16:06.880INFOorg.apache.hadoop.hbase.regionserver.StoreFile Bloom filter type for hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/af285fa424bad011e58c32a88875b947/.tmp/a148cdce49964965b7b6f4728881427d: ROW, CompoundBloomFilterWriter 09:16:06.880INFOorg.apache.hadoop.hbase.regionserver.StoreFile Delete Family Bloom filter type for hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/af285fa424bad011e58c32a88875b947/.tmp/a148cdce49964965b7b6f4728881427d: CompoundBloomFilterWriter 09:16:06.914INFOorg.apache.hadoop.hbase.regionserver.StoreFile General Bloom and NO DeleteFamily was added to HFile (hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/af285fa424bad011e58c32a88875b947/.tmp/a148cdce49964965b7b6f4728881427d) 09:16:06.914INFOorg.apache.hadoop.hbase.regionserver.Store Flushed , sequenceid=229369, memsize=16.3 K, into tmp file hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/af285fa424bad011e58c32a88875b947/.tmp/a148cdce49964965b7b6f4728881427d 09:16:06.932INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader Loaded ROW (CompoundBloomFilter) metadata for a148cdce49964965b7b6f4728881427d 09:16:06.952INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader Loaded ROW (CompoundBloomFilter) metadata for a148cdce49964965b7b6f4728881427d 09:16:06.952INFOorg.apache.hadoop.hbase.regionserver.Store Added hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/af285fa424bad011e58c32a88875b947/CF/a148cdce49964965b7b6f4728881427d, entries=16, sequenceid=229369, filesize=4.5 K 09:16:06.955INFOorg.apache.hadoop.hbase.regionserver.HRegion Finished memstore flush of ~16.3 K/16648, currentsize=0/0 for region epd_documents,,1418822716508.af285fa424bad011e58c32a88875b947. in 90ms, sequenceid=229369, compaction requested=false 09:16:55.986INFOorg.apache.zookeeper.ClientCnxn Client session timed out, have not heard from server in 40003ms for sessionid 0x1493f19431b0005, closing socket connection and attempting reconnect 09:16:56.552INFOorg.apache.zookeeper.ClientCnxn Client session timed out, have not heard from server in 40020ms for sessionid 0x1493f19431b0000, closing socket connection and attempting reconnect 09:16:57.535INFOorg.apache.zookeeper.ClientCnxn Opening socket connection to server etp-hdfs-n1-sg.passport.local/172.16.160.8:2181. Will not attempt to authenticate using SASL (java.lang.SecurityException: Unable to locate a login configuration) 09:16:57.550INFOorg.apache.zookeeper.ClientCnxn Socket connection established to etp-hdfs-n1-sg.passport.local/172.16.160.8:2181, initiating session 09:16:58.448INFOorg.apache.zookeeper.ClientCnxn Opening socket connection to server etp-hdfs-n1-sg.passport.local/172.16.160.8:2181. Will not attempt to authenticate using SASL (java.lang.SecurityException: Unable to locate a login configuration) 09:16:58.449INFOorg.apache.zookeeper.ClientCnxn Socket connection established to etp-hdfs-n1-sg.passport.local/172.16.160.8:2181, initiating session 09:17:20.581INFOorg.apache.zookeeper.ClientCnxn Unable to reconnect to ZooKeeper service, session 0x1493f19431b0005 has expired, closing socket connection 09:17:20.582INFO org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation This client just lost it's session with ZooKeeper, will automatically reconnect when needed. 09:17:20.582INFO org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation ZK session expired. This disconnect could have been caused by a network partition or a long-running GC pause, either way it's recommended that you verify your environment. 09:17:20.582INFOorg.apache.zookeeper.ClientCnxn EventThread shut down 09:17:20.584INFOorg.apache.zookeeper.ClientCnxn Unable to reconnect to ZooKeeper service, session 0x1493f19431b0000 has expired, closing socket connection 09:17:20.633FATALorg.apache.hadoop.hbase.regionserver.HRegionServer ABORTING region server etp-hdfs-n1-sg.passport.local,60020,1414102905372: regionserver:60020-0x1493f19431b0000 regionserver:60020-0x1493f19431b0000 received expired from ZooKeeper, aborting org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:384) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:303) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) 09:17:20.733FATALorg.apache.hadoop.hbase.regionserver.HRegionServer RegionServer abort: loaded coprocessors are: [] 09:17:20.736INFOorg.apache.hadoop.hbase.regionserver.HRegionServer Dump of metrics: requestsPerSecond=0, numberOfOnlineRegions=3, numberOfStores=3, hlogFileCount=2, hlogFileSizeMB=365, numberOfStorefiles=4, storefileIndexSizeMB=0, rootIndexSizeKB=2, totalStaticIndexSizeKB=650, totalStaticBloomSizeKB=256, memstoreSizeMB=0, mbInMemoryWithoutWAL=0, numberOfPutsWithoutWAL=0, readRequestsCount=279141, writeRequestsCount=91793, compactionQueueSize=0, flushQueueSize=0, usedHeapMB=455, maxHeapMB=1015, blockCacheSizeMB=2.66, blockCacheFreeMB=251.26, blockCacheCount=7, blockCacheHitCount=596347, blockCacheMissCount=3360967, blockCacheEvictedCount=36743, blockCacheHitRatio=15%, blockCacheHitCachingRatio=93%, hdfsBlocksLocalityIndex=100, slowHLogAppendCount=0, fsReadLatencyHistogramMean=330098.85, fsReadLatencyHistogramCount=2970773.00, fsReadLatencyHistogramMedian=312261.50, fsReadLatencyHistogram75th=357708.50, fsReadLatencyHistogram95th=3147993.00, fsReadLatencyHistogram99th=4431904.00, fsReadLatencyHistogram999th=4431904.00, fsPreadLatencyHistogramMean=3245622.44, fsPreadLatencyHistogramCount=36742.00, fsPreadLatencyHistogramMedian=3180486.50, fsPreadLatencyHistogram75th=3733068.50, fsPreadLatencyHistogram95th=6424537.75, fsPreadLatencyHistogram99th=24206578.00, fsPreadLatencyHistogram999th=24206578.00, fsWriteLatencyHistogramMean=356278.05, fsWriteLatencyHistogramCount=2983888.00, fsWriteLatencyHistogramMedian=538147.00, fsWriteLatencyHistogram75th=570668.00, fsWriteLatencyHistogram95th=570668.00, fsWriteLatencyHistogram99th=570668.00, fsWriteLatencyHistogram999th=570668.00 09:17:20.749INFOorg.apache.hadoop.hbase.regionserver.HRegionServer STOPPED: regionserver:60020-0x1493f19431b0000 regionserver:60020-0x1493f19431b0000 received expired from ZooKeeper, aborting 09:17:20.750INFOorg.apache.zookeeper.ClientCnxn EventThread shut down 09:17:20.755INFOorg.apache.hadoop.ipc.HBaseServer Stopping server on 60020 09:17:20.756INFOorg.apache.hadoop.ipc.HBaseServer IPC Server handler 0 on 60020: exiting 09:17:20.756INFOorg.apache.hadoop.ipc.HBaseServer PRI IPC Server handler 1 on 60020: exiting 09:17:20.756INFOorg.apache.hadoop.ipc.HBaseServer PRI IPC Server handler 0 on 60020: exiting 09:17:20.756INFOorg.apache.hadoop.ipc.HBaseServer Stopping IPC Server listener on 60020 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer PRI IPC Server handler 3 on 60020: exiting 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer PRI IPC Server handler 8 on 60020: exiting 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer PRI IPC Server handler 4 on 60020: exiting 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer IPC Server handler 3 on 60020: exiting 09:17:20.757INFOorg.apache.hadoop.ipc. HBaseServer PRI IPC Server handler 6 on 60020: exiting 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer IPC Server handler 2 on 60020: exiting 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer IPC Server handler 6 on 60020: exiting 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer PRI IPC Server handler 5 on 60020: exiting 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer PRI IPC Server handler 7 on 60020: exiting 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer PRI IPC Server handler 2 on 60020: exiting 09:17:20.757INFOorg.apache.hadoop.hbase.regionserver.SplitLogWorker Sending interrupt to stop the worker thread 09:17:20.758INFOorg.apache.hadoop.hbase.regionserver.HRegionServer Stopping infoServer 09:17:20.758INFOorg.apache.hadoop.ipc.HBaseServer IPC Server handler 4 on 60020: exiting 09:17:20.758INFOorg.apache.hadoop.ipc.HBaseServer Stopping IPC Server Responder 09:17:20.758INFOorg.apache.hadoop.ipc.HBaseServer IPC Server handler 7 on 60020: exiting 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer IPC Server handler 8 on 60020: exiting 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer IPC Server handler 1 on 60020: exiting 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer PRI IPC Server handler 9 on 60020: exiting 09:17:20.758INFOorg.apache.hadoop.ipc.HBaseServer IPC Server handler 5 on 60020: exiting 09:17:20.758INFOorg.apache.hadoop.ipc.HBaseServer Stopping IPC Server Responder 09:17:20.761INFOorg.apache.hadoop.ipc.HBaseServer IPC Server handler 9 on 60020: exiting 09:17:20.763INFOorg.apache.hadoop.hbase.regionserver.SplitLogWorker SplitLogWorker interrupted while waiting for task, exiting: java.lang.InterruptedException 09:17:20.767INFOorg.apache.hadoop.ipc.HBaseServer REPL IPC Server handler 0 on 60020: exiting 09:17:20.767INFOorg.apache.hadoop.hbase.regionserver.SplitLogWorker SplitLogWorker etp-hdfs-n1-sg.passport.local,60020,1414102905372 exiting 09:17:20.767INFOorg.apache.hadoop.ipc.HBaseServer REPL IPC Server handler 2 on 60020: exiting 09:17:20.768INFOorg.mortbay.log Stopped SelectChannelConnector@0.0.0.0:60030 09:17:20.768INFOorg.apache.hadoop.ipc.HBaseServer REPL IPC Server handler 1 on 60020: exiting 09:17:20.908INFO org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager Stopping RegionServerSnapshotManager abruptly. 09:17:20.908INFO org.apache.hadoop.hbase.regionserver.HRegionServer$CompactionChecker regionserver60020.compactionChecker exiting 09:17:20.908INFOorg.apache.hadoop.hbase.regionserver.LogRoller LogRoller exiting. 09:17:20.909INFOorg.apache.hadoop.hbase.regionserver.MemStoreFlusher regionserver60020.cacheFlusher exiting 09:17:20.920INFOorg.apache.hadoop.hbase.regionserver.HRegionServer aborting server etp-hdfs-n1-sg.passport.local,60020,1414102905372 09:17:20.924INFOorg.apache.hadoop.hbase.regionserver.HRegionServer Waiting on 3 regions to close 09:17:20.943INFOorg.apache.hadoop.hbase.regionserver.Store Closed CF 09:17:20.949INFOorg.apache.hadoop.hbase.regionserver.HRegion Closed epd_documents,,1418822716508.af285fa424bad011e58c32a88875b947. 09:17:20.951INFOorg.apache.hadoop.hbase.regionserver.Store Closed info 09:17:20.953INFOorg.apache.hadoop.hbase.regionserver.HRegion Closed .META.,,1.1028785192 09:17:20.956INFOorg.apache.hadoop.hbase.regionserver.Store Closed CF 09:17:20.957ERRORorg.apache.hadoop.hbase.regionserver.HRegion Memstore size is 13376 09:17:20.957INFOorg.apache.hadoop.hbase.regionserver.HRegion Closed epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457. 09:17:21.125INFOorg.apache.hadoop.hbase.regionserver.HRegionServer stopping server etp-hdfs-n1-sg.passport.local,60020,1414102905372; all regions closed. 09:17:21.126INFOorg.apache.hadoop.hbase.regionserver.wal.HLog regionserver60020.logSyncer exiting 09:17:21.240INFOorg.apache.hadoop.hbase.regionserver.Leases regionserver60020 closing leases 09:17:21.241INFOorg.apache.hadoop.hbase.regionserver.Leases regionserver60020 closed leases 09:17:22.021INFO org.apache.hadoop.hbase.regionserver.HRegionServer$PeriodicMemstoreFlusher regionserver60020.periodicFlusher exiting 09:17:22.031WARNorg.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/etp-hdfs-n1-sg.passport.local,60020,1414102905372 09:17:22.031INFOorg.apache.hadoop.hbase.util.RetryCounter Sleeping 2000ms before retry #1... 09:17:23.958INFOorg.apache.hadoop.hbase.regionserver.Leases regionserver60020.leaseChecker closing leases 09:17:23.959INFOorg.apache.hadoop.hbase.regionserver.Leases regionserver60020.leaseChecker closed leases 09:17:24.032WARNorg.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/etp-hdfs-n1-sg.passport.local,60020,1414102905372 09:17:24.032INFOorg.apache.hadoop.hbase.util.RetryCounter Sleeping 4000ms before retry #2... 09:17:28.032WARNorg.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/etp-hdfs-n1-sg.passport.local,60020,1414102905372 09:17:28.033INFOorg.apache.hadoop.hbase.util.RetryCounter Sleeping 8000ms before retry #3... 09:17:36.033WARNorg.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/etp-hdfs-n1-sg.passport.local,60020,1414102905372 09:17:36.033ERRORorg.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper ZooKeeper delete failed after 3 retries 09:17:36.033WARNorg.apache.hadoop.hbase.regionserver.HRegionServer Failed deleting my ephemeral node org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/etp-hdfs-n1-sg.passport.local,60020,1414102905372 at org.apache.zookeeper.KeeperException.create(KeeperException.java:127) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873) at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:133) at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1196) at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1185) at org.apache.hadoop.hbase.regionserver.HRegionServer.deleteMyEphemeralNode(HRegionServer.java:1132) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:897) at java.lang.Thread.run(Thread.java:662) 09:17:36.035INFOorg.apache.hadoop.hbase.regionserver.HRegionServer stopping server etp-hdfs-n1-sg.passport.local,60020,1414102905372; zookeeper connection closed. 09:17:36.035INFOorg.apache.hadoop.hbase.regionserver.HRegionServer regionserver60020 exiting 09:17:36.051ERROR org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine Region server exiting java.lang.RuntimeException: HRegionServer Aborted at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.start(HRegionServerCommandLine.java:65) at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.run(HRegionServerCommandLine.java:84) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:76) at org.apache.hadoop.hbase.regionserver.HRegionServer.main(HRegionServer.java:4106) 09:17:36.139INFOorg.apache.hadoop.hbase.regionserver.ShutdownHook Shutdown hook starting; hbase.shutdown.hook=true; fsShutdownHook=org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@727f3b8a 09:17:36.140INFOorg.apache.hadoop.hbase.regionserver.HRegionServer STOPPED: Shutdown hook 09:17:36.141INFOorg.apache.hadoop.hbase.regionserver.ShutdownHook Starting fs shutdown hook thread. 09:17:36.142INFOorg.apache.hadoop.hbase.regionserver.ShutdownHook Shutdown hook finished. 10:16:40.584INFOorg.apache.hadoop.hbase.util.VersionInfo HBase 0.94.15-cdh4.7.0 10:16:40.608INFOorg.apache.hadoop.hbase.util.VersionInfo Subversion file:///data/1/jenkins/workspace/generic-package-rhel64-6-0/topdir/BUILD/hbase-0.94.15-cdh4.7.0 -r Unknown 10:16:40.608INFOorg.apache.hadoop.hbase.util.VersionInfo Compiled by jenkins on Wed May 28 10:48:53 PDT 2014 10:16:40.912INFOorg.apache.hadoop.hbase.util.ServerCommandLine vmName=Java HotSpot(TM) 64-Bit Server VM, vmVendor=Sun Microsystems Inc., vmVersion=20.6-b01 10:16:40.913INFOorg.apache.hadoop.hbase.util.ServerCommandLine vmInputArguments=[-XX:OnOutOfMemoryError=kill, -9, %p, -Xmx1000m, -Djava.net.preferIPv4Stack=true, -Xms1073741824, -Xmx1073741824, -XX:+UseParNewGC, -XX:+UseConcMarkSweepGC, -XX:-CMSConcurrentMTEnabled, -XX:CMSInitiatingOccupancyFraction=70, -XX:+CMSParallelRemarkEnabled, -Dhbase.log.dir=/var/log/hbase, -Dhbase.log.file=hbase-cmf-hbase1-REGIONSERVER-etp-hdfs-n1-sg.passport.local.log.out, -Dhbase.home.dir=/opt/cloudera/parcels/CDH-4.7.0-1.cdh4.7.0.p0.40/lib/hbase, -Dhbase.id.str=, -Dhbase.root.logger=INFO,RFA, -Djava.library.path=/opt/cloudera/parcels/CDH-4.7.0-1.cdh4.7.0.p0.40/lib/hadoop/lib/native:/opt/cloudera/parcels/CDH-4.7.0-1.cdh4.7.0.p0.40/lib/hbase/lib/native/Linux-amd64-64, -Dhbase.security.logger=INFO,DRFAS] 10:16:41.392INFOorg.apache.hadoop.ipc.HBaseServer Starting Thread-2 10:16:41.393INFOorg.apache.hadoop.ipc.sdfHBaseServer Starting Thread-2 this FATAL 09:17:20.633FATALorg.apache.hadoop.hbase.regionserver.HRegionServer ABORTING region server etp-hdfs-n1-sg.passport.local,60020,1414102905372: regionserver:60020-0x1493f19431b0000 regionserver:60020-0x1493f19431b0000 received expired from ZooKeeper, aborting org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:384) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:303) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) found on each RS I didn' t see any problems with ZK or memory usage. The load is extremely low, just few put/get in a second, you can see file sizes in logs. What it could be how can i catch the root cause?