This could be a reason as well:
2013-04-22 16:47:21,900 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: Too 
many consecutive RollWriter requests, it's a sign of the total number of live 
datanodes is lower than the tolerable replicas.
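Make sure your cluster is healthy, in particular that the number of live DataNodes is not below the replication level HBase tolerates for its write-ahead logs.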
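To illustrate why this warning points at DataNode health: as I understand it, the region server rolls its write-ahead log (HLog) when the HDFS pipeline for the current log file has fewer replicas than it tolerates, hoping the new file gets a full pipeline. If there simply aren't enough live DataNodes, every new file is under-replicated as well, so the rolls repeat until a limit is hit and the warning above is logged. The Python below is a simplified, hypothetical model of that loop, not HBase's actual code; `min_tolerable_replicas`, `roll_limit` and their defaults are illustrative assumptions, not HBase configuration keys.

```python
from dataclasses import dataclass


@dataclass
class LowReplicationRoller:
    """Hypothetical model of a WAL that rolls when its pipeline is under-replicated.

    Field names and defaults are illustrative assumptions, not HBase settings.
    """
    min_tolerable_replicas: int = 3  # replicas the log expects per block
    roll_limit: int = 5              # consecutive rolls before giving up
    consecutive_rolls: int = 0
    roll_enabled: bool = True

    def check(self, live_datanodes: int) -> str:
        # A new log file's pipeline cannot have more replicas than live DataNodes.
        if live_datanodes >= self.min_tolerable_replicas:
            # Healthy again: reset the counter and re-enable rolling.
            self.consecutive_rolls = 0
            self.roll_enabled = True
            return "ok"
        if not self.roll_enabled:
            # Already gave up rolling; the log stays under-replicated.
            return "under-replicated"
        if self.consecutive_rolls < self.roll_limit:
            # Request a roll in the hope that the next file gets enough replicas.
            self.consecutive_rolls += 1
            return "roll"
        # Rolling keeps producing under-replicated files: stop and warn.
        self.roll_enabled = False
        self.consecutive_rolls = 0
        return "warn: too many consecutive roll requests"


# Only 2 DataNodes alive while 3 replicas are expected:
roller = LowReplicationRoller(min_tolerable_replicas=3, roll_limit=5)
print([roller.check(live_datanodes=2) for _ in range(7)])
```

In this model no number of rolls fixes the problem, because the shortage is on the HDFS side; checking the live DataNode count (for example with `hadoop dfsadmin -report`) is the useful next step.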


Thank you!

Sincerely,
Leonid Fedotov
On Apr 22, 2013, at 6:25 PM, kaveh minooie wrote:

> 
> Hi
> 
> After a few MapReduce jobs, my region servers shut themselves down. Here is the
> log from the most recent occurrence:
> 
> 2013-04-22 16:47:21,843 INFO 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: 
> This client just lost it's session with ZooKeeper, trying to reconnect.
> 2013-04-22 16:47:21,843 FATAL 
> org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
> serverName=d1r1n17.prod.plutoz.com,60020,1366657358443, load=(requests=5392, 
> regions=196, usedHeap=1063, maxHeap=3966): 
> regionserver:60020-0x13dd980d2ab8661-0x13dd980d2ab8661 
> regionserver:60020-0x13dd980d2ab8661-0x13dd980d2ab8661 received expired from 
> ZooKeeper, aborting
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode 
> = Session expired
>        at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:352)
>        at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:270)
>        at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:523)
>        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:499)
> 2013-04-22 16:47:21,843 INFO 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: 
> Trying to reconnect to zookeeper.
> 2013-04-22 16:47:21,844 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: 
> requests=1794, regions=196, stores=1561, storefiles=1585, 
> storefileIndexSize=104, memstoreSize=306, compactionQueueSize=10, 
> flushQueueSize=0, usedHeap=1073, maxHeap=3966, blockCacheSize=661986032, 
> blockCacheFree=169901776, blockCacheCount=7242, blockCacheHitCount=910925, 
> blockCacheMissCount=1558134, blockCacheEvictedCount=1344753, 
> blockCacheHitRatio=36, blockCacheHitCachingRatio=40
> 2013-04-22 16:47:21,844 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: 
> regionserver:60020-0x13dd980d2ab8661-0x13dd980d2ab8661 
> regionserver:60020-0x13dd980d2ab8661-0x13dd980d2ab8661 received expired from 
> ZooKeeper, aborting
> 2013-04-22 16:47:21,844 INFO org.apache.zookeeper.ClientCnxn: EventThread 
> shut down
> 2013-04-22 16:47:21,900 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: 
> Too many consecutive RollWriter requests, it's a sign of the total number of 
> live datanodes is lower than the tolerable replicas.
> 2013-04-22 16:47:22,341 INFO org.apache.zookeeper.ZooKeeper: Initiating 
> client connection, connectString=zk1:2181 sessionTimeout=180000 
> watcher=hconnection
> 2013-04-22 16:47:22,357 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: Waiting on 1 regions to 
> close
> 2013-04-22 16:47:22,394 INFO org.apache.zookeeper.ClientCnxn: Opening socket 
> connection to server d1r2n2.prod.plutoz.com/10.0.0.66:2181. Will not attempt 
> to authenticate using SASL (unknown error)
> 2013-04-22 16:47:22,395 INFO org.apache.zookeeper.ClientCnxn: Socket 
> connection established to d1r2n2.prod.plutoz.com/10.0.0.66:2181, initiating 
> session
> 2013-04-22 16:47:22,397 INFO org.apache.zookeeper.ClientCnxn: Session 
> establishment complete on server d1r2n2.prod.plutoz.com/10.0.0.66:2181, 
> sessionid = 0x13dd980d2abbf93, negotiated timeout = 40000
> 2013-04-22 16:47:22,400 INFO 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: 
> Reconnected successfully. This disconnect could have been caused by a network 
> partition or a long-running GC pause, either way it's recommended that you 
> verify your environment.
> 2013-04-22 16:47:22,400 INFO org.apache.zookeeper.ClientCnxn: EventThread 
> shut down
> 2013-04-22 16:47:56,830 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
> compaction interrupted by user:
> java.io.InterruptedIOException: Aborting compaction of store f in region 
> t1_webpage,com.pandora.www:http/shaggy,1366670139658.9f565d5da3468c0725e590dc232abc23.
>  because user requested stop.
>        at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:998)
>        at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:779)
>        at 
> org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:776)
>        at 
> org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:721)
>        at 
> org.apache.hadoop.hbase.regionserver.CompactSplitThread.run(CompactSplitThread.java:81)
> 2013-04-22 16:47:56,830 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
> aborted compaction on region 
> t1_webpage,com.pandora.www:http/shaggy,1366670139658.9f565d5da3468c0725e590dc232abc23.
>  after 5mins, 58sec
> 2013-04-22 16:47:56,830 INFO 
> org.apache.hadoop.hbase.regionserver.CompactSplitThread: 
> regionserver60020.compactor exiting
> 2013-04-22 16:47:56,832 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
> Closed 
> t1_webpage,com.pandora.www:http/shaggy,1366670139658.9f565d5da3468c0725e590dc232abc23.
> 2013-04-22 16:47:57,363 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: 
> regionserver60020.logSyncer exiting
> 2013-04-22 16:47:57,366 INFO org.apache.hadoop.hbase.regionserver.Leases: 
> regionserver60020 closing leases
> 2013-04-22 16:47:57,366 INFO org.apache.hadoop.hbase.regionserver.Leases: 
> regionserver60020 closed leases
> 2013-04-22 16:47:57,366 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver60020 exiting
> 2013-04-22 16:47:57,497 INFO 
> org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook starting; 
> hbase.shutdown.hook=true; fsShutdownHook=Thread[Thread-15,5,main]
> 2013-04-22 16:47:57,497 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Shutdown hook
> 2013-04-22 16:47:57,497 INFO 
> org.apache.hadoop.hbase.regionserver.ShutdownHook: Starting fs shutdown hook 
> thread.
> 2013-04-22 16:47:57,504 INFO org.apache.hadoop.hbase.regionserver.Leases: 
> regionserver60020.leaseChecker closing leases
> 2013-04-22 16:47:57,504 INFO org.apache.hadoop.hbase.regionserver.Leases: 
> regionserver60020.leaseChecker closed leases
> 2013-04-22 16:47:57,598 INFO 
> org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook finished.
> 
> I would appreciate it very much if someone could explain to me what just 
> happened here.
> 
> Thanks,
