Check your GC configuration.  It seems that a full GC happened and the pause
lasted long enough for ZooKeeper to treat it as a session expiry.
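
One way to check this is to pull the session timeouts out of the master or
regionserver log and compare their timestamps with pauses in the JVM's GC log,
if GC logging is enabled. The script below is only a rough sketch, not an
HBase or ZooKeeper tool: the function name summarize_sessions is made up for
illustration, and the patterns assume the raw log file with the ClientCnxn
message wording shown in the quoted log (without the "> " quoting).

```python
import re

# Message wording taken from the ZooKeeper ClientCnxn lines in the quoted log.
SILENCE = re.compile(
    r"have not heard from server in (\d+)ms for sessionid\s+(0x[0-9a-fA-F]+)"
)
EXPIRED = re.compile(r"session (0x[0-9a-fA-F]+) has expired")


def _new_entry():
    return {"max_silence_ms": 0, "timeouts": 0, "expired": False}


def summarize_sessions(log_text):
    """Summarize client-side timeouts and expiries per ZooKeeper session id.

    Hypothetical helper: returns a dict keyed by session id with the number
    of timeout messages, the longest reported silence, and whether the
    session was reported as expired.
    """
    # Collapse whitespace so a message wrapped across lines still matches.
    text = " ".join(log_text.split())
    summary = {}
    for ms, sid in SILENCE.findall(text):
        entry = summary.setdefault(sid, _new_entry())
        entry["timeouts"] += 1
        entry["max_silence_ms"] = max(entry["max_silence_ms"], int(ms))
    for sid in EXPIRED.findall(text):
        summary.setdefault(sid, _new_entry())["expired"] = True
    return summary


if __name__ == "__main__":
    import sys

    # Usage: python summarize_sessions.py <path to master or regionserver log>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for sid, info in sorted(summarize_sessions(handle.read()).items()):
            print(sid, info)
```

A session that shows several timeouts followed by an expiry, at times that
line up with long pauses in the GC log, would support the full GC explanation.
If there are no matching pauses, the cause is more likely elsewhere (network,
or the ZooKeeper servers themselves).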

Regards
Ram

> -----Original Message-----
> From: Xiang Hua [mailto:[email protected]]
> Sent: Saturday, October 13, 2012 6:20 PM
> To: [email protected]
> Subject: hmaster and regionserver died
> 
> Hi,
>    The HMaster died, as did the regionservers. Below is the HMaster's log.
> Could you please help find what the problem is?
> 
> 
> 2012-10-12 00:14:19,444 INFO org.apache.zookeeper.ClientCnxn: Socket
> connection established to bj-ecsxhm4f3I-r3-5-r810-2-hbase-stor-3/
> 10.20.16.34:2181, initiating session
> 2012-10-12 00:14:19,520 INFO org.apache.zookeeper.ClientCnxn: Session
> establishment complete on server bj-ecsxhm4f3I-r3-5-r810-2-hbase-stor-
> 3/
> 10.20.16.34:2181, sessionid = 0x139c539bc090002, negotiated timeout =
> 40000
> 2012-10-12 00:14:23,738 INFO org.apache.zookeeper.ClientCnxn: Client
> session timed out, have not heard from server in 15046ms for sessionid
> 0x239c539ba630001, closing socket connection and attempting reconnect
> 2012-10-12 00:14:24,246 INFO org.apache.zookeeper.ClientCnxn: Opening
> socket connection to server bj-ecsxhm4f3I-r3-5-r810-3-hbase-stor-2/
> 10.20.16.33:2181
> 2012-10-12 00:14:25,173 INFO org.apache.zookeeper.ClientCnxn: Client
> session timed out, have not heard from server in 15245ms for sessionid
> 0x139c539bc090003, closing socket connection and attempting reconnect
> 2012-10-12 00:14:25,328 INFO org.apache.zookeeper.ClientCnxn: Opening
> socket connection to server bj-ecsxhm4f3I-r3-5-r810-3-hbase-stor-2/
> 10.20.16.33:2181
> 2012-10-12 00:14:25,328 INFO org.apache.zookeeper.ClientCnxn: Socket
> connection established to bj-ecsxhm4f3I-r3-5-r810-3-hbase-stor-2/
> 10.20.16.33:2181, initiating session
> 2012-10-12 00:14:25,507 INFO org.apache.zookeeper.ClientCnxn:
> EventThread
> shut down
> 2012-10-12 00:14:25,507 INFO org.apache.zookeeper.ClientCnxn: Unable to
> reconnect to ZooKeeper service, session 0x139c539bc090003 has expired,
> closing socket connection
> 2012-10-12 00:14:27,247 INFO org.apache.zookeeper.ClientCnxn: Socket
> connection established to bj-ecsxhm4f3I-r3-5-r810-3-hbase-stor-2/
> 10.20.16.33:2181, initiating session
> 2012-10-12 00:14:27,248 WARN org.apache.zookeeper.ClientCnxn: Session
> 0x239c539ba630001 for server bj-ecsxhm4f3I-r3-5-r810-3-hbase-stor-2/
> 10.20.16.33:2181, unexpected error, closing socket connection and
> attempting reconnect
> java.io.IOException: Connection reset by peer
>     at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>     at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>     at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:218)
>     at sun.nio.ch.IOUtil.read(IOUtil.java:186)
>     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:359)
>     at
> org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:859)
>     at
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1157)
> 2012-10-12 00:14:28,026 INFO org.apache.zookeeper.ClientCnxn: Opening
> socket connection to server bj-ecsxhm4f3I-r3-5-r810-2-hbase-stor-3/
> 10.20.16.34:2181
> 2012-10-12 00:14:41,359 INFO org.apache.zookeeper.ClientCnxn: Client
> session timed out, have not heard from server in 14007ms for sessionid
> 0x239c539ba630001, closing socket connection and attempting reconnect
> 2012-10-12 00:14:41,592 INFO org.apache.zookeeper.ClientCnxn: Opening
> socket connection to server bj-ecsxhm4f3I-r3-5-r810-4-hbase-stor-1/
> 10.20.16.32:2181
> 2012-10-12 00:14:46,186 INFO org.apache.zookeeper.ClientCnxn: Client
> session timed out, have not heard from server in 26666ms for sessionid
> 0x139c539bc090002, closing socket connection and attempting reconnect
> 2012-10-12 00:14:46,572 INFO org.apache.zookeeper.ClientCnxn: Opening
> socket connection to server bj-ecsxhm4f3I-r3-5-r810-3-hbase-stor-2/
> 10.20.16.33:2181
> 2012-10-12 00:14:46,572 INFO org.apache.zookeeper.ClientCnxn: Socket
> connection established to bj-ecsxhm4f3I-r3-5-r810-3-hbase-stor-2/
> 10.20.16.33:2181, initiating session
> 2012-10-12 00:14:46,726 INFO org.apache.zookeeper.ClientCnxn: Session
> establishment complete on server bj-ecsxhm4f3I-r3-5-r810-3-hbase-stor-
> 2/
> 10.20.16.33:2181, sessionid = 0x139c539bc090002, negotiated timeout =
> 40000
> 2012-10-12 00:14:54,925 INFO org.apache.zookeeper.ClientCnxn: Client
> session timed out, have not heard from server in 13464ms for sessionid
> 0x239c539ba630001, closing socket connection and attempting reconnect
> 2012-10-12 00:14:56,524 ERROR org.apache.hadoop.hbase.master.HMaster:
> Region server
> serverName=bj-ecsxhm4f3I-r3-5-r810-3-hbase-stor-2,60020,1347901025673,
> load=(requests=75, regions=1, usedHeap=162, maxHeap=9725) reported a
> fatal
> error:
> ABORTING region server
> serverName=bj-ecsxhm4f3I-r3-5-r810-3-hbase-stor-2,60020,1347901025673,
> load=(requests=75, regions=1, usedHeap=162, maxHeap=9725):
> regionserver:60020-0x339c539ba640003 regionserver:60020-
> 0x339c539ba640003
> received expired from ZooKeeper, aborting
> Cause:
> org.apache.zookeeper.KeeperException$SessionExpiredException:
> KeeperErrorCode = Session expired
>     at
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooK
> eeperWatcher.java:353)
>     at
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWat
> cher.java:271)
>     at
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.jav
> a:531)
>     at
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:507)
> 
> 2012-10-12 00:14:56,813 INFO org.apache.zookeeper.ClientCnxn: Opening
> socket connection to server bj-ecsxhm4f3I-r3-5-r810-3-hbase-stor-2/
> 10.20.16.33:2181
> 2012-10-12 00:15:10,147 INFO org.apache.zookeeper.ClientCnxn: Client
> session timed out, have not heard from server in 15119ms for sessionid
> 0x239c539ba630001, closing socket connection and attempting reconnect
> 2012-10-12 00:15:10,625 INFO org.apache.zookeeper.ClientCnxn: Opening
> socket connection to server bj-ecsxhm4f3I-r3-5-r810-2-hbase-stor-3/
> 10.20.16.34:2181
> 2012-10-12 00:15:10,625 INFO org.apache.zookeeper.ClientCnxn: Socket
> connection established to bj-ecsxhm4f3I-r3-5-r810-2-hbase-stor-3/
> 10.20.16.34:2181, initiating session
> 2012-10-12 00:15:10,750 INFO org.apache.zookeeper.ClientCnxn: Unable to
> reconnect to ZooKeeper service, session 0x239c539ba630001 has expired,
> closing socket connection
> 2012-10-12 00:15:10,750 FATAL org.apache.hadoop.hbase.master.HMaster:
> master:60000-0x239c539ba630001 master:60000-0x239c539ba630001 received
> expired from ZooKeeper, aborting
> org.apache.zookeeper.KeeperException$SessionExpiredException:
> KeeperErrorCode = Session expired
>     at
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooK
> eeperWatcher.java:353)
>     at
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWat
> cher.java:271)
>     at
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.jav
> a:531)
>     at
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:507)
> 2012-10-12 00:15:10,751 INFO org.apache.hadoop.hbase.master.HMaster:
> Aborting
> 2012-10-12 00:15:10,751 INFO org.apache.zookeeper.ClientCnxn:
> EventThread
> shut down
> 2012-10-12 00:15:11,392 DEBUG org.apache.hadoop.hbase.master.HMaster:
> Stopping service threads
> 2012-10-12 00:15:11,392 INFO org.apache.hadoop.ipc.HBaseServer:
> Stopping
> server on 60000
> 2012-10-12 00:15:11,392 INFO
> org.apache.hadoop.hbase.master.CatalogJanitor:
> bj-ecsxhm4f3I-r3-7-r810-3-hbase-stor-6:60000-CatalogJanitor exiting
> 2012-10-12 00:15:11,392 INFO org.apache.hadoop.hbase.master.HMaster$2:
> bj-ecsxhm4f3I-r3-7-r810-3-hbase-stor-6:60000-BalancerChore exiting
> 2012-10-12 00:15:11,393 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> handler 0 on 60000: exiting
> 2012-10-12 00:15:11,393 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> handler 11 on 60000: exiting
> 2012-10-12 00:15:11,393 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> handler 6 on 60000: exiting
> 2012-10-12 00:15:11,393 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> handler 9 on 60000: exiting
> 2012-10-12 00:15:11,393 INFO org.apache.hadoop.ipc.HBaseServer:
> Stopping
> IPC Server listener on 60000
> 2012-10-12 00:15:11,393 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> handler 3 on 60000: exiting
> 2012-10-12 00:15:11,393 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> handler 7 on 60000: exiting
> 2012-10-12 00:15:11,393 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> handler 5 on 60000: exiting
> 2012-10-12 00:15:11,394 INFO org.apache.hadoop.hbase.master.HMaster:
> Stopping infoServer
> 2012-10-12 00:15:11,393 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> handler 20 on 60000: exiting
> 2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> handler 23 on 60000: exiting
> 2012-10-12 00:15:11,393 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> handler 19 on 60000: exiting
> 2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> handler 25 on 60000: exiting
> 2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> handler 29 on 60000: exiting
> 2012-10-12 00:15:11,393 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> handler 1 on 60000: exiting
> 2012-10-12 00:15:11,393 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> handler 18 on 60000: exiting
> 2012-10-12 00:15:11,393 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> handler 15 on 60000: exiting
> 2012-10-12 00:15:11,393 INFO org.apache.hadoop.ipc.HBaseServer:
> Stopping
> IPC Server Responder
> 2012-10-12 00:15:11,393 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> handler 16 on 60000: exiting
> 2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> handler 37 on 60000: exiting
> 2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> handler 40 on 60000: exiting
> 2012-10-12 00:15:11,395 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> handler 41 on 60000: exiting
> 2012-10-12 00:15:11,395 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> handler 46 on 60000: exiting
> 2012-10-12 00:15:11,395 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> handler 47 on 60000: exiting
> 2012-10-12 00:15:11,395 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> handler 50 on 60000: exiting
> 2012-10-12 00:15:11,395 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> handler 51 on 60000: exiting
> 2012-10-12 00:15:11,393 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> handler 12 on 60000: exiting
> 2012-10-12 00:15:11,393 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> handler 14 on 60000: exiting
> 2012-10-12 00:15:11,393 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> handler 13 on 60000: exiting
> 2012-10-12 00:15:11,393 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> handler 10 on 60000: exiting
> 2012-10-12 00:15:11,395 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> handler 59 on 60000: exiting
> 2012-10-12 00:15:11,395 INFO
> org.apache.hadoop.hbase.master.AssignmentManager$TimeoutMonitor:
> bj-ecsxhm4f3I-r3-7-r810-3-hbase-stor-6:60000.timeoutMonitor exiting
> 2012-10-12 00:15:11,395 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> handler 53 on 60000: exiting
> 2012-10-12 00:15:11,395 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> handler 54 on 60000: exiting
> 2012-10-12 00:15:11,395 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> handler 58 on 60000: exiting
> 2012-10-12 00:15:11,395 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> handler 57 on 60000: exiting
> 2012-10-12 00:15:11,395 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> handler 56 on 60000: exiting
> 2012-10-12 00:15:11,395 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> handler 55 on 60000: exiting
> 2012-10-12 00:15:11,395 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> handler 52 on 60000: exiting
> 2012-10-12 00:15:11,395 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> handler 49 on 60000: exiting
> 2012-10-12 00:15:11,395 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> handler 48 on 60000: exiting
> 2012-10-12 00:15:11,395 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> handler 44 on 60000: exiting
> 2012-10-12 00:15:11,395 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> handler 43 on 60000: exiting
> 2012-10-12 00:15:11,395 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> handler 45 on 60000: exiting
> 2012-10-12 00:15:11,395 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> handler 42 on 60000: exiting
> 2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> handler 39 on 60000: exiting
> 2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> handler 38 on 60000: exiting
> 2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> handler 35 on 60000: exiting
> 2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> handler 36 on 60000: exiting
> 2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> handler 34 on 60000: exiting
> 2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> handler 2 on 60000: exiting
> 2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> handler 17 on 60000: exiting
> 2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> handler 4 on 60000: exiting
> 2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> handler 32 on 60000: exiting
> 2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> handler 8 on 60000: exiting
> 2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> handler 33 on 60000: exiting
> 2012-10-12 00:15:11,394 INFO org.mortbay.log: Stopped
> [email protected]:60010
> 2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> handler 30 on 60000: exiting
> 2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> handler 31 on 60000: exiting
> 2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> handler 28 on 60000: exiting
> 2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> handler 27 on 60000: exiting
> 2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> handler 26 on 60000: exiting
> 2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> handler 24 on 60000: exiting
> 2012-10-12 00:15:11,394 INFO org.apache.hadoop.hbase.master.LogCleaner:
> master-bj-ecsxhm4f3I-r3-7-r810-3-hbase-stor-6:60000.oldLogCleaner
> exiting
> 2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> handler 22 on 60000: exiting
> 2012-10-12 00:15:11,398 INFO
> org.apache.hadoop.hbase.replication.master.ReplicationLogCleaner:
> Stopping
> replicationLogCleaner-0x139c539bc090003
> 2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> handler 21 on 60000: exiting
> 2012-10-12 00:15:11,502 WARN org.apache.hadoop.hbase.zookeeper.ZKUtil:
> master:60000-0x239c539ba630001 Unable to get data of znode
> /hbase/master
> org.apache.zookeeper.KeeperException$SessionExpiredException:
> KeeperErrorCode = Session expired for /hbase/master
>     at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
>     at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>     at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:927)
>     at
> org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:54
> 9)
>     at
> org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAsAddress(ZKUtil.java:6
> 20)
>     at
> org.apache.hadoop.hbase.master.ActiveMasterManager.stop(ActiveMasterMan
> ager.java:197)
>     at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:310)
> 2012-10-12 00:15:11,502 ERROR
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher:
> master:60000-0x239c539ba630001 Received unexpected KeeperException,
> re-throwing exception
> org.apache.zookeeper.KeeperException$SessionExpiredException:
> KeeperErrorCode = Session expired for /hbase/master
>     at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
>     at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>     at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:927)
>     at
> org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:54
> 9)
>     at
> org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAsAddress(ZKUtil.java:6
> 20)
>     at
> org.apache.hadoop.hbase.master.ActiveMasterManager.stop(ActiveMasterMan
> ager.java:197)
>     at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:310)
> 2012-10-12 00:15:11,503 ERROR
> org.apache.hadoop.hbase.master.ActiveMasterManager:
> master:60000-0x239c539ba630001 Error deleting our own master address
> node
> org.apache.zookeeper.KeeperException$SessionExpiredException:
> KeeperErrorCode = Session expired for /hbase/master
>     at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
>     at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>     at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:927)
>     at
> org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:54
> 9)
>     at
> org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAsAddress(ZKUtil.java:6
> 20)
>     at
> org.apache.hadoop.hbase.master.ActiveMasterManager.stop(ActiveMasterMan
> ager.java:197)
>     at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:310)
> 2012-10-12 00:15:11,503 DEBUG
> org.apache.hadoop.hbase.catalog.CatalogTracker: Stopping catalog
> tracker
> org.apache.hadoop.hbase.catalog.CatalogTracker@36664140
> 2012-10-12 00:15:11,503 DEBUG
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementa
> tion:
> The connection to
> hconnection-0x139c539bc090002-0x139c539bc090002-0x139c539bc090002 has
> been
> closed.
> 2012-10-12 00:15:11,503 DEBUG
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementa
> tion:
> The connection to
> hconnection-0x139c539bc090002-0x139c539bc090002-0x139c539bc090002 has
> been
> closed.
> 2012-10-12 00:15:11,503 INFO org.apache.hadoop.hbase.master.HMaster:
> HMaster main thread exiting
> 
> 
> Best R.
> 
> beatls
