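Check your GC configuration. It seems that a full GC occurred and ZooKeeper treated the resulting pause as a session expiry. While the JVM is paused it cannot send heartbeats to ZooKeeper, so if the pause outlasts the session timeout (the log shows a negotiated timeout of 40000 ms), the session expires and the master and region servers abort.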
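If it helps to check this, here is a rough sketch of a script that lists how long each ZooKeeper session went without hearing from the server and which sessions ended up expired. The function name summarize_zk_sessions is just something I made up, the patterns are based only on the message text in your log, and it expects the raw log file rather than the line-wrapped copy from the mail. Silences that line up with long pauses in your GC log would support the full GC theory; if they do not line up, look at the network or the ZooKeeper servers instead.

```python
import re

# Message formats taken from the ZooKeeper client lines in the quoted log.
SILENCE_RE = re.compile(
    r"have not heard from server in (\d+)ms\s+for sessionid\s+(0x[0-9a-fA-F]+)"
)
EXPIRED_RE = re.compile(r"session\s+(0x[0-9a-fA-F]+)\s+has expired")


def summarize_zk_sessions(log_text):
    """Map each session id to its reported silences (ms) and an expiry flag."""
    sessions = {}

    def entry(session_id):
        # Create the record for a session the first time it is seen.
        return sessions.setdefault(
            session_id, {"silences_ms": [], "expired": False}
        )

    # "Client session timed out, have not heard from server in Nms ..."
    for ms, session_id in SILENCE_RE.findall(log_text):
        entry(session_id)["silences_ms"].append(int(ms))

    # "Unable to reconnect to ZooKeeper service, session 0x... has expired"
    for session_id in EXPIRED_RE.findall(log_text):
        entry(session_id)["expired"] = True

    return sessions
```

To use it, read the master log into a string, pass it to summarize_zk_sessions, and compare the timestamps of the longest silences with the pauses reported in the GC log.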
Regards
Ram

> -----Original Message-----
> From: Xiang Hua [mailto:[email protected]]
> Sent: Saturday, October 13, 2012 6:20 PM
> To: [email protected]
> Subject: hmaster and regionserver died
>
> Hi,
> the HMaster died as well as regionservers, below is hmaster's log.
> could you please find what's problem?
>
>
> 2012-10-12 00:14:19,444 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to bj-ecsxhm4f3I-r3-5-r810-2-hbase-stor-3/10.20.16.34:2181, initiating session
> 2012-10-12 00:14:19,520 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server bj-ecsxhm4f3I-r3-5-r810-2-hbase-stor-3/10.20.16.34:2181, sessionid = 0x139c539bc090002, negotiated timeout = 40000
> 2012-10-12 00:14:23,738 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 15046ms for sessionid 0x239c539ba630001, closing socket connection and attempting reconnect
> 2012-10-12 00:14:24,246 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server bj-ecsxhm4f3I-r3-5-r810-3-hbase-stor-2/10.20.16.33:2181
> 2012-10-12 00:14:25,173 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 15245ms for sessionid 0x139c539bc090003, closing socket connection and attempting reconnect
> 2012-10-12 00:14:25,328 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server bj-ecsxhm4f3I-r3-5-r810-3-hbase-stor-2/10.20.16.33:2181
> 2012-10-12 00:14:25,328 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to bj-ecsxhm4f3I-r3-5-r810-3-hbase-stor-2/10.20.16.33:2181, initiating session
> 2012-10-12 00:14:25,507 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
> 2012-10-12 00:14:25,507 INFO org.apache.zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x139c539bc090003 has expired, closing socket connection
> 2012-10-12 00:14:27,247 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to bj-ecsxhm4f3I-r3-5-r810-3-hbase-stor-2/10.20.16.33:2181, initiating session
> 2012-10-12 00:14:27,248 WARN org.apache.zookeeper.ClientCnxn: Session 0x239c539ba630001 for server bj-ecsxhm4f3I-r3-5-r810-3-hbase-stor-2/10.20.16.33:2181, unexpected error, closing socket connection and attempting reconnect
> java.io.IOException: Connection reset by peer
>     at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>     at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>     at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:218)
>     at sun.nio.ch.IOUtil.read(IOUtil.java:186)
>     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:359)
>     at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:859)
>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1157)
> 2012-10-12 00:14:28,026 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server bj-ecsxhm4f3I-r3-5-r810-2-hbase-stor-3/10.20.16.34:2181
> 2012-10-12 00:14:41,359 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 14007ms for sessionid 0x239c539ba630001, closing socket connection and attempting reconnect
> 2012-10-12 00:14:41,592 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server bj-ecsxhm4f3I-r3-5-r810-4-hbase-stor-1/10.20.16.32:2181
> 2012-10-12 00:14:46,186 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 26666ms for sessionid 0x139c539bc090002, closing socket connection and attempting reconnect
> 2012-10-12 00:14:46,572 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server bj-ecsxhm4f3I-r3-5-r810-3-hbase-stor-2/10.20.16.33:2181
> 2012-10-12 00:14:46,572 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to bj-ecsxhm4f3I-r3-5-r810-3-hbase-stor-2/10.20.16.33:2181, initiating session
> 2012-10-12 00:14:46,726 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server bj-ecsxhm4f3I-r3-5-r810-3-hbase-stor-2/10.20.16.33:2181, sessionid = 0x139c539bc090002, negotiated timeout = 40000
> 2012-10-12 00:14:54,925 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 13464ms for sessionid 0x239c539ba630001, closing socket connection and attempting reconnect
> 2012-10-12 00:14:56,524 ERROR org.apache.hadoop.hbase.master.HMaster: Region server serverName=bj-ecsxhm4f3I-r3-5-r810-3-hbase-stor-2,60020,1347901025673, load=(requests=75, regions=1, usedHeap=162, maxHeap=9725) reported a fatal error:
> ABORTING region server serverName=bj-ecsxhm4f3I-r3-5-r810-3-hbase-stor-2,60020,1347901025673, load=(requests=75, regions=1, usedHeap=162, maxHeap=9725): regionserver:60020-0x339c539ba640003 regionserver:60020-0x339c539ba640003 received expired from ZooKeeper, aborting
> Cause:
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
>     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:353)
>     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:271)
>     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:531)
>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:507)
>
> 2012-10-12 00:14:56,813 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server bj-ecsxhm4f3I-r3-5-r810-3-hbase-stor-2/10.20.16.33:2181
> 2012-10-12 00:15:10,147 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 15119ms for sessionid 0x239c539ba630001, closing socket connection and attempting reconnect
> 2012-10-12 00:15:10,625 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server bj-ecsxhm4f3I-r3-5-r810-2-hbase-stor-3/10.20.16.34:2181
> 2012-10-12 00:15:10,625 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to bj-ecsxhm4f3I-r3-5-r810-2-hbase-stor-3/10.20.16.34:2181, initiating session
> 2012-10-12 00:15:10,750 INFO org.apache.zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x239c539ba630001 has expired, closing socket connection
> 2012-10-12 00:15:10,750 FATAL org.apache.hadoop.hbase.master.HMaster: master:60000-0x239c539ba630001 master:60000-0x239c539ba630001 received expired from ZooKeeper, aborting
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
>     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:353)
>     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:271)
>     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:531)
>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:507)
> 2012-10-12 00:15:10,751 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
> 2012-10-12 00:15:10,751 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
> 2012-10-12 00:15:11,392 DEBUG org.apache.hadoop.hbase.master.HMaster: Stopping service threads
> 2012-10-12 00:15:11,392 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 60000
> 2012-10-12 00:15:11,392 INFO org.apache.hadoop.hbase.master.CatalogJanitor: bj-ecsxhm4f3I-r3-7-r810-3-hbase-stor-6:60000-CatalogJanitor exiting
> 2012-10-12 00:15:11,392 INFO org.apache.hadoop.hbase.master.HMaster$2: bj-ecsxhm4f3I-r3-7-r810-3-hbase-stor-6:60000-BalancerChore exiting
> 2012-10-12 00:15:11,393 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 0 on 60000: exiting
> 2012-10-12 00:15:11,393 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 11 on 60000: exiting
> 2012-10-12 00:15:11,393 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 6 on 60000: exiting
> 2012-10-12 00:15:11,393 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 9 on 60000: exiting
> 2012-10-12 00:15:11,393 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server listener on 60000
> 2012-10-12 00:15:11,393 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 3 on 60000: exiting
> 2012-10-12 00:15:11,393 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 7 on 60000: exiting
> 2012-10-12 00:15:11,393 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 5 on 60000: exiting
> 2012-10-12 00:15:11,394 INFO org.apache.hadoop.hbase.master.HMaster: Stopping infoServer
> 2012-10-12 00:15:11,393 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 20 on 60000: exiting
> 2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 23 on 60000: exiting
> 2012-10-12 00:15:11,393 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 19 on 60000: exiting
> 2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 25 on 60000: exiting
> 2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 29 on 60000: exiting
> 2012-10-12 00:15:11,393 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 1 on 60000: exiting
> 2012-10-12 00:15:11,393 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 18 on 60000: exiting
> 2012-10-12 00:15:11,393 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 15 on 60000: exiting
> 2012-10-12 00:15:11,393 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server Responder
> 2012-10-12 00:15:11,393 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 16 on 60000: exiting
> 2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 37 on 60000: exiting
> 2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 40 on 60000: exiting
> 2012-10-12 00:15:11,395 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 41 on 60000: exiting
> 2012-10-12 00:15:11,395 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 46 on 60000: exiting
> 2012-10-12 00:15:11,395 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 47 on 60000: exiting
> 2012-10-12 00:15:11,395 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 50 on 60000: exiting
> 2012-10-12 00:15:11,395 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 51 on 60000: exiting
> 2012-10-12 00:15:11,393 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 12 on 60000: exiting
> 2012-10-12 00:15:11,393 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 14 on 60000: exiting
> 2012-10-12 00:15:11,393 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 13 on 60000: exiting
> 2012-10-12 00:15:11,393 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 10 on 60000: exiting
> 2012-10-12 00:15:11,395 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 59 on 60000: exiting
> 2012-10-12 00:15:11,395 INFO org.apache.hadoop.hbase.master.AssignmentManager$TimeoutMonitor: bj-ecsxhm4f3I-r3-7-r810-3-hbase-stor-6:60000.timeoutMonitor exiting
> 2012-10-12 00:15:11,395 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 53 on 60000: exiting
> 2012-10-12 00:15:11,395 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 54 on 60000: exiting
> 2012-10-12 00:15:11,395 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 58 on 60000: exiting
> 2012-10-12 00:15:11,395 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 57 on 60000: exiting
> 2012-10-12 00:15:11,395 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 56 on 60000: exiting
> 2012-10-12 00:15:11,395 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 55 on 60000: exiting
> 2012-10-12 00:15:11,395 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 52 on 60000: exiting
> 2012-10-12 00:15:11,395 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 49 on 60000: exiting
> 2012-10-12 00:15:11,395 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 48 on 60000: exiting
> 2012-10-12 00:15:11,395 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 44 on 60000: exiting
> 2012-10-12 00:15:11,395 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 43 on 60000: exiting
> 2012-10-12 00:15:11,395 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 45 on 60000: exiting
> 2012-10-12 00:15:11,395 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 42 on 60000: exiting
> 2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 39 on 60000: exiting
> 2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 38 on 60000: exiting
> 2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 35 on 60000: exiting
> 2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 36 on 60000: exiting
> 2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 34 on 60000: exiting
> 2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 2 on 60000: exiting
> 2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 17 on 60000: exiting
> 2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 4 on 60000: exiting
> 2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 32 on 60000: exiting
> 2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 8 on 60000: exiting
> 2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 33 on 60000: exiting
> 2012-10-12 00:15:11,394 INFO org.mortbay.log: Stopped [email protected]:60010
> 2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 30 on 60000: exiting
> 2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 31 on 60000: exiting
> 2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 28 on 60000: exiting
> 2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 27 on 60000: exiting
> 2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 26 on 60000: exiting
> 2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 24 on 60000: exiting
> 2012-10-12 00:15:11,394 INFO org.apache.hadoop.hbase.master.LogCleaner: master-bj-ecsxhm4f3I-r3-7-r810-3-hbase-stor-6:60000.oldLogCleaner exiting
> 2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 22 on 60000: exiting
> 2012-10-12 00:15:11,398 INFO org.apache.hadoop.hbase.replication.master.ReplicationLogCleaner: Stopping replicationLogCleaner-0x139c539bc090003
> 2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 21 on 60000: exiting
> 2012-10-12 00:15:11,502 WARN org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x239c539ba630001 Unable to get data of znode /hbase/master
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>     at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:927)
>     at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:549)
>     at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAsAddress(ZKUtil.java:620)
>     at org.apache.hadoop.hbase.master.ActiveMasterManager.stop(ActiveMasterManager.java:197)
>     at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:310)
> 2012-10-12 00:15:11,502 ERROR org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: master:60000-0x239c539ba630001 Received unexpected KeeperException, re-throwing exception
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>     at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:927)
>     at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:549)
>     at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAsAddress(ZKUtil.java:620)
>     at org.apache.hadoop.hbase.master.ActiveMasterManager.stop(ActiveMasterManager.java:197)
>     at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:310)
> 2012-10-12 00:15:11,503 ERROR org.apache.hadoop.hbase.master.ActiveMasterManager: master:60000-0x239c539ba630001 Error deleting our own master address node
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>     at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:927)
>     at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:549)
>     at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAsAddress(ZKUtil.java:620)
>     at org.apache.hadoop.hbase.master.ActiveMasterManager.stop(ActiveMasterManager.java:197)
>     at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:310)
> 2012-10-12 00:15:11,503 DEBUG org.apache.hadoop.hbase.catalog.CatalogTracker: Stopping catalog tracker org.apache.hadoop.hbase.catalog.CatalogTracker@36664140
> 2012-10-12 00:15:11,503 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: The connection to hconnection-0x139c539bc090002-0x139c539bc090002-0x139c539bc090002 has been closed.
> 2012-10-12 00:15:11,503 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: The connection to hconnection-0x139c539bc090002-0x139c539bc090002-0x139c539bc090002 has been closed.
> 2012-10-12 00:15:11,503 INFO org.apache.hadoop.hbase.master.HMaster: HMaster main thread exiting
>
>
> Best R.
>
> beatls
