After each zookeeper reconnect, I saw same session Id (0x703e...) What version of zookeeper are you using ? Can you search zookeeper log for this session Id to see what happened ?
Thanks On Apr 29, 2013, at 2:36 AM, Viral Bajaria <[email protected]> wrote: > On Mon, Apr 29, 2013 at 2:25 AM, Ted Yu <[email protected]> wrote: > >> I noticed the 8 occurrences of 0x703e... following region server name in >> the abort message. >> I wonder why the repetition ? >> >> Cheers > > Oh good observation. I just stepped through the logs again and saw that the > client timeout occurred 8 times before that exception. Maybe that explains > the repetitive occurrence, but that is definitely weird ? Logs pasted below. > > -Viral > > ===logs=== > 2013-04-29 07:41:03,227 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: Serving as > regionserver1,60020,1367221255417, RPC listening on regionserver1/ > 10.155.234.3:60020, sessionid=0x703e48a8cfd81be6 > 2013-04-29 07:41:03,228 INFO > org.apache.hadoop.hbase.regionserver.SplitLogWorker: SplitLogWorker > regionserver1,60020,1367221255417 starting > 2013-04-29 07:41:03,230 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: Registered RegionServer > MXBean > 2013-04-29 07:41:05,967 INFO org.apache.hadoop.hbase.util.ChecksumType: > Checksum can use java.util.zip.CRC32 > 2013-04-29 07:43:47,739 INFO org.apache.zookeeper.ClientCnxn: Client > session timed out, have not heard from server in 27868ms for sessionid > 0x703e48a8cfd81be6, closing socket connection and attempting reconnect > 2013-04-29 07:43:48,776 INFO org.apache.zookeeper.ClientCnxn: Opening > socket connection to server 10.165.33.180:2181. Will not attempt to > authenticate using SASL (Unable to locate a login configuration) > 2013-04-29 07:43:48,777 INFO org.apache.zookeeper.ClientCnxn: Socket > connection established to 10.165.33.180:2181, initiating session > 2013-04-29 07:43:48,782 INFO org.apache.zookeeper.ClientCnxn: Session > establishment complete on server 10.165.33.180:2181, sessionid = > 0x703e48a8cfd81be6, negotiated timeout = 40000 > 2013-04-29 07:44:28,337 INFO org.apache.zookeeper.ClientCnxn: Client > session timed out, have not heard from server in 39427ms for sessionid > 0x703e48a8cfd81be6, closing socket connection and attempting reconnect > 2013-04-29 07:44:28,735 INFO org.apache.zookeeper.ClientCnxn: Opening > socket connection to server 10.152.152.84:2181. Will not attempt to > authenticate using SASL (Unable to locate a login configuration) > 2013-04-29 07:44:28,736 INFO org.apache.zookeeper.ClientCnxn: Socket > connection established to 10.152.152.84:2181, initiating session > 2013-04-29 07:44:28,738 INFO org.apache.zookeeper.ClientCnxn: Session > establishment complete on server 10.152.152.84:2181, sessionid = > 0x703e48a8cfd81be6, negotiated timeout = 40000 > 2013-04-29 07:46:29,080 DEBUG > org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats: total=98.34 MB, > free=11.61 GB, max=11.71 GB, blocks=0, accesses=0, hits=0, hitRatio=0, > cachingAccesses=0, cachingHits=0, cachingHitsRatio=0, evictions=0, > evicted=0, evictedPerRun=NaN > 2013-04-29 07:46:29,174 INFO org.apache.zookeeper.ClientCnxn: Client > session timed out, have not heard from server in 40354ms for sessionid > 0x703e48a8cfd81be6, closing socket connection and attempting reconnect > 2013-04-29 07:46:29,682 INFO org.apache.zookeeper.ClientCnxn: Opening > socket connection to server 10.147.128.158:2181. Will not attempt to > authenticate using SASL (Unable to locate a login configuration) > 2013-04-29 07:46:29,682 INFO org.apache.zookeeper.ClientCnxn: Socket > connection established to 10.147.128.158:2181, initiating session > 2013-04-29 07:46:29,684 INFO org.apache.zookeeper.ClientCnxn: Session > establishment complete on server 10.147.128.158:2181, sessionid = > 0x703e48a8cfd81be6, negotiated timeout = 40000 > 2013-04-29 07:49:10,404 INFO org.apache.zookeeper.ClientCnxn: Client > session timed out, have not heard from server in 39876ms for sessionid > 0x703e48a8cfd81be6, closing socket connection and attempting reconnect > 2013-04-29 07:49:11,503 INFO org.apache.zookeeper.ClientCnxn: Opening > socket connection to server 10.165.33.180:2181. Will not attempt to > authenticate using SASL (Unable to locate a login configuration) > 2013-04-29 07:49:11,504 INFO org.apache.zookeeper.ClientCnxn: Socket > connection established to 10.165.33.180:2181, initiating session > 2013-04-29 07:49:11,506 INFO org.apache.zookeeper.ClientCnxn: Session > establishment complete on server 10.165.33.180:2181, sessionid = > 0x703e48a8cfd81be6, negotiated timeout = 40000 > 2013-04-29 07:51:11,462 DEBUG > org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats: total=98.34 MB, > free=11.61 GB, max=11.71 GB, blocks=0, accesses=0, hits=0, hitRatio=0, > cachingAccesses=0, cachingHits=0, cachingHitsRatio=0, evictions=0, > evicted=0, evictedPerRun=NaN > 2013-04-29 07:53:52,158 INFO org.apache.zookeeper.ClientCnxn: Client > session timed out, have not heard from server in 40195ms for sessionid > 0x703e48a8cfd81be6, closing socket connection and attempting reconnect > 2013-04-29 07:53:52,555 INFO org.apache.zookeeper.ClientCnxn: Opening > socket connection to server 10.152.152.84:2181. Will not attempt to > authenticate using SASL (Unable to locate a login configuration) > 2013-04-29 07:53:52,556 INFO org.apache.zookeeper.ClientCnxn: Socket > connection established to 10.152.152.84:2181, initiating session > 2013-04-29 07:53:52,558 INFO org.apache.zookeeper.ClientCnxn: Session > establishment complete on server 10.152.152.84:2181, sessionid = > 0x703e48a8cfd81be6, negotiated timeout = 40000 > 2013-04-29 07:55:53,052 INFO org.apache.zookeeper.ClientCnxn: Client > session timed out, have not heard from server in 26955ms for sessionid > 0x703e48a8cfd81be6, closing socket connection and attempting reconnect > 2013-04-29 07:55:53,444 INFO org.apache.zookeeper.ClientCnxn: Opening > socket connection to server 10.147.128.158:2181. Will not attempt to > authenticate using SASL (Unable to locate a login configuration) > 2013-04-29 07:55:53,444 INFO org.apache.zookeeper.ClientCnxn: Socket > connection established to 10.147.128.158:2181, initiating session > 2013-04-29 07:55:53,446 INFO org.apache.zookeeper.ClientCnxn: Session > establishment complete on server 10.147.128.158:2181, sessionid = > 0x703e48a8cfd81be6, negotiated timeout = 40000 > 2013-04-29 07:55:55,512 DEBUG > org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats: total=98.34 MB, > free=11.61 GB, max=11.71 GB, blocks=0, accesses=0, hits=0, hitRatio=0, > cachingAccesses=0, cachingHits=0, cachingHitsRatio=0, evictions=0, > evicted=0, evictedPerRun=NaN > 2013-04-29 07:58:33,950 INFO org.apache.zookeeper.ClientCnxn: Client > session timed out, have not heard from server in 39972ms for sessionid > 0x703e48a8cfd81be6, closing socket connection and attempting reconnect > 2013-04-29 07:58:35,096 INFO org.apache.zookeeper.ClientCnxn: Opening > socket connection to server 10.165.33.180:2181. Will not attempt to > authenticate using SASL (Unable to locate a login configuration) > 2013-04-29 07:58:35,097 INFO org.apache.zookeeper.ClientCnxn: Socket > connection established to 10.165.33.180:2181, initiating session > 2013-04-29 07:58:35,098 INFO org.apache.zookeeper.ClientCnxn: Session > establishment complete on server 10.165.33.180:2181, sessionid = > 0x703e48a8cfd81be6, negotiated timeout = 40000 > 2013-04-29 08:00:55,512 DEBUG > org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats: total=98.34 MB, > free=11.61 GB, max=11.71 GB, blocks=0, accesses=0, hits=0, hitRatio=0, > cachingAccesses=0, cachingHits=0, cachingHitsRatio=0, evictions=0, > evicted=0, evictedPerRun=NaN > 2013-04-29 08:02:35,674 INFO org.apache.zookeeper.ClientCnxn: Client > session timed out, have not heard from server in 40592ms for sessionid > 0x703e48a8cfd81be6, closing socket connection and attempting reconnect > 2013-04-29 08:02:36,286 INFO org.apache.zookeeper.ClientCnxn: Opening > socket connection to server 10.152.152.84:2181. Will not attempt to > authenticate using SASL (Unable to locate a login configuration) > 2013-04-29 08:02:36,287 INFO org.apache.zookeeper.ClientCnxn: Socket > connection established to 10.152.152.84:2181, initiating session > 2013-04-29 08:02:36,288 INFO org.apache.zookeeper.ClientCnxn: Unable to > reconnect to ZooKeeper service, session 0x703e48a8cfd81be6 has expired, > closing socket connection > 2013-04-29 08:03:16,287 FATAL > org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server > regionserver1,60020,1367221255417: > regionserver:60020-0x703e48a8cfd81be6-0x703e48a8cfd81be6-0x703e48a8cfd81be6-0x703e48a8cfd81be6-0x703e48a8cfd81be6-0x703e48a8cfd81be6-0x703e48a8cfd81be6-0x703e48a8cfd81be6 > regionserver:60020-0x703e48a8cfd81be6-0x703e48a8cfd81be6-0x703e48a8cfd81be6-0x703e48a8cfd81be6-0x703e48a8cfd81be6-0x703e48a8cfd81be6-0x703e48a8cfd81be6-0x703e48a8cfd81be6 > received expired from ZooKeeper, aborting > org.apache.zookeeper.KeeperException$SessionExpiredException: > KeeperErrorCode = Session expired > at > org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:389) > at > org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:286) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495) > 2013-04-29 08:03:16,288 FATAL > org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server > regionserver1,60020,1367221255417: Unhandled exception: > org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; > currently processing regionserver1,60020,1367221255417 as dead server > org.apache.hadoop.hbase.YouAreDeadException: > org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; > currently processing regionserver1,60020,1367221255417 as dead server > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) > at java.lang.reflect.Constructor.newInstance(Constructor.java:513) > at > org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:95) > at > org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:79) > at > org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:880) > at > org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:748) > at java.lang.Thread.run(Thread.java:662) > Caused by: org.apache.hadoop.ipc.RemoteException: > org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; > currently processing regionserver1,60020,1367221255417 as dead server > at > org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:254) > at > org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:172) > at > org.apache.hadoop.hbase.master.HMaster.regionServerReport(HMaster.java:984) > at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at > org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364) > at > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426) > > at > org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:1021) > at > org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150) > at $Proxy8.regionServerReport(Unknown Source)
