Finally, my HMaster has stabilized and has been running for 7 hours. I
believe my networking issues are behind me now. Thank you, everyone, for
the help.

The new issue is that my RSes (region servers) keep dying after about 20
minutes. Again, the cluster is idle: no jobs are running, and I get this
on all of my RSes at almost the same time:

2012-07-05 19:34:05,283 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server devrackA-04/172.18.0.5:2181
2012-07-05 19:34:05,288 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to devrackA-04/172.18.0.5:2181, initiating session
2012-07-05 19:34:05,301 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server devrackA-04/172.18.0.5:2181, sessionid = 0x13858fc240f0003, negotiated timeout = 180000
2012-07-05 19:34:05,399 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Installed shutdown hook thread: Shutdownhook:regionserver60020
2012-07-05 20:06:40,279 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x13858fc240f0003, likely server has closed socket, closing socket connection and attempting reconnect
2012-07-05 20:06:40,573 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server devrackA-03/172.18.0.4:2181
2012-07-05 20:06:40,574 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to devrackA-03/172.18.0.4:2181, initiating session
2012-07-05 20:06:40,578 INFO org.apache.zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x13858fc240f0003 has expired, closing socket connection
2012-07-05 20:06:40,586 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server serverName=devrackB-07,60020,1341542045088, load=(requests=0, regions=0, usedHeap=0, maxHeap=0): regionserver:60020-0x13858fc240f0003 regionserver:60020-0x13858fc240f0003 received expired from ZooKeeper, aborting
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
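To put numbers on the log above, here is a small Python sketch that pulls out how long the session lived and what timeout was negotiated. It is not part of HBase or ZooKeeper; the function names are hypothetical, and only the two relevant log records are copied in. It shows the session was healthy for roughly 32.5 minutes with a negotiated timeout of 180 s, which suggests the ensemble went at least three minutes without hearing from this client before it expired the session.

```python
import re
from datetime import datetime

# Two records copied from the log above (establishment and first disconnect).
LOG = """\
2012-07-05 19:34:05,301 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server devrackA-04/172.18.0.5:2181, sessionid = 0x13858fc240f0003, negotiated timeout = 180000
2012-07-05 20:06:40,279 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x13858fc240f0003, likely server has closed socket, closing socket connection and attempting reconnect
"""

# log4j timestamp, e.g. "2012-07-05 19:34:05,301" (23 characters)
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse_ts(line):
    """Parse the leading log4j timestamp of a log record."""
    return datetime.strptime(line[:23], TS_FORMAT)


def session_summary(log_text):
    """Return (session lifetime in s, negotiated timeout in s) for one session."""
    established = lost = timeout_ms = None
    for line in log_text.splitlines():
        if "Session establishment complete" in line:
            established = parse_ts(line)
            timeout_ms = int(re.search(r"negotiated timeout = (\d+)", line).group(1))
        elif "Unable to read additional data" in line and lost is None:
            lost = parse_ts(line)  # first time the client saw the socket drop
    return (lost - established).total_seconds(), timeout_ms / 1000.0


lifetime_s, timeout_s = session_summary(LOG)
print(f"session lived {lifetime_s:.0f} s ({lifetime_s / 60:.1f} min); "
      f"negotiated timeout {timeout_s:.0f} s")
# prints: session lived 1955 s (32.6 min); negotiated timeout 180 s
```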

Could the fact that the cluster is idle cause the sessions to expire?
It's almost as if a timer fires, the sessions expire, and then the
servers can't reconnect. Is there a timer I need to adjust?
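On the question of which timer matters: the log shows a negotiated session timeout of 180000 ms, and a ZooKeeper server clamps whatever timeout a client requests into the range [minSessionTimeout, maxSessionTimeout], which default to 2 and 20 times tickTime. Idle clients still send periodic pings to keep their session alive, so idleness by itself should not expire a session. A minimal Python sketch of the clamping rule follows; the function name is hypothetical, and it restates the documented behavior rather than reproducing ZooKeeper's code.

```python
def negotiated_session_timeout(requested_ms, tick_time_ms=2000,
                               min_session_timeout_ms=None,
                               max_session_timeout_ms=None):
    """Clamp a client's requested session timeout the way a ZooKeeper server
    does. Unset bounds default to 2x and 20x tickTime. tick_time_ms=2000 is
    the value in ZooKeeper's sample config (an assumption for this example)."""
    lo = (min_session_timeout_ms if min_session_timeout_ms is not None
          else 2 * tick_time_ms)
    hi = (max_session_timeout_ms if max_session_timeout_ms is not None
          else 20 * tick_time_ms)
    # Raise to the floor first, then cap at the ceiling.
    return min(hi, max(lo, requested_ms))


# With tickTime=2000 and no explicit bounds, a 180000 ms request drops to 40000 ms.
print(negotiated_session_timeout(180000))                     # 40000
# A larger tickTime (or an explicit maxSessionTimeout) lets 180000 ms through.
print(negotiated_session_timeout(180000, tick_time_ms=9000))  # 180000
```

Because this log reports the full 180000 ms, the ensemble is evidently configured to allow it; the more useful question is probably why no heartbeat from the region servers reached ZooKeeper for three minutes.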

Could this be related to a TCP or IP timer that needs to be adjusted?
Does the connection go into a FIN_WAIT state and then close?
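For the FIN_WAIT question, one way to see the kernel's view of the ZooKeeper connections on a Linux region server is to read /proc/net/tcp and filter on port 2181 (`netstat -tan` or `ss -tan` show the same data). A minimal Python sketch, assuming Linux and IPv4 only (IPv6 sockets are listed in /proc/net/tcp6); the helper names are hypothetical.

```python
import socket
import struct

# Kernel TCP state codes as printed (hex) in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def decode_addr(hex_addr):
    """Decode e.g. '050012AC:0885' into ('172.18.0.5', 2181) on x86.

    The kernel prints the IPv4 address as a 32-bit value in host byte order,
    so packing it back in native order ("=I") restores the network-order bytes.
    The port is printed already converted to a normal number."""
    ip_hex, port_hex = hex_addr.split(":")
    ip = socket.inet_ntoa(struct.pack("=I", int(ip_hex, 16)))
    return ip, int(port_hex, 16)


def zk_connections(lines, zk_port=2181):
    """Yield (local, remote, state) for sockets whose either end is zk_port."""
    for line in lines:
        fields = line.split()
        if not fields or fields[0] == "sl":
            continue  # skip blank lines and the header row
        l_ip, l_port = decode_addr(fields[1])
        r_ip, r_port = decode_addr(fields[2])
        if zk_port in (l_port, r_port):
            yield (f"{l_ip}:{l_port}", f"{r_ip}:{r_port}",
                   TCP_STATES.get(fields[3], fields[3]))


def print_zk_connections(path="/proc/net/tcp"):
    """Print the current ZooKeeper connections of this host."""
    with open(path) as f:
        for local, remote, state in zk_connections(f):
            print(local, remote, state)
```

Running print_zk_connections() on a region server while the cluster sits idle should show ESTABLISHED sockets; ones lingering in FIN_WAIT1/FIN_WAIT2 would mean the local side initiated the close, and CLOSE_WAIT would mean the ZooKeeper side did.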

Thank you
---
Jay Wilson
