hi,all: i have a 5node hadoop cluster ,and i deploy HA autofailover . but i find the failover happened each 24 hours,i do not know why
here is my NN log , zkfc log ,zookeeper log content: *NN log:* 2013-11-03 23:28:36,740 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 147925 2013-11-03 23:30:10,103 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Stopping services started for active state 2013-11-03 23:30:10,103 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Ending log segment 147925 2013-11-03 23:30:10,104 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 1Number of transactions batched in Sync s: 0 Number of syncs: 1 SyncTimes(ms): 11 52 55 2013-11-03 23:30:10,170 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 1Number of transactions batched in Sync s: 0 Number of syncs: 2 SyncTimes(ms): 33 74 77 2013-11-03 23:30:10,222 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits file /data/namespace/1/current/edits_inprogress_0000000000000147925 -> / data/namespace/1/current/edits_0000000000000147925-0000000000000147926 2013-11-03 23:30:10,222 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits file /data/namespace/2/current/edits_inprogress_0000000000000147925 -> / data/namespace/2/current/edits_0000000000000147925-0000000000000147926 2013-11-03 23:30:10,222 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services required for standby state 2013-11-03 23:30:10,224 INFO org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Will roll logs on active node at ch12/192.168.11.12:8020 every 120 seconds. 2013-11-03 23:30:10,224 INFO org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer: Starting standby checkpoint thread... Checkpointing active NN at ch12:50070 Serving checkpoints at ch11/192.168.11.11:50070 2013-11-03 23:30:10,225 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hbase (auth:SIMPLE) cause:org.apache.hadoop.ipc.StandbyException: Op eration category WRITE is not supported in state standby 2013-11-03 23:30:10,225 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.renewLease from 192.168.11.13:54864: error: org.apache.hadoop.ipc.StandbyException: Operation category WRITE is not supported in state standby 2013-11-03 23:30:11,524 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hbase (auth:SIMPLE) cause:org.apache.hadoop.ipc.StandbyException: Op eration category WRITE is not supported in state standby 2013-11-03 23:30:11,524 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.renewLease from 192.168.11.13:54864: error: org.apache.hadoop.ipc.StandbyException: Operation category WRITE is not supported in state standby 2013-11-03 23:30:15,956 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hbase (auth:SIMPLE) cause:org.apache.hadoop.ipc.StandbyException: Op eration category WRITE is not supported in state standby 2013-11-03 23:30:15,956 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.renewLease from 192.168.11.15:56050: error: org.apache.hadoop.ipc.StandbyException: Operation category WRITE is not supported in state standby 2013-11-03 23:30:19,902 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hbase (auth:SIMPLE) cause:org.apache.hadoop.ipc.StandbyException: Operation category WRITE is not supported in state standby 2013-11-03 23:30:19,902 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.renewLease from 192.168.11.12:45839: error: org.apache.hadoop.ipc.StandbyException: Operation category WRITE is not supported in state standby 2013-11-03 23:30:21,386 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hbase (auth:SIMPLE) cause:org.apache.hadoop.ipc.StandbyException: Operation category READ is not supported in state standby 2013-11-03 23:30:21,387 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getListing from 192.168.11.11:49854: error: org.apache.hadoop.ipc.StandbyException: Operation category READ is not supported in state standby 2013-11-03 23:30:25,215 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hbase (auth:SIMPLE) cause:org.apache.hadoop.ipc.StandbyException: Operation category WRITE is not supported in state standby 2013-11-03 23:30:25,215 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.renewLease from 192.168.11.14:40328: error: org.apache.hadoop.ipc.StandbyException: Operation category WRITE is not supported in state standby 2013-11-03 23:30:49,513 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:mapred (auth:SIMPLE) cause:org.apache.hadoop.ipc.StandbyException: Operation category READ is not supported in state standby 2013-11-03 23:30:49,513 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getListing from 192.168.11.18:55750: error: org.apache.hadoop.ipc.StandbyException: Operation category READ is not supported in state standby 2013-11-03 23:32:10,326 INFO org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Triggering log roll on remote NameNode ch12/192.168.11.12:8020 2013-11-03 23:32:10,741 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Reading org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@172efb72expecting start txid #147927 2013-11-03 23:32:10,741 INFO org.apache.hadoop.hdfs.server.namenode.EditLogInputStream: Fast-forwarding stream ' http://ch12:8480/getJournal?jid=product&segmentTxId=147927&storageInfo=-40%3A2086828354%3A0%3ACID-0e777b8c-19f3-44a1-8af1-916877f2506c, http://ch11:8480/getJournal?jid=product&segmentTxId=147927&storageInfo=-40%3A2086828354%3A0%3ACID-0e777b8c-19f3-44a1-8af1-916877f2506c'to transaction ID 147927 *ZKFC log:* 2013-11-02 23:30:09,669 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x41e34ab04a0192, likely server has closed socket, closing socket connection and attempting reconnect 2013-11-02 23:30:09,771 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session disconnected. Entering neutral mode... 2013-11-02 23:30:09,982 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ch12/192.168.11.12:2181. Will not attempt to authenticate using SASL (unknown error) 2013-11-02 23:30:09,982 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to ch12/192.168.11.12:2181, initiating session 2013-11-02 23:30:09,985 INFO org.apache.zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x41e34ab04a0192 has expired, closing socket connection 2013-11-02 23:30:09,986 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session expired. Entering neutral mode and rejoining... 2013-11-02 23:30:09,986 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session 2013-11-02 23:30:09,987 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=ch11:2181,ch12:2181,ch13:2181 sessionTimeout=5000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@e6eb495 2013-11-02 23:30:09,988 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ch11/192.168.11.11:2181. Will not attempt to authenticate using SASL (unknown error) 2013-11-02 23:30:09,989 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to ch11/192.168.11.11:2181, initiating session 2013-11-02 23:30:09,996 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server ch11/192.168.11.11:2181, sessionid = 0x41e34ab04a023a, negotiated timeout = 5000 2013-11-02 23:30:09,999 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down 2013-11-02 23:30:10,000 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session connected. 2013-11-02 23:30:10,005 INFO org.apache.hadoop.ha.ActiveStandbyElector: Checking for any old active which needs to be fenced... 2013-11-02 23:30:10,007 INFO org.apache.hadoop.ha.ActiveStandbyElector: Old node exists: 0a0770726f6475637412036e6e321a046368313220d43e28d33e 2013-11-02 23:30:10,008 INFO org.apache.hadoop.ha.ZKFailoverController: Should fence: NameNode at ch12/192.168.11.12:8020 2013-11-02 23:30:10,110 INFO org.apache.hadoop.ha.ZKFailoverController: Successfully transitioned NameNode at ch12/192.168.11.12:8020 to standby state without fencing 2013-11-02 23:30:10,110 INFO org.apache.hadoop.ha.ActiveStandbyElector: Writing znode /hadoop-ha/product/ActiveBreadCrumb to indicate that the local node is the most recent active... 2013-11-02 23:30:10,117 INFO org.apache.hadoop.ha.ZKFailoverController: Trying to make NameNode at ch11/192.168.11.11:8020 active... 2013-11-02 23:30:11,736 INFO org.apache.hadoop.ha.ZKFailoverController: Successfully transitioned NameNode at ch11/192.168.11.11:8020 to active state 2013-11-03 23:30:09,687 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x41e34ab04a023a, likely server has closed socket, closing socket connection and attempting reconnect 2013-11-03 23:30:09,789 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session disconnected. Entering neutral mode... 2013-11-03 23:30:10,695 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ch12/192.168.11.12:2181. Will not attempt to authenticate using SASL (unknown error) 2013-11-03 23:30:10,696 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to ch12/192.168.11.12:2181, initiating session 2013-11-03 23:30:10,698 INFO org.apache.zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x41e34ab04a023a has expired, closing socket connection 2013-11-03 23:30:10,699 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session expired. Entering neutral mode and rejoining... 2013-11-03 23:30:10,699 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session 2013-11-03 23:30:10,701 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=ch11:2181,ch12:2181,ch13:2181 sessionTimeout=5000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@52192617 2013-11-03 23:30:10,703 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ch11/192.168.11.11:2181. Will not attempt to authenticate using SASL (unknown error) 2013-11-03 23:30:10,703 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to ch11/192.168.11.11:2181, initiating session 2013-11-03 23:30:10,711 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server ch11/192.168.11.11:2181, sessionid = 0x41e34ab04a02d3, negotiated timeout = 5000 2013-11-03 23:30:10,713 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down 2013-11-03 23:30:10,715 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session connected. 2013-11-03 23:30:10,721 INFO org.apache.hadoop.ha.ZKFailoverController: ZK Election indicated that NameNode at ch11/192.168.11.11:8020 should become standby 2013-11-03 23:30:10,725 INFO org.apache.hadoop.ha.ZKFailoverController: Successfully transitioned NameNode at ch11/192.168.11.11:8020 to standby state 2013-11-03 23:30:10,799 WARN org.apache.hadoop.ha.ActiveStandbyElector: Ignoring stale result from old client with sessionId 0x41e34ab04a023a *zookeeper log:* 2013-11-03 23:29:41,109 [myid:0] - INFO [Thread-33356:NIOServerCnxn@1001] - Closed socket connection for client /192.168.11.142:54260 (no session established for client) 2013-11-03 23:30:04,777 [myid:0] - INFO [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /192.168.11.12:52797 2013-11-03 23:30:04,778 [myid:0] - INFO [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@832] - Client attempting to renew session 0x141e46bfac70233 at /192.168.11.12:52797 2013-11-03 23:30:04,778 [myid:0] - INFO [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:Learner@107] - Revalidating client: 0x141e46bfac70233 2013-11-03 23:30:04,779 [myid:0] - INFO [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@595] - Established session 0x141e46bfac70233 with negotiated timeout 5000 for client / 192.168.11.12:52797 2013-11-03 23:30:09,682 [myid:0] - INFO [CommitProcessor:0:NIOServerCnxn@1001] - Closed socket connection for client /192.168.11.12:52797 which had sessionid 0x141e46bfac70233 2013-11-03 23:30:09,687 [myid:0] - INFO [CommitProcessor:0:NIOServerCnxn@1001] - Closed socket connection for client /192.168.11.11:48112 which had sessionid 0x41e34ab04a023a 2013-11-03 23:30:10,703 [myid:0] - INFO [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /192.168.11.11:57037 2013-11-03 23:30:10,703 [myid:0] - INFO [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@839] - Client attempting to establish new session at /192.168.11.11:57037 2013-11-03 23:30:10,711 [myid:0] - INFO [CommitProcessor:0:ZooKeeperServer@595] - Established session 0x41e34ab04a02d3 with negotiated timeout 5000 for client / 192.168.11.11:57037 2013-11-03 23:30:18,008 [myid:0] - INFO [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /127.0.0.1:60661 2013-11-03 23:30:18,008 [myid:0] - INFO [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@821] - Processing mntr command from / 127.0.0.1:60661 2013-11-03 23:30:18,009 [myid:0] - INFO [Thread-33357:NIOServerCnxn@1001] - Closed socket connection for client /127.0.0.1:60661 (no session established for client) 2013-11-03 23:30:38,008 [myid:0] - INFO [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /127.0.0.1:60664 2013-11-03 23:30:38,008 [myid:0] - INFO [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@821] - Processing mntr command from / 127.0.0.1:60664 2013-11-03 23:30:38,009 [myid:0] - INFO [Thread-33358:NIOServerCnxn@1001] - Closed socket connection for client /127.0.0.1:60664 (no session established for client) 2013-11-03 23:31:18,008 [myid:0] - INFO [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /127.0.0.1:60670