I can see that the client is disconnecting from the server, and there is also a 
new round of leader election for the zookeeper servers. If this is happening, 
then yeah, your ensemble is unstable. If the ensemble leader election is being 
triggered frequently, then I'd start by looking there. Try to determine why the 
ensemble is failing to continue with the same leader. If ensemble elections 
aren't happening frequently, then another possibility is that GC pauses are 
causing the session to expire.

-Flavio

> On 24 Sep 2015, at 05:24, Akmal Abbasov <[email protected]> wrote:
> 
> Hi,
> I am using zookeeper 3.4.6
> I have a spark cluster configured with HA. Once per 1-2 days, the active 
> spark master is shutting down with a message
> 15/09/23 18:58:18 INFO zookeeper.ClientCnxn: Unable to read additional data 
> from server sessionid 0x34ffa68dbd10021, likely server has closed socket, 
> closing socket connection and attempting reconnect
> 15/09/23 18:58:18 INFO state.ConnectionStateManager: State change: SUSPENDED
> 15/09/23 18:58:18 INFO master.ZooKeeperLeaderElectionAgent: We have lost 
> leadership
> 15/09/23 18:58:18 ERROR master.Master: Leadership has been revoked -- master 
> shutting down.
> 15/09/23 18:58:18 INFO util.Utils: Shutdown hook called
> 
> I don’t have the zookeeper logs from the same period, but the logs are full 
> of the these messages 
> 2015-09-24 05:07:42,228 [myid:1] - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - 
> Accepted socket connection from /10.0.8.4:34705
> 2015-09-24 05:07:42,229 [myid:1] - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] - Connection 
> request from old client /10.0.8.4:34705; will be dropped if server is in r-o 
> mode
> 2015-09-24 05:07:42,229 [myid:1] - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client 
> attempting to establish new session at /10.0.8.4:34705
> 2015-09-24 05:07:42,292 [myid:1] - INFO  
> [CommitProcessor:1:ZooKeeperServer@617] - Established session 
> 0x14ffd3670130030 with negotiated timeout 20001 for client /10.0.8.4:34705
> 2015-09-24 05:07:42,302 [myid:1] - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of 
> stream exception
> EndOfStreamException: Unable to read additional data from client sessionid 
> 0x14ffd3670130030, likely client has closed socket
>       at 
> org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
>       at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
>       at java.lang.Thread.run(Thread.java:745)
> 2015-09-24 05:07:42,303 [myid:1] - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed 
> socket connection for client /10.0.8.4:34705 which had sessionid 
> 0x14ffd3670130030
> 2015-09-24 05:07:42,314 [myid:1] - ERROR 
> [CommitProcessor:1:NIOServerCnxn@178] - Unexpected Exception:
> java.nio.channels.CancelledKeyException
>       at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
>       at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
>       at 
> org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:151)
>       at 
> org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1081)
>       at 
> org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:404)
>       at 
> org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:74)
> 2015-09-24 05:07:42,334 [myid:1] - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - 
> Accepted socket connection from /10.0.8.4:34707
> 2015-09-24 05:07:42,334 [myid:1] - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] - Connection 
> request from old client /10.0.8.4:34707; will be dropped if server is in r-o 
> mode
> 2015-09-24 05:07:42,335 [myid:1] - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client 
> attempting to establish new session at /10.0.8.4:34707
> 2015-09-24 05:07:42,357 [myid:1] - INFO  
> [CommitProcessor:1:ZooKeeperServer@617] - Established session 
> 0x14ffd3670130031 with negotiated timeout 20001 for client /10.0.8.4:34707
> 2015-09-24 05:07:42,364 [myid:1] - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of 
> stream exception
> EndOfStreamException: Unable to read additional data from client sessionid 
> 0x14ffd3670130031, likely client has closed socket
>       at 
> org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
>       at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
>       at java.lang.Thread.run(Thread.java:745)
> 2015-09-24 05:07:42,365 [myid:1] - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed 
> socket connection for client /10.0.8.4:34707 which had sessionid 
> 0x14ffd3670130031
> 2015-09-24 05:07:42,376 [myid:1] - ERROR 
> [CommitProcessor:1:NIOServerCnxn@178] - Unexpected Exception:
> java.nio.channels.CancelledKeyException
>       at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
>       at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
>       at 
> org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:151)
>       at 
> org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1081)
>       at 
> org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:404)
>       at 
> org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:74)
> 
> Also there are 
> 2015-09-24 06:29:54,459 [myid:1] - INFO  
> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FollowerZooKeeperServer@139] - 
> Shutting down
> 2015-09-24 06:29:54,459 [myid:1] - INFO  
> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@441] - shutting down
> 2015-09-24 06:29:54,459 [myid:1] - INFO  
> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FollowerRequestProcessor@105] - 
> Shutting down
> 2015-09-24 06:29:54,459 [myid:1] - INFO  
> [FollowerRequestProcessor:1:FollowerRequestProcessor@95] - 
> FollowerRequestProcessor exited loop!
> 2015-09-24 06:29:54,460 [myid:1] - INFO  
> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:CommitProcessor@181] - Shutting down
> 2015-09-24 06:29:54,464 [myid:1] - INFO  
> [CommitProcessor:1:CommitProcessor@150] - CommitProcessor exited loop!
> 2015-09-24 06:29:54,465 [myid:1] - INFO  
> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FinalRequestProcessor@415] - 
> shutdown of request processor complete
> 2015-09-24 06:29:54,466 [myid:1] - INFO  
> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:SyncRequestProcessor@209] - Shutting 
> down
> 2015-09-24 06:29:54,466 [myid:1] - INFO  
> [SyncThread:1:SyncRequestProcessor@187] - SyncRequestProcessor exited!
> 2015-09-24 06:29:54,466 [myid:1] - INFO  
> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumPeer@714] - LOOKING
> 2015-09-24 06:29:54,584 [myid:1] - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - 
> Accepted socket connection from /10.0.8.58:36137
> 2015-09-24 06:29:54,584 [myid:1] - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception 
> causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not 
> running
> 2015-09-24 06:29:54,584 [myid:1] - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed 
> socket connection for client /10.0.8.58:36137 (no session established for 
> client)
> 2015-09-24 06:29:54,679 [myid:1] - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - 
> Accepted socket connection from /10.0.8.57:57410
> 2015-09-24 06:29:54,680 [myid:1] - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception 
> causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not 
> running
> 2015-09-24 06:29:54,680 [myid:1] - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed 
> socket connection for client /10.0.8.57:57410 (no session established for 
> client)
> 
> I also observed that hadoop-zkfc restarts very frequently.
> Any ideas what could be wrong?
> 
> Thanks.

Reply via email to