Hello- It looks like there is a timeout during a ping. This is strange because pings don't need to transfer much data.
Does the issue always occur with the same learner? Is it possible that there is a network issue here? You also may want to consider playing with the syncLimit. Thanks, Abe On Mon, Jul 17, 2017, at 01:30, mosto...@gmail.com wrote: > ping! > > > On 14/07/17 13:27, mosto...@gmail.com wrote: > > > > Hi > > > > Using 3.5.3-beta, from time to time, our zookeeper ensemble leader > > commits suicide: > > > > Jul 13 12:40:32 host zookeeper[28511]: [2017-07-13 12:40:32,797] > > ERROR Unexpected exception causing shutdown while sock still open > > (org.apache.zookeeper.server.quorum.LearnerHandler) > > Jul 13 12:40:32 host zookeeper[28511]: > > java.net.SocketTimeoutException: Read timed out > > Jul 13 12:40:32 host zookeeper[28511]: #011at > > java.net.SocketInputStream.socketRead0(Native Method) > > Jul 13 12:40:32 host zookeeper[28511]: #011at > > java.net.SocketInputStream.socketRead(SocketInputStream.java:116) > > Jul 13 12:40:32 host zookeeper[28511]: #011at > > java.net.SocketInputStream.read(SocketInputStream.java:171) > > Jul 13 12:40:32 host zookeeper[28511]: #011at > > java.net.SocketInputStream.read(SocketInputStream.java:141) > > Jul 13 12:40:32 host zookeeper[28511]: #011at > > java.io.BufferedInputStream.fill(BufferedInputStream.java:246) > > Jul 13 12:40:32 host zookeeper[28511]: #011at > > java.io.BufferedInputStream.read(BufferedInputStream.java:265) > > Jul 13 12:40:32 host zookeeper[28511]: #011at > > java.io.DataInputStream.readInt(DataInputStream.java:387) > > Jul 13 12:40:32 host zookeeper[28511]: #011at > > org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63) > > Jul 13 12:40:32 host zookeeper[28511]: #011at > > > > org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83) > > Jul 13 12:40:32 host zookeeper[28511]: #011at > > > > org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99) > > Jul 13 12:40:32 host zookeeper[28511]: #011at > > > > org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:515) > > Jul 13 12:40:32 host zookeeper[28511]: [2017-07-13 12:40:32,844] > > WARN ******* GOODBYE /10.0.0.11:42816 ******** > > (org.apache.zookeeper.server.quorum.LearnerHandler) > > > > Another ensemble node also complains (*cause or effect? notice > > miliseconds!*): > > > > Jul 13 12:40:32 host12 zookeeper[26293]: [2017-07-13 12:40:32,500] > > WARN Exception when following the leader > > (org.apache.zookeeper.server.quorum.Learner) > > Jul 13 12:40:32 host12 zookeeper[26293]: java.net.SocketException: > > Broken pipe (Write failed) > > Jul 13 12:40:32 host12 zookeeper[26293]: #011at > > java.net.SocketOutputStream.socketWrite0(Native Method) > > Jul 13 12:40:32 host12 zookeeper[26293]: #011at > > java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111) > > Jul 13 12:40:32 host12 zookeeper[26293]: #011at > > java.net.SocketOutputStream.write(SocketOutputStream.java:155) > > Jul 13 12:40:32 host12 zookeeper[26293]: #011at > > java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) > > Jul 13 12:40:32 host12 zookeeper[26293]: #011at > > java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) > > Jul 13 12:40:32 host12 zookeeper[26293]: #011at > > org.apache.zookeeper.server.quorum.Learner.writePacket(Learner.java:141) > > Jul 13 12:40:32 host12 zookeeper[26293]: #011at > > org.apache.zookeeper.server.quorum.Learner.ping(Learner.java:620) > > Jul 13 12:40:32 host12 zookeeper[26293]: #011at > > > > org.apache.zookeeper.server.quorum.Follower.processPacket(Follower.java:118) > > Jul 13 12:40:32 host12 zookeeper[26293]: #011at > > > > org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:92) > > Jul 13 12:40:32 host12 zookeeper[26293]: #011at > > org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1133) > > Jul 13 12:40:32 host12 zookeeper[26293]: [2017-07-13 12:40:32,567] > > WARN PeerState set to LOOKING > > (org.apache.zookeeper.server.quorum.QuorumPeer) > > > > while the third node still continues living happily. > > > > Ideas? > > > > > > >