We are using latest docker release, so I'm not thinking on network issues

Cluster, however, handles a lot of traffic, so perhaps node is "too busy" to answer within time. Is this possible?

I'll have an eye on syncLimit.

Thanks.

PS: Any other replies are welcome too!


On 17/07/17 21:40, Abraham Fine wrote:
Hello-

It looks like there is a timeout during a ping. This is strange because
pings don't need to transfer much data.

Does the issue always occur with the same learner? Is it possible that
there is a network issue here?

You also may want to consider playing with the syncLimit.

Thanks,
Abe

On Mon, Jul 17, 2017, at 01:30, [email protected] wrote:
ping!


On 14/07/17 13:27, [email protected] wrote:
Hi

Using 3.5.3-beta, from time to time, our zookeeper ensemble leader
commits suicide:

     Jul 13 12:40:32 host zookeeper[28511]: [2017-07-13 12:40:32,797]
     ERROR Unexpected exception causing shutdown while sock still open
     (org.apache.zookeeper.server.quorum.LearnerHandler)
     Jul 13 12:40:32 host zookeeper[28511]:
     java.net.SocketTimeoutException: Read timed out
     Jul 13 12:40:32 host zookeeper[28511]: #011at
     java.net.SocketInputStream.socketRead0(Native Method)
     Jul 13 12:40:32 host zookeeper[28511]: #011at
     java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
     Jul 13 12:40:32 host zookeeper[28511]: #011at
     java.net.SocketInputStream.read(SocketInputStream.java:171)
     Jul 13 12:40:32 host zookeeper[28511]: #011at
     java.net.SocketInputStream.read(SocketInputStream.java:141)
     Jul 13 12:40:32 host zookeeper[28511]: #011at
     java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
     Jul 13 12:40:32 host zookeeper[28511]: #011at
     java.io.BufferedInputStream.read(BufferedInputStream.java:265)
     Jul 13 12:40:32 host zookeeper[28511]: #011at
     java.io.DataInputStream.readInt(DataInputStream.java:387)
     Jul 13 12:40:32 host zookeeper[28511]: #011at
     org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
     Jul 13 12:40:32 host zookeeper[28511]: #011at
     
org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
     Jul 13 12:40:32 host zookeeper[28511]: #011at
     org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
     Jul 13 12:40:32 host zookeeper[28511]: #011at
     
org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:515)
     Jul 13 12:40:32 host zookeeper[28511]: [2017-07-13 12:40:32,844]
     WARN ******* GOODBYE /10.0.0.11:42816 ********
     (org.apache.zookeeper.server.quorum.LearnerHandler)

Another ensemble node also complains (*cause or effect? notice
miliseconds!*):

     Jul 13 12:40:32 host12 zookeeper[26293]: [2017-07-13 12:40:32,500]
     WARN Exception when following the leader
     (org.apache.zookeeper.server.quorum.Learner)
     Jul 13 12:40:32 host12 zookeeper[26293]: java.net.SocketException:
     Broken pipe (Write failed)
     Jul 13 12:40:32 host12 zookeeper[26293]: #011at
     java.net.SocketOutputStream.socketWrite0(Native Method)
     Jul 13 12:40:32 host12 zookeeper[26293]: #011at
     java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
     Jul 13 12:40:32 host12 zookeeper[26293]: #011at
     java.net.SocketOutputStream.write(SocketOutputStream.java:155)
     Jul 13 12:40:32 host12 zookeeper[26293]: #011at
     java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
     Jul 13 12:40:32 host12 zookeeper[26293]: #011at
     java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
     Jul 13 12:40:32 host12 zookeeper[26293]: #011at
     org.apache.zookeeper.server.quorum.Learner.writePacket(Learner.java:141)
     Jul 13 12:40:32 host12 zookeeper[26293]: #011at
     org.apache.zookeeper.server.quorum.Learner.ping(Learner.java:620)
     Jul 13 12:40:32 host12 zookeeper[26293]: #011at
     
org.apache.zookeeper.server.quorum.Follower.processPacket(Follower.java:118)
     Jul 13 12:40:32 host12 zookeeper[26293]: #011at
     org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:92)
     Jul 13 12:40:32 host12 zookeeper[26293]: #011at
     org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1133)
     Jul 13 12:40:32 host12 zookeeper[26293]: [2017-07-13 12:40:32,567]
     WARN PeerState set to LOOKING
     (org.apache.zookeeper.server.quorum.QuorumPeer)

while the third node still continues living happily.

Ideas?




Reply via email to