Hi,

We have a 10-node Kafka cluster and a 5-node ZooKeeper (ZK) cluster. The Kafka version is 0.10.2.1. We had an issue where a ZK follower lost its connection to the ZK leader, which triggered a series of ISR shrinks and expands. This left some partitions with fewer in-sync replicas than expected, which in turn caused the publisher to throw this exception:

org.apache.kafka.common.errors.NotEnoughReplicasException: Messages are rejected since there are fewer in-sync replicas than required

Is there any known issue with this version? How do I check what triggered this?

Logs from ZK:

[2018-03-07 21:48:26,063] WARN caught end of stream exception (org.apache.zookeeper.server.NIOServerCnxn)
EndOfStreamException: Unable to read additional data from client sessionid 0x161e43d1e0f0001, likely client has closed socket
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)
    at java.lang.Thread.run(Thread.java:745)
[2018-03-07 21:48:27,967] WARN Exception when following the leader (org.apache.zookeeper.server.quorum.Learner)
java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
    at java.net.SocketInputStream.read(SocketInputStream.java:170)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
    at java.io.DataInputStream.readInt(DataInputStream.java:387)
    at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
    at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
    at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
    at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
    at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:83)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:846)
[2018-03-07 21:48:33,998] WARN Got zxid 0x500016abb expected 0x1 (org.apache.zookeeper.server.quorum.Learner)
[2018-03-07 21:49:28,970] WARN Exception when following the leader (org.apache.zookeeper.server.quorum.Learner)
java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:392)
    at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
    at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
    at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
    at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
    at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:83)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:846)
[2018-03-07 21:49:29,311] WARN Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running (org.apache.zookeeper.server.NIOServerCnxn)
[2018-03-07 21:49:29,326] WARN Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running (org.apache.zookeeper.server.NIOServerCnxn)
[2018-03-07 21:49:29,635] WARN Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running (org.apache.zookeeper.server.NIOServerCnxn)
[2018-03-07 21:49:30,510] WARN Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running (org.apache.zookeeper.server.NIOServerCnxn)
[2018-03-07 21:49:33,001] WARN Unexpected exception, tries=0, connecting to host-2/host-2-IP:<port> (org.apache.zookeeper.server.quorum.Learner)
java.net.SocketTimeoutException: connect timed out
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:228)
    at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:69)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:846)
[2018-03-07 21:49:35,330] WARN Got zxid 0x500016ac9 expected 0x1 (org.apache.zookeeper.server.quorum.Learner)
[2018-03-07 21:50:03,968] WARN Exception when following the leader (org.apache.zookeeper.server.quorum.Learner)
java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
    at java.net.SocketInputStream.read(SocketInputStream.java:170)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
    at java.io.DataInputStream.readInt(DataInputStream.java:387)
    at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
    at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
    at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
    at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
    at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:83)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:846)
[2018-03-07 21:50:10,240] WARN Got zxid 0x500016caf expected 0x1 (org.apache.zookeeper.server.quorum.Learner)
[2018-03-07 21:50:10,265] WARN Got zxid 0x500016e79 expected 0x1 (org.apache.zookeeper.server.quorum.Learner)
[2018-03-07 21:50:31,974] WARN Exception when following the leader (org.apache.zookeeper.server.quorum.Learner)
java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
    at java.net.SocketInputStream.read(SocketInputStream.java:170)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
    at java.io.DataInputStream.readInt(DataInputStream.java:387)
    at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
    at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
    at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
    at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
    at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:83)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:846)
[2018-03-07 21:50:36,004] WARN Unexpected exception, tries=0, connecting to host-2/host-2-IP:<port> (org.apache.zookeeper.server.quorum.Learner)
java.net.SocketTimeoutException: connect timed out
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:228)
    at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:69)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:846)
[2018-03-07 21:50:40,009] WARN Got zxid 0x500017025 expected 0x1 (org.apache.zookeeper.server.quorum.Learner)
[2018-03-07 21:50:40,297] WARN Got zxid 0x50001715a expected 0x1 (org.apache.zookeeper.server.quorum.Learner)
[2018-03-07 21:51:32,981] WARN Exception when following the leader (org.apache.zookeeper.server.quorum.Learner)

ISR shrinks and expands:

2018-03-07 21:49:37,657 INFO [cluster.Partition:kafka-scheduler-7] Partition [MY_TOPIC_36,0] on broker 5: Shrinking ISR for partition [MY_TOPIC_36,0] from 10,1,6,5 to 1,6,5
2018-03-07 21:49:38,342 INFO [cluster.Partition:kafka-request-handler-2] Partition [MY_TOPIC_36,0] on broker 5: Expanding ISR for partition MY_TOPIC_36-0 from 1,6,5 to 1,6,5,10
2018-03-07 21:50:07,656 INFO [cluster.Partition:kafka-scheduler-3] Partition [MY_TOPIC_36,0] on broker 5: Shrinking ISR for partition [MY_TOPIC_36,0] from 1,6,5,10 to 1,6,5
2018-03-07 21:50:10,623 INFO [cluster.Partition:kafka-request-handler-1] Partition [MY_TOPIC_36,0] on broker 5: Expanding ISR for partition MY_TOPIC_36-0 from 1,6,5 to 1,6,5,10
2018-03-07 21:52:07,654 INFO [cluster.Partition:kafka-scheduler-3] Partition [MY_TOPIC_36,0] on broker 5: Shrinking ISR for partition [MY_TOPIC_36,0] from 1,6,5,10 to 1,6,5
2018-03-07 21:52:07,682 INFO [cluster.Partition:kafka-request-handler-3] Partition [MY_TOPIC_36,0] on broker 5: Expanding ISR for partition MY_TOPIC_36-0 from 1,6,5 to 1,6,5,10
2018-03-07 21:52:42,656 INFO [cluster.Partition:kafka-scheduler-9] Partition [MY_TOPIC_36,0] on broker 5: Shrinking ISR for partition [MY_TOPIC_36,0] from 1,6,5,10 to 1,6,5
2018-03-07 21:52:44,536 INFO [cluster.Partition:kafka-request-handler-0] Partition [MY_TOPIC_36,0] on broker 5: Expanding ISR for partition MY_TOPIC_36-0 from 1,6,5 to 1,6,5,10
2018-03-07 21:54:47,656 INFO [cluster.Partition:kafka-scheduler-5] Partition [MY_TOPIC_36,0] on broker 5: Shrinking ISR for partition [MY_TOPIC_36,0] from 1,6,5,10 to 1,6,5
2018-03-07 21:54:49,594 INFO [cluster.Partition:kafka-request-handler-0] Partition [MY_TOPIC_36,0] on broker 5: Expanding ISR for partition MY_TOPIC_36-0 from 1,6,5 to 1,6,5,10
2018-03-07 21:57:22,656 INFO [cluster.Partition:kafka-scheduler-6] Partition [MY_TOPIC_36,0] on broker 5: Shrinking ISR for partition [MY_TOPIC_36,0] from 1,6,5,10 to 1,6,5
2018-03-07 21:57:23,730 INFO [cluster.Partition:kafka-request-handler-7] Partition [MY_TOPIC_36,0] on broker 5: Expanding ISR for partition MY_TOPIC_36-0 from 1,6,5 to 1,6,5,10
2018-03-07 22:00:27,656 INFO [cluster.Partition:kafka-scheduler-5] Partition [MY_TOPIC_36,0] on broker 5: Shrinking ISR for partition [MY_TOPIC_36,0] from 1,6,5,10 to 1,6,5
2018-03-07 22:00:29,562 INFO [cluster.Partition:kafka-request-handler-6] Partition [MY_TOPIC_36,0] on broker 5: Expanding ISR for partition MY_TOPIC_36-0 from 1,6,5 to 1,6,5,10
2018-03-07 22:02:32,656 INFO [cluster.Partition:kafka-scheduler-3] Partition [MY_TOPIC_36,0] on broker 5: Shrinking ISR for partition [MY_TOPIC_36,0] from 1,6,5,10 to 1,6,5
2018-03-07 22:02:33,563 INFO [cluster.Partition:kafka-request-handler-2] Partition [MY_TOPIC_36,0] on broker 5: Expanding ISR for partition MY_TOPIC_36-0 from 1,6,5 to 1,6,5,10
2018-03-07 22:03:02,657 INFO [cluster.Partition:kafka-scheduler-2] Partition [MY_TOPIC_36,0] on broker 5: Shrinking ISR for partition [MY_TOPIC_36,0] from 1,6,5,10 to 1,6,5
2018-03-07 22:03:05,350 INFO [cluster.Partition:kafka-request-handler-5] Partition [MY_TOPIC_36,0] on broker 5: Expanding ISR for partition MY_TOPIC_36-0 from 1,6,5 to 1,6,5,10
2018-03-07 22:03:32,657 INFO [cluster.Partition:kafka-scheduler-9] Partition [MY_TOPIC_36,0] on broker 5: Shrinking ISR for partition [MY_TOPIC_36,0] from 1,6,5,10 to 1,6,5
2018-03-07 22:03:35,633 INFO [cluster.Partition:kafka-request-handler-7] Partition [MY_TOPIC_36,0] on broker 5: Expanding ISR for partition MY_TOPIC_36-0 from 1,6,5 to 1,6,5,10
2018-03-07 22:05:02,654 INFO [cluster.Partition:kafka-scheduler-1] Partition [MY_TOPIC_36,0] on broker 5: Shrinking ISR for partition [MY_TOPIC_36,0] from 1,6,5,10 to 1,6,5
2018-03-07 22:05:02,814 INFO [cluster.Partition:kafka-request-handler-6] Partition [MY_TOPIC_36,0] on broker 5: Expanding ISR for partition MY_TOPIC_36-0 from 1,6,5 to 1,6,5,10
2018-03-07 22:08:42,656 INFO [cluster.Partition:kafka-scheduler-3] Partition [MY_TOPIC_36,0] on broker 5: Shrinking ISR for partition [MY_TOPIC_36,0] from 1,6,5,10 to 1,6,5
2018-03-07 22:08:45,440 INFO [cluster.Partition:kafka-request-handler-2] Partition [MY_TOPIC_36,0] on broker 5: Expanding ISR for partition MY_TOPIC_36-0 from 1,6,5 to 1,6,5,10
2018-03-07 22:10:17,656 INFO [cluster.Partition:kafka-scheduler-8] Partition [MY_TOPIC_36,0] on broker 5: Shrinking ISR for partition [MY_TOPIC_36,0] from 1,6,5,10 to 1,6,5
2018-03-07 22:10:18,670 INFO [cluster.Partition:kafka-request-handler-4] Partition [MY_TOPIC_36,0] on broker 5: Expanding ISR for partition MY_TOPIC_36-0 from 1,6,5 to 1,6,5,10
2018-03-07 22:12:52,656 INFO [cluster.Partition:kafka-scheduler-4] Partition [MY_TOPIC_36,0] on broker 5: Shrinking ISR for partition [MY_TOPIC_36,0] from 1,6,5,10 to 1,6,5
2018-03-07 22:12:53,783 INFO [cluster.Partition:kafka-request-handler-6] Partition [MY_TOPIC_36,0] on broker 5: Expanding ISR for partition MY_TOPIC_36-0 from 1,6,5 to 1,6,5,10
2018-03-07 22:15:27,656 INFO [cluster.Partition:kafka-scheduler-6] Partition [MY_TOPIC_36,0] on broker 5: Shrinking ISR for partition [MY_TOPIC_36,0] from 1,6,5,10 to 1,6,5
2018-03-07 22:15:28,640 INFO [cluster.Partition:kafka-request-handler-2] Partition [MY_TOPIC_36,0] on broker 5: Expanding ISR for partition MY_TOPIC_36-0 from 1,6,5 to 1,6,5,10
2018-03-07 22:15:57,657 INFO [cluster.Partition:kafka-scheduler-0] Partition [MY_TOPIC_36,0] on broker 5: Shrinking ISR for partition [MY_TOPIC_36,0] from 1,6,5,10 to 1,6,5
2018-03-07 22:15:59,415 INFO [cluster.Partition:kafka-request-handler-5] Partition [MY_TOPIC_36,0] on broker 5: Expanding ISR for partition MY_TOPIC_36-0 from 1,6,5 to 1,6,5,10
2018-03-07 22:16:27,654 INFO [cluster.Partition:kafka-scheduler-9] Partition [MY_TOPIC_36,0] on broker 5: Shrinking ISR for partition [MY_TOPIC_36,0] from 1,6,5,10 to 1,6,5
2018-03-07 22:16:30,702 INFO [cluster.Partition:kafka-request-handler-0] Partition [MY_TOPIC_36,0] on broker 5: Expanding ISR for partition MY_TOPIC_36-0 from 1,6,5 to 1,6,5,10
2018-03-07 22:19:02,656 INFO [cluster.Partition:kafka-scheduler-5] Partition [MY_TOPIC_36,0] on broker 5: Shrinking ISR for partition [MY_TOPIC_36,0] from 1,6,5,10 to 1,6,5
2018-03-07 22:19:05,707 INFO [cluster.Partition:kafka-request-handler-7] Partition [MY_TOPIC_36,0] on broker 5: Expanding ISR for partition MY_TOPIC_36-0 from 1,6,5 to 1,6,5,10
2018-03-07 22:20:07,656 INFO [cluster.Partition:kafka-scheduler-8] Partition [MY_TOPIC_36,0] on broker 5: Shrinking ISR for partition [MY_TOPIC_36,0] from 1,6,5,10 to 1,6,5
2018-03-07 22:20:07,740 INFO [cluster.Partition:kafka-request-handler-5] Partition [MY_TOPIC_36,0] on broker 5: Expanding ISR for partition MY_TOPIC_36-0 from 1,6,5 to 1,6,5,10
2018-03-07 22:20:37,657 INFO [cluster.Partition:kafka-scheduler-4] Partition [MY_TOPIC_36,0] on broker 5: Shrinking ISR for partition [MY_TOPIC_36,0] from 1,6,5,10 to 1,6,5
2018-03-07 22:20:40,028 INFO [cluster.Partition:kafka-request-handler-7] Partition [MY_TOPIC_36,0] on broker 5: Expanding ISR for partition MY_TOPIC_36-0 from 1,6,5 to 1,6,5,10
2018-03-07 22:22:42,656 INFO [cluster.Partition:kafka-scheduler-8] Partition [MY_TOPIC_36,0] on broker 5: Shrinking ISR for partition [MY_TOPIC_36,0] from 1,6,5,10 to 1,6,5
2018-03-07 22:22:43,121 INFO [cluster.Partition:kafka-request-handler-6] Partition [MY_TOPIC_36,0] on broker 5: Expanding ISR for partition MY_TOPIC_36-0 from 1,6,5 to 1,6,5,10
2018-03-07 22:23:37,650 INFO [cluster.Partition:kafka-scheduler-6] Partition [MY_TOPIC_36,0] on broker 5: Shrinking ISR for partition [MY_TOPIC_36,0] from 1,6,5,10 to 1,6,5
2018-03-07 22:23:40,046 INFO [cluster.Partition:kafka-request-handler-3] Partition [MY_TOPIC_36,0] on broker 5: Expanding ISR for partition MY_TOPIC_36-0 from 1,6,5 to 1,6,5,10
2018-03-07 22:25:47,656 INFO [cluster.Partition:kafka-scheduler-4] Partition [MY_TOPIC_36,0] on broker 5: Shrinking ISR for partition [MY_TOPIC_36,0] from 1,6,5,10 to 1,6,5
2018-03-07 22:25:48,890 INFO [cluster.Partition:kafka-request-handler-3] Partition [MY_TOPIC_36,0] on broker 5: Expanding ISR for partition MY_TOPIC_36-0 from 1,6,5 to 1,6,5,10
2018-03-07 22:27:52,656 INFO [cluster.Partition:kafka-scheduler-2] Partition [MY_TOPIC_36,0] on broker 5: Shrinking ISR for partition [MY_TOPIC_36,0] from 1,6,5,10 to 1,6,5
2018-03-07 22:27:52,761 INFO [cluster.Partition:kafka-request-handler-3] Partition [MY_TOPIC_36,0] on broker 5: Expanding ISR for partition MY_TOPIC_36-0 from 1,6,5 to 1,6,5,10
2018-03-07 22:28:52,657 INFO [cluster.Partition:kafka-scheduler-4] Partition [MY_TOPIC_36,0] on broker 5: Shrinking ISR for partition [MY_TOPIC_36,0] from 1,6,5,10 to 1,6,5
2018-03-07 22:28:55,632 INFO [cluster.Partition:kafka-request-handler-2] Partition [MY_TOPIC_36,0] on broker 5: Expanding ISR for partition MY_TOPIC_36-0 from 1,6,5 to 1,6,5,10
2018-03-07 22:30:57,656 INFO [cluster.Partition:kafka-scheduler-3] Partition [MY_TOPIC_36,0] on broker 5: Shrinking ISR for partition [MY_TOPIC_36,0] from 1,6,5,10 to 1,6,5
2018-03-07 22:30:58,803 INFO [cluster.Partition:kafka-request-handler-2] Partition [MY_TOPIC_36,0] on broker 5: Expanding ISR for partition MY_TOPIC_36-0 from 1,6,5 to 1,6,5,10
2018-03-07 22:33:32,657 INFO [cluster.Partition:kafka-scheduler-0] Partition [MY_TOPIC_36,0] on broker 5: Shrinking ISR for partition [MY_TOPIC_36,0] from 1,6,5,10 to 1,6,5
2018-03-07 22:33:33,709 INFO [cluster.Partition:kafka-request-handler-5] Partition [MY_TOPIC_36,0] on broker 5: Expanding ISR for partition MY_TOPIC_36-0 from 1,6,5 to 1,6,5,10
2018-03-07 22:35:37,653 INFO [cluster.Partition:kafka-scheduler-4] Partition [MY_TOPIC_36,0] on broker 5: Shrinking ISR for partition [MY_TOPIC_36,0] from 1,6,5,10 to 1,6,5
2018-03-07 22:35:37,832 INFO [cluster.Partition:kafka-request-handler-4] Partition [MY_TOPIC_36,0] on broker 5: Expanding ISR for partition MY_TOPIC_36-0 from 1,6,5 to 1,6,5,10
2018-03-07 22:36:07,656 INFO [cluster.Partition:kafka-scheduler-5] Partition [MY_TOPIC_36,0] on broker 5: Shrinking ISR for partition [MY_TOPIC_36,0] from 1,6,5,10 to 1,6,5
2018-03-07 22:36:08,688 INFO [cluster.Partition:kafka-request-handler-5] Partition [MY_TOPIC_36,0] on broker 5: Expanding ISR for partition MY_TOPIC_36-0 from 1,6,5 to 1,6,5,10
2018-03-07 22:37:02,656 INFO [cluster.Partition:kafka-scheduler-7] Partition [MY_TOPIC_36,0] on broker 5: Shrinking ISR for partition [MY_TOPIC_36,0] from 1,6,5,10 to 1,6,5
2018-03-07 22:37:05,593 INFO [cluster.Partition:kafka-request-handler-7] Partition [MY_TOPIC_36,0] on broker 5: Expanding ISR for partition MY_TOPIC_36-0 from 1,6,5 to 1,6,5,10
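
To help line the ISR flapping up against the ZK timestamps, here is a minimal sketch that pairs each shrink with the next matching expand and reports how long the replica stayed out of the ISR. It assumes one broker log entry per line in the format shown above. The names `parse_isr_events` and `out_of_sync_windows` are made up for illustration and are not part of Kafka.

```python
import re
from datetime import datetime

# Matches broker log entries of the form shown above, for example:
# "2018-03-07 21:49:37,657 INFO [...] ... Shrinking ISR for partition [T,0] from 10,1,6,5 to 1,6,5"
ISR_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*?"
    r"(?P<kind>Shrinking|Expanding) ISR for partition \S+ "
    r"from (?P<old>[\d,]+) to (?P<new>[\d,]+)"
)


def parse_isr_events(lines):
    """Return (timestamp, kind, changed_replicas) for every ISR change line."""
    events = []
    for line in lines:
        m = ISR_LINE.search(line)
        if m is None:
            continue  # not an ISR shrink/expand entry
        ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
        old = set(m.group("old").split(","))
        new = set(m.group("new").split(","))
        # Symmetric difference = replicas that left or rejoined the ISR.
        events.append((ts, m.group("kind"), old ^ new))
    return events


def out_of_sync_windows(events):
    """Pair each shrink with the next expand that re-adds the same replicas."""
    windows = []
    pending = None  # (shrink timestamp, replicas removed)
    for ts, kind, changed in events:
        if kind == "Shrinking":
            pending = (ts, changed)
        elif pending is not None and changed == pending[1]:
            seconds = (ts - pending[0]).total_seconds()
            windows.append((pending[0], seconds, sorted(changed)))
            pending = None
    return windows


sample = [
    "2018-03-07 21:49:37,657 INFO [cluster.Partition:kafka-scheduler-7] Partition [MY_TOPIC_36,0] on broker 5: Shrinking ISR for partition [MY_TOPIC_36,0] from 10,1,6,5 to 1,6,5",
    "2018-03-07 21:49:38,342 INFO [cluster.Partition:kafka-request-handler-2] Partition [MY_TOPIC_36,0] on broker 5: Expanding ISR for partition MY_TOPIC_36-0 from 1,6,5 to 1,6,5,10",
]
for start, seconds, replicas in out_of_sync_windows(parse_isr_events(sample)):
    print(f"{start} replicas {replicas} out of ISR for {seconds:.3f}s")
```

For the first shrink/expand pair above (21:49:37,657 to 21:49:38,342) this reports replica 10 out of the ISR for 0.685 seconds.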