+users
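For a quick way to see exactly which partitions are affected at any given moment, the stock topic tool can filter for them. A minimal sketch, assuming it is run from the Kafka install directory on a broker host and that zk1 serves ZooKeeper on its default port 2181:

    bin/kafka-topics.sh --zookeeper zk1:2181 --describe --under-replicated-partitions

Also worth knowing for the question below: since 0.9, a follower is dropped from the ISR when it has not caught up to the leader within replica.lag.time.max.ms (default 10000 ms), so a fetcher disconnect like the one in the quoted log shrinks the ISR until the follower reconnects and catches back up.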
On Thu, Dec 6, 2018 at 9:01 PM Suman B N <sumannew...@gmail.com> wrote:
> Team,
>
> We are observing very frequent ISR shrinks and expands. In the follower's
> logs, the errors below are observed:
>
> [2018-12-06 20:00:42,709] WARN [ReplicaFetcherThread-2-15], Error in fetch
> kafka.server.ReplicaFetcherThread$FetchRequest@a0f9ba9
> (kafka.server.ReplicaFetcherThread)
> java.io.IOException: Connection to 15 was disconnected before the response
> was read
>     at kafka.utils.NetworkClientBlockingOps$.$anonfun$blockingSendAndReceive$3(NetworkClientBlockingOps.scala:114)
>     at kafka.utils.NetworkClientBlockingOps$.$anonfun$blockingSendAndReceive$3$adapted(NetworkClientBlockingOps.scala:112)
>     at scala.Option.foreach(Option.scala:257)
>     at kafka.utils.NetworkClientBlockingOps$.$anonfun$blockingSendAndReceive$1(NetworkClientBlockingOps.scala:112)
>     at kafka.utils.NetworkClientBlockingOps$.recursivePoll$1(NetworkClientBlockingOps.scala:136)
>     at kafka.utils.NetworkClientBlockingOps$.pollContinuously$extension(NetworkClientBlockingOps.scala:142)
>     at kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:108)
>     at kafka.server.ReplicaFetcherThread.sendRequest(ReplicaFetcherThread.scala:249)
>     at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:234)
>     at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:42)
>     at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:118)
>     at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:103)
>     at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
>
> Can someone explain this and help us understand how we can resolve these
> under-replicated partitions?
>
> server.properties file:
>
> broker.id=15
> port=9092
> zookeeper.connect=zk1,zk2,zk3,zk4,zk5,zk6
> default.replication.factor=2
> log.dirs=/data/kafka
> delete.topic.enable=true
> zookeeper.session.timeout.ms=10000
> inter.broker.protocol.version=0.10.2
> num.partitions=3
> min.insync.replicas=1
> log.retention.ms=259200000
> message.max.bytes=20971520
> replica.fetch.max.bytes=20971520
> replica.fetch.response.max.bytes=20971520
> max.partition.fetch.bytes=20971520
> fetch.max.bytes=20971520
> log.flush.interval.ms=5000
> log.roll.hours=24
> num.replica.fetchers=3
> num.io.threads=8
> num.network.threads=6
> log.message.format.version=0.9.0.1
>
> Also, in what cases do we end up in this state? We have 1200-1400 topics
> and 5000-6000 partitions spread across a 20-node cluster, but only 30-40
> partitions are under-replicated while the rest are in sync. 95% of these
> partitions have a replication factor of 2.
>
> --
> Suman

--
Suman
OlaCabs