Thanks for your reply.
The replication factor is 2, and min.insync.replicas is at its default (1).
I will get back to you on the producer's ack settings.
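For reference, this is how I am checking the topic-level settings on our 1.0 brokers (just a sketch against a live cluster; the ZooKeeper address localhost:2181 is a placeholder for our actual ensemble):

```shell
# Per-topic view: partition count, replication factor, leaders/ISR,
# plus any per-topic config overrides shown in the first line
bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic efGamePlay

# Only the explicit per-topic overrides (e.g. min.insync.replicas, if set)
bin/kafka-configs.sh --zookeeper localhost:2181 --describe \
  --entity-type topics --entity-name efGamePlay
```

If the topic shows no override, the broker-level min.insync.replicas default applies.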
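On reproducing it in our test environment: since the trigger appears to be a ZooKeeper session expiry, one idea we may try (a sketch only; BROKER_PID is a placeholder for the broker's process id, and we are assuming the default zookeeper.session.timeout.ms of 6000 ms) is to pause the broker JVM for longer than the session timeout and then resume it:

```shell
# Freeze the broker process so it stops heartbeating to ZooKeeper...
kill -STOP "$BROKER_PID"
# ...for longer than zookeeper.session.timeout.ms (6000 ms by default)
sleep 10
# Resume it; on reconnect the broker should find its session expired
kill -CONT "$BROKER_PID"
```

This keeps the process alive (as in our incident) while still losing the ZooKeeper session.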


Regards,
Dhiraj

On Sun, Jun 5, 2022 at 1:20 PM Liam Clarke-Hutchinson <lclar...@redhat.com>
wrote:

> Off the top of my head, it looks like it lost network connectivity to some
> extent.
>
> Question - what settings were used for topics like efGamePlay? What are
> min.insync.replicas and the replication factor, and what acks setting is
> the producer using?
>
> Cheers,
>
> Liam
>
> On Fri, 3 Jun 2022 at 22:55, dhiraj prajapati <dhirajp...@gmail.com>
> wrote:
>
> > Hi all,
> > Recently we faced an issue with one of our production Kafka clusters:
> >  - It is a 3-node cluster
> >  - Kafka server version is 1.0
> >
> > *Issue*:
> > One of the brokers had a problem resulting in the following:
> > 1. The broker lost leadership of all of its topic-partitions
> > 2. However, the Kafka server process did *NOT* get stopped/killed.
> >
> > I have included the exception traces from the controller and server logs
> > from that time at the end of this email.
> >
> > *Questions:*
> > 1. What could be the reason behind this happening?
> > 2. How can I reproduce this exact scenario in our test environment?
> >
> > P.S. We did not see any GC pauses or general network blips around that
> > time.
> >
> > *Exception trace in controller log:*
> >
> > [2022-05-21 09:07:45,914] INFO [RequestSendThread controllerId=2] Starting (kafka.controller.RequestSendThread)
> > [2022-05-21 09:07:45,914] INFO [RequestSendThread controllerId=2] Starting (kafka.controller.RequestSendThread)
> > [2022-05-21 09:07:48,354] ERROR [Controller id=2] Error while electing or becoming controller on broker 2 (kafka.controller.KafkaController)
> > kafka.zookeeper.ZooKeeperClientExpiredException: Session expired either before or while waiting for connection
> >         at kafka.zookeeper.ZooKeeperClient$$anonfun$kafka$zookeeper$ZooKeeperClient$$waitUntilConnected$1.apply$mcV$sp(ZooKeeperClient.scala:233)
> >         at kafka.zookeeper.ZooKeeperClient$$anonfun$kafka$zookeeper$ZooKeeperClient$$waitUntilConnected$1.apply(ZooKeeperClient.scala:221)
> >         at kafka.zookeeper.ZooKeeperClient$$anonfun$kafka$zookeeper$ZooKeeperClient$$waitUntilConnected$1.apply(ZooKeeperClient.scala:221)
> >         at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:250)
> >         at kafka.zookeeper.ZooKeeperClient.kafka$zookeeper$ZooKeeperClient$$waitUntilConnected(ZooKeeperClient.scala:221)
> >         at kafka.zookeeper.ZooKeeperClient$$anonfun$waitUntilConnected$1.apply$mcV$sp(ZooKeeperClient.scala:215)
> >         at kafka.zookeeper.ZooKeeperClient$$anonfun$waitUntilConnected$1.apply(ZooKeeperClient.scala:215)
> >         at kafka.zookeeper.ZooKeeperClient$$anonfun$waitUntilConnected$1.apply(ZooKeeperClient.scala:215)
> >         at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:250)
> >         at kafka.zookeeper.ZooKeeperClient.waitUntilConnected(ZooKeeperClient.scala:214)
> >         at kafka.zk.KafkaZkClient.retryRequestsUntilConnected(KafkaZkClient.scala:1461)
> >         at kafka.zk.KafkaZkClient.kafka$zk$KafkaZkClient$$retryRequestUntilConnected(KafkaZkClient.scala:1437)
> >         at kafka.zk.KafkaZkClient.getPartitionReassignment(KafkaZkClient.scala:671)
> >         at kafka.controller.KafkaController.initializePartitionReassignment(KafkaController.scala:698)
> >         at kafka.controller.KafkaController.initializeControllerContext(KafkaController.scala:655)
> >         at kafka.controller.KafkaController.onControllerFailover(KafkaController.scala:232)
> >         at kafka.controller.KafkaController.kafka$controller$KafkaController$$elect(KafkaController.scala:1203)
> >         at kafka.controller.KafkaController$Reelect$.process(KafkaController.scala:1479)
> >         at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply$mcV$sp(ControllerEventManager.scala:69)
> >         at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:69)
> >         at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:69)
> >         at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:31)
> >         at kafka.controller.ControllerEventManager$ControllerEventThread.doWork(ControllerEventManager.scala:68)
> >         at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
> > [2022-05-21 09:07:48,358] DEBUG [Controller id=2] Resigning (kafka.controller.KafkaController)
> > [2022-05-21 09:07:48,358] DEBUG [Controller id=2] Unregister BrokerModifications handler for Set(2, 1) (kafka.controller.KafkaController)
> > [2022-05-21 09:07:48,360] INFO [PartitionStateMachine controllerId=2] Stopped partition state machine (kafka.controller.PartitionStateMachine)
> > [2022-05-21 09:07:48,361] INFO [ReplicaStateMachine controllerId=2] Stopped replica state machine (kafka.controller.ReplicaStateMachine)
> > [2022-05-21 09:07:48,361] INFO [RequestSendThread controllerId=2] Shutting down (kafka.controller.RequestSendThread)
> > [2022-05-21 09:07:48,364] INFO [RequestSendThread controllerId=2] Stopped (kafka.controller.RequestSendThread)
> > [2022-05-21 09:07:48,364] INFO [RequestSendThread controllerId=2] Shutdown completed (kafka.controller.RequestSendThread)
> > [2022-05-21 09:07:48,364] INFO [RequestSendThread controllerId=2] Shutting down (kafka.controller.RequestSendThread)
> > [2022-05-21 09:07:48,364] INFO [RequestSendThread controllerId=2] Stopped (kafka.controller.RequestSendThread)
> > [2022-05-21 09:07:48,364] INFO [RequestSendThread controllerId=2] Shutdown completed (kafka.controller.RequestSendThread)
> > [2022-05-21 09:07:48,365] INFO [Controller id=2] Resigned (kafka.controller.KafkaController)
> >
> >
> > *Snippets from server log:*
> >
> > [2022-05-21 09:06:36,678] INFO [Log partition=efAcquisition-18, dir=/home/tomcat/kafka_2.11-0.9.0.1/commitlogs] Rolled new log segment at offset 252314842 in 2 ms. (kafka.log.Log)
> > [2022-05-21 09:06:53,106] INFO [GroupCoordinator 2]: Preparing to rebalance group spark-kafka-source-800e8ff9-02f4-4c34-bfc9-2d73d723b4e3-1912945116-driver-0 with old generation 1 (__consumer_offsets-43) (kafka.coordinator.group.GroupCoordinator)
> > [2022-05-21 09:06:53,107] INFO [GroupCoordinator 2]: Group spark-kafka-source-800e8ff9-02f4-4c34-bfc9-2d73d723b4e3-1912945116-driver-0 with generation 2 is now empty (__consumer_offsets-43) (kafka.coordinator.group.GroupCoordinator)
> > [2022-05-21 09:06:59,793] INFO [GroupCoordinator 2]: Preparing to rebalance group spark-kafka-source-c2b4063b-baf3-40aa-af8b-1c0ea4246ca8-196930323-driver-0 with old generation 1 (__consumer_offsets-37) (kafka.coordinator.group.GroupCoordinator)
> > [2022-05-21 09:06:59,793] INFO [GroupCoordinator 2]: Group spark-kafka-source-c2b4063b-baf3-40aa-af8b-1c0ea4246ca8-196930323-driver-0 with generation 2 is now empty (__consumer_offsets-37) (kafka.coordinator.group.GroupCoordinator)
> > [2022-05-21 09:07:21,154] INFO [ProducerStateManager partition=hh_topic-9] Writing producer snapshot at offset 7837005056 (kafka.log.ProducerStateManager)
> > [2022-05-21 09:07:21,155] INFO [Log partition=hh_topic-9, dir=/home/tomcat/kafka_2.11-0.9.0.1/commitlogs] Rolled new log segment at offset 7837005056 in 2 ms. (kafka.log.Log)
> > [2022-05-21 09:07:26,088] INFO [ProducerStateManager partition=efGamePlay-11] Writing producer snapshot at offset 2116578569 (kafka.log.ProducerStateManager)
> > [2022-05-21 09:07:26,089] INFO [Log partition=efGamePlay-11, dir=/home/tomcat/kafka_2.11-0.9.0.1/commitlogs] Rolled new log segment at offset 2116578569 in 2 ms. (kafka.log.Log)
> > [2022-05-21 09:07:27,170] INFO [ProducerStateManager partition=efGamePlay-18] Writing producer snapshot at offset 2116607621 (kafka.log.ProducerStateManager)
> > [2022-05-21 09:07:27,171] INFO [Log partition=efGamePlay-18, dir=/home/tomcat/kafka_2.11-0.9.0.1/commitlogs] Rolled new log segment at offset 2116607621 in 2 ms. (kafka.log.Log)
> > [2022-05-21 09:07:30,444] INFO [ProducerStateManager partition=auditTopic-36] Writing producer snapshot at offset 145646657 (kafka.log.ProducerStateManager)
> > [2022-05-21 09:07:30,445] INFO [Log partition=auditTopic-36, dir=/home/tomcat/kafka_2.11-0.9.0.1/commitlogs] Rolled new log segment at offset 145646657 in 2 ms. (kafka.log.Log)
> > [2022-05-21 09:07:33,189] INFO [ProducerStateManager partition=heartbeats-0] Writing producer snapshot at offset 43292127 (kafka.log.ProducerStateManager)
> > [2022-05-21 09:07:33,190] INFO [Log partition=heartbeats-0, dir=/home/tomcat/kafka_2.11-0.9.0.1/commitlogs] Rolled new log segment at offset 43292127 in 3 ms. (kafka.log.Log)
> > [2022-05-21 09:07:35,760] INFO [GroupMetadataManager brokerId=2] Group spark-kafka-source-c2b4063b-baf3-40aa-af8b-1c0ea4246ca8-196930323-driver-0 transitioned to Dead in generation 2 (kafka.coordinator.group.GroupMetadataManager)
> > [2022-05-21 09:07:35,760] INFO [GroupMetadataManager brokerId=2] Group spark-kafka-source-800e8ff9-02f4-4c34-bfc9-2d73d723b4e3-1912945116-driver-0 transitioned to Dead in generation 2 (kafka.coordinator.group.GroupMetadataManager)
> > [2022-05-21 09:07:35,760] INFO [GroupMetadataManager brokerId=2] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
> > [2022-05-21 09:07:37,543] INFO [ProducerStateManager partition=auditTopic-0] Writing producer snapshot at offset 145649350 (kafka.log.ProducerStateManager)
> > [2022-05-21 09:07:37,544] INFO [Log partition=auditTopic-0, dir=/home/tomcat/kafka_2.11-0.9.0.1/commitlogs] Rolled new log segment at offset 145649350 in 2 ms. (kafka.log.Log)
> > [2022-05-21 09:07:39,207] INFO [ProducerStateManager partition=taxonomyOthers-48] Writing producer snapshot at offset 162628223 (kafka.log.ProducerStateManager)
> > [2022-05-21 09:07:39,207] INFO [Log partition=taxonomyOthers-48, dir=/home/tomcat/kafka_2.11-0.9.0.1/commitlogs] Rolled new log segment at offset 162628223 in 2 ms. (kafka.log.Log)
> > [2022-05-21 09:07:40,103] INFO Creating /controller (is it secure? false) (kafka.zk.KafkaZkClient)
> > [2022-05-21 09:07:40,115] INFO Result of znode creation at /controller is: OK (kafka.zk.KafkaZkClient)
> > [2022-05-21 09:07:45,915] WARN Client session timed out, have not heard from server in 5138ms for sessionid 0x27b85a3ead1005a (org.apache.zookeeper.ClientCnxn)
> > [2022-05-21 09:07:45,916] INFO Client session timed out, have not heard from server in 5138ms for sessionid 0x27b85a3ead1005a, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
> > [2022-05-21 09:07:45,933] WARN Attempting to send response via channel for which there is no open connection, connection id 10.200.3.174:9092-10.200.2.18:33360-5632853 (kafka.network.Processor)
> > [2022-05-21 09:07:45,936] WARN Attempting to send response via channel for which there is no open connection, connection id 10.200.3.174:9092-10.200.3.29:18284-5535253 (kafka.network.Processor)
> > [2022-05-21 09:07:45,995] WARN Attempting to send response via channel for which there is no open connection, connection id 10.200.3.174:9092-10.200.4.128:6078-5535526 (kafka.network.Processor)
> > [2022-05-21 09:07:45,996] WARN Attempting to send response via channel for which there is no open connection, connection id 10.200.3.174:9092-10.200.4.129:6664-5535480 (kafka.network.Processor)
> > [2022-05-21 09:07:46,028] INFO [ZooKeeperClient] Waiting until connected. (kafka.zookeeper.ZooKeeperClient)
> > [2022-05-21 09:07:46,028] INFO [ZooKeeperClient] Connected. (kafka.zookeeper.ZooKeeperClient)
> > [2022-05-21 09:07:46,902] INFO Opening socket connection to server 10.200/10.0.0.200:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
> > [2022-05-21 09:07:48,027] WARN Client session timed out, have not heard from server in 2001ms for sessionid 0x27b85a3ead1005a (org.apache.zookeeper.ClientCnxn)
> > [2022-05-21 09:07:48,027] INFO Client session timed out, have not heard from server in 2001ms for sessionid 0x27b85a3ead1005a, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
> > [2022-05-21 09:07:48,129] INFO [ZooKeeperClient] Waiting until connected. (kafka.zookeeper.ZooKeeperClient)
> > [2022-05-21 09:07:48,294] INFO Opening socket connection to server 10.200.3.174/10.200.3.174:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
> > [2022-05-21 09:07:48,294] INFO Socket connection established to 10.200.3.174/10.200.3.174:2181, initiating session (org.apache.zookeeper.ClientCnxn)
> > [2022-05-21 09:07:48,313] WARN Unable to reconnect to ZooKeeper service, session 0x27b85a3ead1005a has expired (org.apache.zookeeper.ClientCnxn)
> > [2022-05-21 09:07:48,314] INFO Unable to reconnect to ZooKeeper service, session 0x27b85a3ead1005a has expired, closing socket connection (org.apache.zookeeper.ClientCnxn)
> > [2022-05-21 09:07:48,314] INFO EventThread shut down for session: 0x27b85a3ead1005a (org.apache.zookeeper.ClientCnxn)
> > [2022-05-21 09:07:48,318] INFO [ZooKeeperClient] Session expired. (kafka.zookeeper.ZooKeeperClient)
> > [2022-05-21 09:07:48,396] INFO [ProducerStateManager partition=efGamePlay-41] Writing producer snapshot at offset 2116645612 (kafka.log.ProducerStateManager)
> > [2022-05-21 09:07:48,397] INFO [Log partition=efGamePlay-41, dir=/home/tomcat/kafka_2.11-0.9.0.1/commitlogs] Rolled new log segment at offset 2116645612 in 2 ms. (kafka.log.Log)
> > [2022-05-21 09:07:49,452] INFO [ProducerStateManager partition=efAcquisition-47] Writing producer snapshot at offset 252224167 (kafka.log.ProducerStateManager)
> > [2022-05-21 09:07:49,453] INFO [Log partition=efAcquisition-47, dir=/home/tomcat/kafka_2.11-0.9.0.1/commitlogs] Rolled new log segment at offset 252224167 in 3 ms. (kafka.log.Log)
> > [2022-05-21 09:07:57,475] INFO [ProducerStateManager partition=efGamePlay-5] Writing producer snapshot at offset 2116617455 (kafka.log.ProducerStateManager)
> > [2022-05-21 09:07:57,476] INFO [Log partition=efGamePlay-5, dir=/home/tomcat/kafka_2.11-0.9.0.1/commitlogs] Rolled new log segment at offset 2116617455 in 2 ms. (kafka.log.Log)
> > [2022-05-21 09:08:00,642] INFO [Partition prism_sqldb_topic-55 broker=2] Shrinking ISR from 2,1 to 2 (kafka.cluster.Partition)
> > [2022-05-21 09:08:00,782] WARN Attempting to send response via channel for which there is no open connection, connection id 10.200.3.174:9092-10.200.2.192:52088-5632843 (kafka.network.Processor)
> > [2022-05-21 09:08:05,321] INFO [Log partition=efGamePlay-41, dir=/home/tomcat/kafka_2.11-0.9.0.1/commitlogs] Found deletable segments with base offsets [2116567157] due to retention time 3600000ms breach (kafka.log.Log)
> >
>
