Hello all, We are facing an issue in a Kafka 3.4.1 cluster. We run a 3-broker cluster on a single-node server in a Kubernetes environment. Whenever one of the brokers goes offline, we see the errors below. Please help us resolve this issue. Thanks to all in advance.

*Issue we faced on client:*

11:00:01.285 ERROR org.apache.kafka.streams.processor.internals.TaskExecutor - stream-thread [dev-org-clz-com-v3.0.108-ROLE_CLZ_COM_RETRIEVE-DEV_123-691e64a9-9ca2-4be6-bc92-dec50e33dbbe-StreamThread-4] Committing task(s) 0_7 failed.
org.apache.kafka.common.errors.TimeoutException: Timeout of 60000ms expired before successfully committing offsets {ROLE_CLZ_COM_RETRIEVE-DEV_123-7=OffsetAndMetadata{offset=7, leaderEpoch=null, metadata='AgAAAYrgldR5'}}
11:04:31.407 ERROR org.apache.kafka.streams.KafkaStreams - stream-client [dev-org-clz-com-v3.0.108-ROLE_CLZ_COM_RETRIEVE-DEV_123-691e64a9-9ca2-4be6-bc92-dec50e33dbbe] Replacing thread in the streams uncaught exception handler
org.apache.kafka.streams.errors.StreamsException: org.apache.kafka.common.errors.TimeoutException: Task 0_7 did not make progress within 360100 ms. Adjust `task.timeout.ms` if needed.
    at org.apache.kafka.streams.processor.internals.AbstractTask.maybeInitTaskTimeoutOrThrow(AbstractTask.java:181) ~[kafka-streams-3.4.1.jar:?]
    at org.apache.kafka.streams.processor.internals.TaskManager.lambda$commit$18(TaskManager.java:1611) ~[kafka-streams-3.4.1.jar:?]
    at java.util.HashMap$KeySet.forEach(HashMap.java:933) ~[?:1.8.0_171]
    at org.apache.kafka.streams.processor.internals.TaskManager.commit(TaskManager.java:1611) ~[kafka-streams-3.4.1.jar:?]
    at org.apache.kafka.streams.processor.internals.StreamThread.maybeCommit(StreamThread.java:1109) ~[kafka-streams-3.4.1.jar:?]
    at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:829) ~[kafka-streams-3.4.1.jar:?]
    at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:613) ~[kafka-streams-3.4.1.jar:?]
    at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:575) [kafka-streams-3.4.1.jar:?]
Caused by: org.apache.kafka.common.errors.TimeoutException: Task 0_7 did not make progress within 360100 ms. Adjust `task.timeout.ms` if needed.
Caused by: org.apache.kafka.common.errors.TimeoutException: Timeout of 60000ms expired before successfully committing offsets {ROLE_CLZ_COM_RETRIEVE-DEV_123-7=OffsetAndMetadata{offset=7, leaderEpoch=null, metadata='AgAAAYrgldR5'}}

*Logs of broker:*

[2023-10-03 10:45:18,474] INFO [RaftManager nodeId=0] Node 2 disconnected. (org.apache.kafka.clients.NetworkClient)
[2023-10-03 10:46:35,920] INFO [GroupCoordinator 0]: Preparing to rebalance group UnitTest-producer-client-1696329858450 in state PreparingRebalance with old generation 1 (__consumer_offsets-42) (reason: Removing member consumer-UnitTest-producer-client-1696329858450-1-a3c294ac-cab9-40f6-a76e-548ffff6ae8b on LeaveGroup; client reason: the consumer is being closed) (kafka.coordinator.group.GroupCoordinator)
[2023-10-03 10:46:35,920] INFO [GroupCoordinator 0]: Group UnitTest-producer-client-1696329858450 with generation 2 is now empty (__consumer_offsets-42) (kafka.coordinator.group.GroupCoordinator)
[2023-10-03 10:58:02,129] INFO [GroupCoordinator 0]: Member dev-atd-clz-com-v3.0.8-ATD_CLZ_COM-DEV_123-303bb001-295e-4536-b392-a6256ef22550-StreamThread-3-consumer-f753ecbf-6b5d-40dd-bd57-bbd3ff74628b in group dev-atd-clz-com-v3.0.8-ATD_CLZ_COM-DEV_123 has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2023-10-03 10:58:02,129] INFO [GroupCoordinator 0]: Group dev-atd-clz-com-v3.0.8-ATD_CLZ_COM-DEV_123 with generation 2 is now empty (__consumer_offsets-12) (kafka.coordinator.group.GroupCoordinator)
[2023-10-03 11:00:41,291] INFO [GroupMetadataManager brokerId=0] Group dev-atd-clz-com-v3.0.8-ATD_CLZ_COM-DEV_123 transitioned to Dead in generation 2 (kafka.coordinator.group.GroupMetadataManager)
[2023-10-03 11:01:37,627] INFO [RaftManager nodeId=0] Become candidate due to fetch timeout (org.apache.kafka.raft.KafkaRaftClient)
[2023-10-03 11:01:37,826] INFO [RaftManager nodeId=0] Completed transition to CandidateState(localId=0, epoch=1767, retries=1, electionTimeoutMs=1906) (org.apache.kafka.raft.QuorumState)
[2023-10-03 11:01:37,960] INFO [RaftManager nodeId=0] Insufficient remaining votes to become leader (rejected by [1, 2]). We will backoff before retrying election again (org.apache.kafka.raft.KafkaRaftClient)
[2023-10-03 11:01:37,960] INFO [RaftManager nodeId=0] Re-elect as candidate after election backoff has completed (org.apache.kafka.raft.KafkaRaftClient)
[2023-10-03 11:01:37,982] INFO [RaftManager nodeId=0] Completed transition to CandidateState(localId=0, epoch=1768, retries=2, electionTimeoutMs=1547) (org.apache.kafka.raft.QuorumState)
[2023-10-03 11:01:38,022] INFO [RaftManager nodeId=0] Insufficient remaining votes to become leader (rejected by [1, 2]). We will backoff before retrying election again (org.apache.kafka.raft.KafkaRaftClient)
[2023-10-03 11:01:38,022] INFO [RaftManager nodeId=0] Re-elect as candidate after election backoff has completed (org.apache.kafka.raft.KafkaRaftClient)
[2023-10-03 11:01:38,081] INFO [RaftManager nodeId=0] Completed transition to CandidateState(localId=0, epoch=1769, retries=3, electionTimeoutMs=1846) (org.apache.kafka.raft.QuorumState)
[2023-10-03 11:01:38,109] INFO [RaftManager nodeId=0] Insufficient remaining votes to become leader (rejected by [1, 2]). We will backoff before retrying election again (org.apache.kafka.raft.KafkaRaftClient)
[2023-10-03 11:01:38,409] INFO [RaftManager nodeId=0] Re-elect as candidate after election backoff has completed (org.apache.kafka.raft.KafkaRaftClient)
[2023-10-03 11:01:38,413] INFO [RaftManager nodeId=0] Completed transition to CandidateState(localId=0, epoch=1770, retries=4, electionTimeoutMs=1502) (org.apache.kafka.raft.QuorumState)
[2023-10-03 11:01:38,434] INFO [RaftManager nodeId=0] Insufficient remaining votes to become leader (rejected by [1, 2]). We will backoff before retrying election again (org.apache.kafka.raft.KafkaRaftClient)
[2023-10-03 11:01:38,434] INFO [RaftManager nodeId=0] Re-elect as candidate after election backoff has completed (org.apache.kafka.raft.KafkaRaftClient)
[2023-10-03 11:01:38,437] INFO [RaftManager nodeId=0] Completed transition to CandidateState(localId=0, epoch=1771, retries=5, electionTimeoutMs=1478) (org.apache.kafka.raft.QuorumState)
[2023-10-03 11:01:38,458] INFO [RaftManager nodeId=0] Insufficient remaining votes to become leader (rejected by [1, 2]). We will backoff before retrying election again (org.apache.kafka.raft.KafkaRaftClient)
[2023-10-03 11:01:38,897] INFO [BrokerToControllerChannelManager broker=0 name=heartbeat] Client requested disconnect from node 1 (org.apache.kafka.clients.NetworkClient)
[2023-10-03 11:01:39,018] INFO [RaftManager nodeId=0] Completed transition to Unattached(epoch=1772, voters=[0, 1, 2], electionTimeoutMs=897) (org.apache.kafka.raft.QuorumState)
[2023-10-03 11:01:39,021] INFO [RaftManager nodeId=0] Completed transition to Voted(epoch=1772, votedId=2, voters=[0, 1, 2], electionTimeoutMs=1494) (org.apache.kafka.raft.QuorumState)
[2023-10-03 11:01:39,021] INFO [RaftManager nodeId=0] Vote request VoteRequestData(clusterId='EP6hyiddQNW5FPrAvR9kWw', topics=[TopicData(topicName='__cluster_metadata', partitions=[PartitionData(partitionIndex=0, candidateEpoch=1772, candidateId=2, lastOffsetEpoch=1766, lastOffset=772087)])]) with epoch 1772 is granted (org.apache.kafka.raft.KafkaRaftClient)
[2023-10-03 11:01:39,026] INFO [RaftManager nodeId=0] Completed transition to FollowerState(fetchTimeoutMs=2000, epoch=1772, leaderId=2, voters=[0, 1, 2], highWatermark=Optional[LogOffsetMetadata(offset=772080, metadata=Optional.empty)], fetchingSnapshot=Optional.empty) (org.apache.kafka.raft.QuorumState)
[2023-10-03 11:01:39,099] INFO [BrokerToControllerChannelManager broker=0 name=heartbeat]: Recorded new controller, from now on will use node kafka-2-0.kafka.dev.svc.cluster.local:9093 (id: 2 rack: null) (kafka.server.BrokerToControllerRequestThread)
[2023-10-03 11:02:30,122] INFO [GroupCoordinator 0]: Dynamic member with unknown member id joins group UnitTest-producer-client-1696330940792 in Empty state. Created a new member id consumer-UnitTest-producer-client-1696330940792-1-211be5e3-8bbf-436b-83e8-83900e03befa and request the member to rejoin with this id. (kafka.coordinator.group.GroupCoordinator)

---
Thanks & Regards,
Kunal Jadhav