Hi team,

We have a Kafka Streams application running with Kafka clients 3.8.1. We have 
hit a strange issue and have not been able to find the root cause so far. 
Please help.


We noticed the issue because the lag on one partition kept increasing. When we 
checked the Streams state, we found that one node was stuck in rebalancing.


Then we checked the logs. We found only 2 entries:


2025-11-12T01:38:52.315+0800|WARN|kafka-coordinator-heartbeat-thread|Stream-xxxx|o.a.k.c.c.i.ConsumerCoordinator.handlerPollTimeoutExpiry[AbstractCoordinator.java:1147]|[Consumer
 clientId=Stream-xxxx-StreamThread-11-consumer, groupId=Stream-xxxx] consumer 
poll timeout has expired. This means the time between subsequent calls to 
poll() was longer than the configured max.poll.interval.ms or by reducing the 
maximum size of batches returned in poll() with max.poll.records.


2025-11-12T01:39:01.382+0800|ERROR|Stream-xxxx-StreamThread-2|o.a.k.s.p.internals.StreamTask.closeStateManager[StateManagerUtil.java:149]|stream-thread
 [Stream-xxxx-StreamThread-2] task [1_11] Failed to acquire lock while closing 
the state store for Active task 1_11



I'm not sure whether the logs above are related to the issue, but:
(1) the log timestamps are almost the same as the time when the partition lag 
started increasing;
(2) the partition with increasing lag is 11, which matches task 1_11 in the 
ERROR log.
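
For reference, the WARN message names two consumer settings, max.poll.interval.ms 
and max.poll.records. Below is a minimal sketch of how they could be overridden 
through the properties passed to the Streams app. The class name, method name, 
and values are illustrative assumptions, not our production configuration, and 
whether tuning them would avoid the stuck state is unknown.

```java
import java.util.Properties;

public class PollTuningSketch {
    // Builds overrides for the two consumer settings named in the WARN log.
    // The values are illustrative assumptions, not recommendations.
    static Properties pollOverrides() {
        Properties props = new Properties();
        // Maximum allowed time between two poll() calls, in milliseconds.
        props.setProperty("max.poll.interval.ms", "600000");
        // Maximum number of records returned by a single poll().
        props.setProperty("max.poll.records", "200");
        return props;
    }

    public static void main(String[] args) {
        // Print the overrides so they can be compared with the running config.
        pollOverrides().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

These entries would be merged into the existing Streams properties before the 
application is started.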


I also searched the existing JIRA issues to see whether this is a known issue. 
It looks a lot like:
(1) KAFKA-16025, but that one should already be fixed in 3.8.1?
(2) KAFKA-18355, but that report says the new thread keeps throwing the lock 
exception, whereas I only have one error log line related to the lock.


It seems that the client hit a problem and tried to change state from active to 
rebalancing, but failed before reaching the step that sends the leave-group 
request. As a result, no rebalance happens, and no consumer is actually 
processing the partition data...


The issue has already happened on 3 different setups, but unfortunately all of 
them are production environments, so I cannot get much debug information for 
now :(


Looking forward to your reply.
Thanks
- Chen


