Hello everyone,

We’re experiencing an issue where Kafka clients are significantly delayed in 
rediscovering the GroupCoordinator after the broker originally assigned as the 
GroupCoordinator becomes unreachable.

In this scenario, while most clients are able to quickly locate a new 
GroupCoordinator using the FindCoordinator protocol, a few clients are taking 
as long as max.poll.interval.ms to do so.
This delay in rediscovery is causing the group rebalance to be postponed, 
leading to a prolonged interruption in message consumption.

Our Kafka server version is 2.3.1, but the clients are using version 1.1.1.

We observed that after the client logs the message:

---
Group coordinator ... is unavailable or invalid, will attempt rediscovery
---

it takes about 5 minutes before we see:

---
Discovered group coordinator ...
---

Unfortunately, due to the older client version (1.1.1), we lack more detailed 
logs for further insight.

Has anyone experienced a similar delay in coordinator rediscovery on some Kafka 
clients?
Would reducing max.poll.interval.ms help by causing these delayed clients to be 
removed from the group more quickly, potentially speeding up the rebalance 
process?
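
For concreteness, the change being asked about could look like the sketch 
below. It rests on assumptions: the class and method names are hypothetical, 
60000 is an arbitrary example value, and the session and heartbeat values are 
the 1.1.1 client defaults rather than verified production settings. The 
default max.poll.interval.ms is 300000 ms, which would match the roughly 
5 minutes mentioned above.

```java
import java.util.Properties;

public class ConsumerTimeoutSketch {
    // Builds the timeout-related consumer settings with a caller-chosen
    // max.poll.interval.ms. Only plain string keys are used, so this
    // compiles without the Kafka client library on the classpath.
    static Properties consumerTimeouts(int maxPollIntervalMs) {
        Properties props = new Properties();
        // Upper bound on the time between poll() calls before the consumer
        // is considered failed (client default: 300000 ms).
        props.setProperty("max.poll.interval.ms", Integer.toString(maxPollIntervalMs));
        // Assumed 1.1.1 client defaults, shown only for context.
        props.setProperty("session.timeout.ms", "10000");
        props.setProperty("heartbeat.interval.ms", "3000");
        return props;
    }

    public static void main(String[] args) {
        // 60000 ms is an arbitrary example value, not a recommendation.
        Properties props = consumerTimeouts(60_000);
        props.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

The resulting Properties object would be passed to the KafkaConsumer 
constructor together with the usual bootstrap, group, and deserializer 
settings.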

I’ve checked KAFKA-9752[1], but since we see no log like “Pending member 
$memberId in group {groupId} has been removed after session timeout 
expiration”, I’m not sure whether this issue is related.

Any insight or suggestions would be appreciated.

Best regards,
Minwoo Kang.

[1]: https://issues.apache.org/jira/browse/KAFKA-9752
