I'm trying to track down an issue with one of our consumers. There are 4
threads in the same consumer group, which will run happily for a few hours
before one of them crashes with the following exception:

org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be
completed due to group rebalance

This consumer does not use autocommit; it manages its own commits. Both
the consumer and the broker are 0.9.0.1.
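
For concreteness, here is a minimal sketch of consumer settings matching
this description. The class and method names (ConsumerSettings,
manualCommitProps) and the two parameters are hypothetical; the timeout
values are the ones quoted further down in this message. Deserializer
settings are left out, so this is not a complete consumer config.

```java
import java.util.Properties;

// Hypothetical helper, not the actual application code.
final class ConsumerSettings {
    static Properties manualCommitProps(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers); // assumed parameter
        props.put("group.id", groupId);                   // assumed parameter
        // Autocommit is off; the application commits offsets itself.
        props.put("enable.auto.commit", "false");
        // Values as stated in this message.
        props.put("session.timeout.ms", "30000");
        props.put("heartbeat.interval.ms", "1000");
        return props;
    }
}
```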

From what I've read in other mailing list posts as well as the
documentation, this seems to indicate that this consumer thread did not
send a heartbeat within session.timeout.ms and was kicked out of the group
by the coordinator.

I added some logging to check this, and it shows that poll() is called on
the consumer far more often than session.timeout.ms requires (configured
to 30,000 ms, with heartbeat.interval.ms = 1000). The gap between poll()
calls is a second or less, and on average this consumer calls poll() 2-3
times per second.
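
A minimal sketch of this kind of check, assuming one tracker per consumer
thread. PollGapTracker and its methods are hypothetical names, not part of
the Kafka client, and this is not necessarily how the logging above was
implemented. It records the gap between successive poll() calls and
reports whether the longest gap ever exceeded session.timeout.ms.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper: tracks the longest gap between poll() calls.
final class PollGapTracker {
    private final long sessionTimeoutMs;
    private long lastPollMs = -1; // -1 means no poll recorded yet
    private long maxGapMs = 0;

    PollGapTracker(long sessionTimeoutMs) {
        this.sessionTimeoutMs = sessionTimeoutMs;
    }

    // Call immediately before each consumer.poll().
    // Returns the gap in ms since the previous call (0 for the first call).
    long recordPoll(long nowMs) {
        long gap = (lastPollMs < 0) ? 0 : nowMs - lastPollMs;
        lastPollMs = nowMs;
        if (gap > maxGapMs) {
            maxGapMs = gap;
        }
        return gap;
    }

    long maxGapMs() {
        return maxGapMs;
    }

    // True if any observed gap was longer than the session timeout.
    boolean exceededSessionTimeout() {
        return maxGapMs > sessionTimeoutMs;
    }

    // Monotonic clock in ms, so wall-clock adjustments do not skew gaps.
    static long nowMs() {
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime());
    }
}
```

In the poll loop this would be called as
tracker.recordPoll(PollGapTracker.nowMs()) right before consumer.poll(),
logging the returned gap. Note that it only measures the time between
calls to poll(), so it says nothing about what happens inside poll()
itself.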

In addition to the exception, the following two messages are also logged
right before the crash:

Marking the coordinator 2147483644 dead.
Error UNKNOWN_MEMBER_ID occurred while committing offsets for group
<group_name>

This also seems to indicate that the consumer exceeded session.timeout.ms,
but again, poll() appears to be called often enough.

Any idea what could be happening? Happy to provide more details or config
to help diagnose the issue.
