Hello Kafka Team,
We are observing some unexpected behavior in the Java Kafka client.
Problem description:
When a KafkaShareConsumer fails to connect to a cluster (e.g. because a
port is misconfigured), it enters a busy loop. The symptoms are an
excessive amount of log output, high CPU usage, and a slowly increasing
memory footprint.
Software Version:
We are using org.apache.kafka:kafka-clients:4.1.1.
Sample:
I created a repository with a minimal sample to reproduce the behavior:
https://github.com/HenrikLueschenTNG/share-consumer-busy-loop/blob/main/src/main/java/com/example/shareconsumerbusyloop/ShareConsumerBusyLoopApplication.java
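In short, the sample points a ShareConsumer at a port on which no broker is listening; constructing a KafkaShareConsumer with a configuration like the one below and then calling subscribe/poll in a loop reproduces the behavior. The exact property values beyond those visible in the logs are assumptions here; please see the linked sample for the real configuration:

```java
import java.util.Properties;

public class ShareConsumerBusyLoopSketch {
    // Configuration in the spirit of the linked sample; values other than
    // the bootstrap port and group id are assumptions:
    static Properties reproProps() {
        Properties props = new Properties();
        // No broker listens on 9094, so every connection attempt fails:
        props.put("bootstrap.servers", "localhost:9094");
        props.put("group.id", "test-group");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }

    public static void main(String[] args) {
        // new KafkaShareConsumer<>(reproProps()), followed by subscribe(...)
        // and poll(...) in a loop, then triggers the busy loop described below.
        System.out.println(reproProps().getProperty("bootstrap.servers"));
    }
}
```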
Details:
When the consumer fails to establish a connection, we first see a large
amount of identical logs, often many published within the same millisecond:
2026-01-16 07:49:59.311 INFO [consumer_background_thread]
org.apache.kafka.clients.Metadata - [ShareConsumer
clientId=consumer-test-group-1, groupId=test-group] Rebootstrapping with
[localhost/127.0.0.1:9094]
2026-01-16 07:49:59.311 INFO [consumer_background_thread]
org.apache.kafka.clients.Metadata - [ShareConsumer
clientId=consumer-test-group-1, groupId=test-group] Rebootstrapping with
[localhost/127.0.0.1:9094]
2026-01-16 07:49:59.311 INFO [consumer_background_thread]
org.apache.kafka.clients.Metadata - [ShareConsumer
clientId=consumer-test-group-1, groupId=test-group] Rebootstrapping with
[localhost/127.0.0.1:9094]
After a few seconds, the production of these logs ends, but the CPU
usage remains very high.
I have done a little bit of digging and found the following:
- Within the loop of the ConsumerNetworkThread, several RequestManagers
are used to determine the timeout for the next poll of the
NetworkClientDelegate. The CoordinatorRequestManager frequently sets
this timeout to zero: its timeout is calculated as Math.max(0, backoffMs
- timeSinceLastReceiveMs). Since the backoff is, by default, between
100 ms and 1000 ms while the request timeout is 30000 ms, the difference
between the backoff and timeSinceLastReceiveMs is almost always negative
while no connection can be made, so the computed timeout is zero. I
think this is causing the initial symptom of the many logs.
- After a few seconds, the client stops producing logs, but the CPU
usage remains high, and a slow increase in memory usage can be observed.
I believe this is due to an accumulation of applicationEvents in the
ConsumerNetworkThread: within a few seconds, several million such events
need to be (and cannot be) processed in the call to
processApplicationEvents. This appears to slow down the loop in the
ConsumerNetworkThread, resulting in fewer logs, while simultaneously
keeping the CPU busy and consuming increasing amounts of memory.
- In the case of a classic consumer, no such behavior can be observed.
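To illustrate the first point: the timeout computation collapses to zero as soon as timeSinceLastReceiveMs exceeds the backoff, which is the steady state while connections keep failing. A standalone sketch of just that arithmetic (the method name is ours, not the Kafka source's):

```java
public class PollTimeoutSketch {
    // Same arithmetic as the CoordinatorRequestManager timeout described above:
    static long nextPollTimeout(long backoffMs, long timeSinceLastReceiveMs) {
        return Math.max(0, backoffMs - timeSinceLastReceiveMs);
    }

    public static void main(String[] args) {
        // Healthy case: a response arrived recently, so the thread sleeps
        // for the remainder of the backoff:
        System.out.println(nextPollTimeout(1_000, 250));    // 750

        // Failure case: no response for longer than the maximum backoff,
        // so the next poll happens immediately -> busy loop:
        System.out.println(nextPollTimeout(1_000, 30_000)); // 0
    }
}
```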
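And to illustrate the second point schematically: if events are enqueued into an unbounded queue faster than one loop iteration can drain them, the backlog (and with it the heap footprint) grows without bound. This is a toy model of the mechanism we suspect, not the actual consumer internals:

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class EventBacklogSketch {
    // Each "tick" stands for one iteration of the background thread's loop:
    // when more events arrive per iteration than the (slowed-down) iteration
    // can process, the queue only ever grows.
    static long simulate(int ticks, int enqueuedPerTick, int drainedPerTick) {
        Queue<String> applicationEvents = new ArrayDeque<>();
        for (int t = 0; t < ticks; t++) {
            for (int i = 0; i < enqueuedPerTick; i++) {
                applicationEvents.add("event");
            }
            for (int i = 0; i < drainedPerTick && !applicationEvents.isEmpty(); i++) {
                applicationEvents.poll();
            }
        }
        return applicationEvents.size();
    }

    public static void main(String[] args) {
        // Backlog grows by (10 - 1) events per tick:
        System.out.println(simulate(1_000, 10, 1)); // 9000
    }
}
```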
Thanks in advance for any advice on this issue!
Greetings
Henrik