Dear kafka users,

I run a Kafka cluster (version 2.1.1) with 6 brokers to process ~100 messages 
per second with a number of Kafka Streams apps. There are currently 53 topics 
with 30 partitions each. I have exactly-once processing enabled. My problem is 
that the __consumer_offsets topic keeps growing without bound. It has the 
default 50 partitions, although some of them are still empty. The commit 
interval is also at its default value of 100 ms for exactly-once stream 
processing. All the streams apps are running and processing data continuously.
I therefore did some digging:

1.) I'm not hitting https://issues.apache.org/jira/browse/KAFKA-8335. I checked 
the content of the __consumer_offsets topic. The last messages retained are 
from a week ago and do not seem to be solely related to transactions.

2.) The LogCleaner thread is running on all brokers. According to its logs, 
__consumer_offsets partitions are cleaned very rarely. E.g. on one of the 
brokers there are log entries for about one partition a day. On 3 of the last 
7 days, no cleaning of __consumer_offsets happened at all on this broker, even 
though it holds 25 partitions of the __consumer_offsets topic.

3.) Even on the rare occasions when a __consumer_offsets partition is cleaned, 
its size only drops from ~16 GB to ~13 GB. I would have expected the cleaning 
to remove far more.

4.) The previous two points have led to a total of ~750 GB of disk space being 
occupied by the __consumer_offsets topic. Among other things, this 
considerably slows down broker startup.

5.) The cleanup of the __transaction_state topic does seem to work smoothly. 
There, each partition is cleaned about once per hour and therefore does not 
grow above ~100 MB. In total, the __transaction_state topic occupies ~8 GB of 
disk space.

6.) Other topics occupy ~3 GB of disk space. They, too, get cleaned up 
regularly, no matter the cleanup policy.
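
To make points 2 and 3 more concrete, here is a rough sketch of how I 
understand the cleaner's eligibility check. It is an assumption on my side, 
not something I verified in the broker code: a partition only becomes a 
cleaning candidate once its dirty bytes divided by (clean bytes + dirty bytes) 
exceed min.cleanable.dirty.ratio (broker setting 
log.cleaner.min.cleanable.ratio, default 0.5), and the active segment is never 
cleaned. The function names and the byte counts below are made up for 
illustration, they are not measurements from my cluster.

```python
# Simplified model of the log cleaner's eligibility check (my assumption of
# how it works, not copied from the broker source).

MIN_CLEANABLE_DIRTY_RATIO = 0.5  # default of log.cleaner.min.cleanable.ratio
GB = 1024 ** 3


def dirty_ratio(clean_bytes: int, dirty_bytes: int) -> float:
    """Fraction of the cleanable part of the log that was never compacted."""
    total = clean_bytes + dirty_bytes
    return dirty_bytes / total if total > 0 else 0.0


def is_cleanable(clean_bytes: int, dirty_bytes: int,
                 min_ratio: float = MIN_CLEANABLE_DIRTY_RATIO) -> bool:
    """True if the partition would be picked up under this simplified model."""
    return dirty_ratio(clean_bytes, dirty_bytes) > min_ratio


def dirty_bytes_needed(clean_bytes: int,
                       min_ratio: float = MIN_CLEANABLE_DIRTY_RATIO) -> float:
    """Dirty bytes that must be exceeded before the ratio passes min_ratio."""
    return clean_bytes * min_ratio / (1.0 - min_ratio)


if __name__ == "__main__":
    # Hypothetical partition: 13 GB already compacted, 3 GB written since.
    print(dirty_ratio(13 * GB, 3 * GB))         # 0.1875
    print(is_cleanable(13 * GB, 3 * GB))        # False
    print(dirty_bytes_needed(13 * GB) / GB)     # 13.0
```

If this model is right, a partition whose compacted part stays large has to 
accumulate about as many new bytes again before it is considered, which would 
fit the rare cleaning runs I see.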

So, my questions are:

a) For some streams apps I have a lot of instances running. Does this impact 
the number of messages in __consumer_offsets topic? From my understanding that 
should not make a difference because the offsets are stored per consumer group 
and partition. Is this correct?

b) How can I ensure that the LogCleaner regularly cleans up __consumer_offsets 
partitions? What is special about this topic with regard to cleanup?

c) I set segment.bytes for the __consumer_offsets topic to 1 GB. Does the 
LogCleaner work more efficiently with many smaller segment files?
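
For question a), my mental model can be sketched as follows. I assume that 
each offset commit is a record keyed by (group, topic, partition) and that 
compaction keeps only the latest record per key. The group and topic names are 
hypothetical, and the model ignores the transaction markers that exactly-once 
commits also write to __consumer_offsets.

```python
# Toy model of offset commits and log compaction. Group and topic names are
# hypothetical; transaction markers are deliberately left out.

def simulate_commits(group, topic, partitions, instances, commits_per_partition):
    """Build commit records; each partition is owned by exactly one instance."""
    records = []
    for partition in range(partitions):
        owner = partition % instances  # the owner goes into the value, not the key
        for offset in range(1, commits_per_partition + 1):
            key = (group, topic, partition)
            records.append((key, (offset, owner)))
    return records


def compact(records):
    """Keep only the latest value per key, as compaction would eventually do."""
    latest = {}
    for key, value in records:
        latest[key] = value
    return latest


if __name__ == "__main__":
    few = simulate_commits("my-streams-app", "input-topic", 30,
                           instances=2, commits_per_partition=10)
    many = simulate_commits("my-streams-app", "input-topic", 30,
                            instances=15, commits_per_partition=10)
    print(len(few), len(many))                    # 300 300
    print(len(compact(few)), len(compact(many)))  # 30 30
```

In this model the number of instances changes neither the number of commit 
records nor the number of keys that survive compaction. Only the number of 
groups and partitions and the commit frequency matter.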

Thanks for your help.

Best, Claudia
