Here's another strange bug we're seeing after upgrading to Kafka
0.10.1.0: one of our consumer groups shows up twice in the group list and
appears to belong to two different brokers:
% kafka-consumer-groups.sh --bootstrap-server localhost:40172 --list | sort | uniq -c | sort -n | grep -v '^ *1'
2 details-log-etl
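Here is a minimal Python sketch of the same duplicate check for anyone who wants to reproduce it without the shell pipeline. It assumes the `--list` output is one group name per line. The function name `duplicate_groups` and the sample input are illustrative and not part of any Kafka tool. Keeping counts `> 1` also keeps counts such as 10 or 11, which the `grep -v '^ *1'` pattern above would filter out along with the 1s.

```python
from collections import Counter
from typing import Dict


def duplicate_groups(listing: str) -> Dict[str, int]:
    """Return group names listed more than once, mapped to their counts.

    Rough equivalent of `sort | uniq -c | sort -n` followed by keeping
    only the names whose count is greater than 1.
    """
    # Count each non-blank line (one consumer group name per line).
    counts = Counter(line.strip() for line in listing.splitlines() if line.strip())
    # Order by count, then name, like `sort -n` applied to `uniq -c` output.
    ordered = sorted(counts.items(), key=lambda item: (item[1], item[0]))
    return {name: n for name, n in ordered if n > 1}


# Hypothetical sample input: "other-group" is made up for illustration.
sample = "details-log-etl\nother-group\ndetails-log-etl\n"
print(duplicate_groups(sample))  # {'details-log-etl': 2}
```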
If I manually send a ListGroups request to each broker, the offending
consumer group appears in both responses (once as owned by broker ID 1 and
once as owned by broker ID 2). If I manually send an OffsetFetchRequest for
that group to Broker #1 and Broker #2, I get back conflicting responses:
(from Broker #1):
OffsetFetchResponse_v1(topics=[(topic='tracking.details', partitions=[(partition=0, offset=85606947, metadata='', error_code=0)])])
(from Broker #2):
OffsetFetchResponse_v1(topics=[(topic='tracking.details', partitions=[(partition=0, offset=83718751, metadata='', error_code=0)])])
The offset=85606947 response is correct.
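To make the disagreement explicit, the sketch below merges per-broker OffsetFetch results and reports the partitions where the committed offsets differ. The offsets are copied from the two responses above. The dictionary layout and the `offset_disagreements` helper are hypothetical, and no Kafka client is involved. For this partition, broker 2's answer trails broker 1's by 1,888,196 offsets.

```python
from typing import Dict, Tuple

TopicPartition = Tuple[str, int]

# Committed offsets for the duplicated group, copied from the
# OffsetFetchResponse_v1 replies above (hypothetical data layout).
offsets_by_broker: Dict[int, Dict[TopicPartition, int]] = {
    1: {("tracking.details", 0): 85606947},
    2: {("tracking.details", 0): 83718751},
}


def offset_disagreements(
    by_broker: Dict[int, Dict[TopicPartition, int]]
) -> Dict[TopicPartition, Dict[int, int]]:
    """Return {(topic, partition): {broker_id: offset}} where brokers disagree."""
    merged: Dict[TopicPartition, Dict[int, int]] = {}
    for broker_id, offsets in by_broker.items():
        for tp, offset in offsets.items():
            merged.setdefault(tp, {})[broker_id] = offset
    # Keep only partitions reported with more than one distinct offset.
    return {tp: seen for tp, seen in merged.items() if len(set(seen.values())) > 1}


for (topic, partition), seen in offset_disagreements(offsets_by_broker).items():
    gap = max(seen.values()) - min(seen.values())
    print(f"{topic}[{partition}]: {seen}, gap of {gap} offsets")
```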
If I use the GroupCoordinatorRequest API, both broker 1 and broker 2 report
broker 1 as the coordinator. The consuming application itself seems
unaffected and is proceeding as expected using broker 1.
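Until the cause is found, a monitoring script might sidestep the duplicate by reading a group's offsets only from the broker that GroupCoordinatorRequest names as its coordinator. This is a sketch under the assumption that the coordinator's answer is authoritative, which matches what we see here: both brokers name broker 1, and broker 1 returns the correct offset. `authoritative_offsets` and its input shapes are hypothetical.

```python
from typing import Dict, Tuple

TopicPartition = Tuple[str, int]
Offsets = Dict[TopicPartition, int]


def authoritative_offsets(
    coordinator_by_group: Dict[str, int],
    offsets_by_group: Dict[str, Dict[int, Offsets]],
) -> Dict[str, Offsets]:
    """For each group, keep only the offsets reported by its coordinator broker.

    Assumes the coordinator's OffsetFetch answer is authoritative; answers
    from any other broker for the same group are ignored.
    """
    result: Dict[str, Offsets] = {}
    for group, coordinator in coordinator_by_group.items():
        per_broker = offsets_by_group.get(group, {})
        if coordinator in per_broker:
            result[group] = per_broker[coordinator]
    return result


# Values from this report: both brokers name broker 1 as the coordinator.
coordinators = {"details-log-etl": 1}
offsets = {
    "details-log-etl": {
        1: {("tracking.details", 0): 85606947},
        2: {("tracking.details", 0): 83718751},
    }
}
print(authoritative_offsets(coordinators, offsets))
# {'details-log-etl': {('tracking.details', 0): 85606947}}
```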
This isn't breaking anything critical (as I said, the consumers themselves
seem to be doing the right thing), but it is breaking our monitoring, and it
concerns me that such a duplicate is possible at all.
I haven't tried bouncing the consumer yet to see if that fixes it; I
figured I'd e-mail out just in case there was anything else folks wanted me
to look at first.
--
James Brown
Engineer