I recently had a problem in production which I believe was a manifestation 
of KAFKA-2978 (Topic partition is not sometimes consumed after rebalancing 
of consumer group). This is fixed in 0.9.0.1 and we will upgrade our client 
soon. However, it made me realise that I didn't have any monitoring set up 
for this. The only metric I can find is 
kafka.consumer:type=ConsumerFetcherManager,name=MaxLag,clientId=([-.\w]+), 
which, if I understand correctly, is the maximum lag across the partitions 
that a particular consumer is consuming.  
1. If I had been monitoring this, and my consumer was suffering from the 
issue in KAFKA-2978, would I actually have been alerted? That is, since the 
consumer would think it is consuming correctly, would the metric simply 
never have shown the lag?
2. There is another way to see offset lag: running the command 
/usr/bin/kafka-consumer-groups --new-consumer --bootstrap-server 
10.10.1.61:9092 --describe --group consumer_group_name and parsing the 
response. Is it safe or advisable to do this? I like the fact that it tells 
me the lag for each partition, although it is not available if no consumer 
from the group is currently consuming.
3. Is there a better way of doing this?
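
To make question 2 concrete, here is a rough sketch of the kind of parsing I have in mind. It is only an illustration: the sample text and its comma-separated column layout (GROUP, TOPIC, PARTITION, CURRENT OFFSET, LOG END OFFSET, LAG, OWNER) are my assumption about what the tool prints and may differ between versions, and the topic name, owner names, and function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical sample of `kafka-consumer-groups --describe` output.
# ASSUMPTION: comma-separated columns with a header row; verify against
# the output of the Kafka version actually in use.
SAMPLE_OUTPUT = """\
GROUP, TOPIC, PARTITION, CURRENT OFFSET, LOG END OFFSET, LAG, OWNER
consumer_group_name, events, 0, 100, 120, 20, owner-1
consumer_group_name, events, 1, 50, 5050, 5000, owner-1
"""


def parse_lag(output: str) -> Dict[Tuple[str, int], int]:
    """Return {(topic, partition): lag}, where lag = log end offset - current offset."""
    lags: Dict[Tuple[str, int], int] = {}
    for line in output.splitlines():
        fields = [f.strip() for f in line.split(",")]
        # Skip the header, blank lines, and rows without numeric offsets.
        if len(fields) < 5 or not all(f.isdigit() for f in fields[2:5]):
            continue
        topic, partition = fields[1], int(fields[2])
        current_offset, log_end_offset = int(fields[3]), int(fields[4])
        lags[(topic, partition)] = log_end_offset - current_offset
    return lags


def partitions_over(lags: Dict[Tuple[str, int], int], threshold: int) -> List[Tuple[str, int]]:
    """Return the (topic, partition) pairs whose lag exceeds the threshold."""
    return sorted(tp for tp, lag in lags.items() if lag > threshold)


if __name__ == "__main__":
    for topic, partition in partitions_over(parse_lag(SAMPLE_OUTPUT), threshold=1000):
        print(f"ALERT: {topic}-{partition} is lagging")
```

The appeal of this approach, if the assumption about the output holds, is that the lag is computed per partition from the offsets rather than reported by the consumer itself, so in principle a partition that nobody is fetching should show steadily growing lag.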
