[jira] [Updated] (KAFKA-6263) Expose metric for group metadata loading duration
     [ https://issues.apache.org/jira/browse/KAFKA-6263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anastasia Vela updated KAFKA-6263:
----------------------------------
    Component/s: core

> Expose metric for group metadata loading duration
> -------------------------------------------------
>
>                 Key: KAFKA-6263
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6263
>             Project: Kafka
>          Issue Type: Improvement
>          Components: core
>            Reporter: Jason Gustafson
>            Assignee: Anastasia Vela
>            Priority: Major
>
> We have seen several cases where the log cleaner either wasn't enabled or
> had experienced some failure, and the __consumer_offsets partitions grew
> excessively. When one of these partitions changes leadership, the new
> coordinator must load the offset cache from the start of the log, which can
> take arbitrarily long depending on how large the partition has grown (we have
> seen cases where it took hours). Catching this problem is not always easy
> because the condition is rare and the symptom tends to be just a long period
> of inactivity in the consumer group, which gradually gets worse over time. It
> may therefore be useful to have a broker metric for the load time so that it
> can be monitored and potentially alerted on. The same goes for the
> transaction log.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Updated] (KAFKA-6263) Expose metric for group metadata loading duration
     [ https://issues.apache.org/jira/browse/KAFKA-6263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anastasia Vela updated KAFKA-6263:
----------------------------------
    Labels:   (was: needs-kip)

> Expose metric for group metadata loading duration
> -------------------------------------------------
>
>                 Key: KAFKA-6263
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6263
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Jason Gustafson
>            Assignee: Anastasia Vela
>            Priority: Major
>
> We have seen several cases where the log cleaner either wasn't enabled or
> had experienced some failure, and the __consumer_offsets partitions grew
> excessively. When one of these partitions changes leadership, the new
> coordinator must load the offset cache from the start of the log, which can
> take arbitrarily long depending on how large the partition has grown (we have
> seen cases where it took hours). Catching this problem is not always easy
> because the condition is rare and the symptom tends to be just a long period
> of inactivity in the consumer group, which gradually gets worse over time. It
> may therefore be useful to have a broker metric for the load time so that it
> can be monitored and potentially alerted on. The same goes for the
> transaction log.
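
The idea in the issue description (time each partition load, expose the result as a metric, alert when it grows) can be illustrated with a small sketch. This is not Kafka code and does not reflect how the broker implements the metric: `LoadTimeMetric`, `timed_load`, `should_alert`, and the threshold value are all hypothetical names chosen for illustration, and the loader is passed in as a plain callable.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class LoadTimeMetric:
    """Hypothetical holder for the max and average load duration, in ms."""

    def __init__(self) -> None:
        self.count = 0
        self.total_ms = 0.0
        self.max_ms = 0.0

    def record(self, duration_ms: float) -> None:
        self.count += 1
        self.total_ms += duration_ms
        self.max_ms = max(self.max_ms, duration_ms)

    @property
    def avg_ms(self) -> float:
        # Report 0.0 until at least one load has been recorded.
        return self.total_ms / self.count if self.count else 0.0


def timed_load(
    load_partition: Callable[[], T],
    metric: LoadTimeMetric,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Run a load callable and record how long it took.

    The duration is recorded in a `finally` block, so a load that raises
    is still counted. `clock` returns seconds and is injectable for tests.
    """
    start = clock()
    try:
        return load_partition()
    finally:
        metric.record((clock() - start) * 1000.0)


def should_alert(metric: LoadTimeMetric, threshold_ms: float) -> bool:
    """True when the slowest recorded load exceeded the threshold."""
    return metric.max_ms > threshold_ms
```

A caller would wrap each offsets or transaction log load in `timed_load` and poll `should_alert` with a threshold suited to the deployment. Tracking the maximum alongside the average matters here because the condition described is rare: a single multi-hour load would barely move an average but is exactly what an alert should catch.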