[jira] [Updated] (KAFKA-6263) Expose metric for group metadata loading duration

2019-07-08 Thread Anastasia Vela (JIRA)


 [ https://issues.apache.org/jira/browse/KAFKA-6263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anastasia Vela updated KAFKA-6263:
--
Component/s: core

> Expose metric for group metadata loading duration
> -
>
> Key: KAFKA-6263
> URL: https://issues.apache.org/jira/browse/KAFKA-6263
> Project: Kafka
> Issue Type: Improvement
> Components: core
> Reporter: Jason Gustafson
> Assignee: Anastasia Vela
> Priority: Major
>
> We have seen several cases where the log cleaner either wasn't enabled or
> had failed, allowing __consumer_offsets partitions to grow excessively.
> When one of these partitions changes leadership, the new coordinator must
> load the offset cache from the start of the log, which can take
> arbitrarily long depending on how large the partition has grown (we have
> seen cases where it took hours). Catching this problem is not always easy
> because the condition is rare and the symptom tends to be just a long
> period of inactivity in the consumer group, which gradually gets worse
> over time. It may therefore be useful to have a broker metric for the load
> time so that it can be monitored and potentially alerted on. The same
> goes for the transaction log.
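
For illustration only, the kind of measurement requested above could be sketched as follows. This is a minimal sketch, not Kafka's implementation: LoadTimeTracker, timeLoad, maxLoadTimeMs and avgLoadTimeMs are hypothetical names, and the sketch assumes that a maximum and an average load time are the values worth exposing. Wiring these values into the broker's metrics registry is left out.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical helper (not a Kafka class): tracks how long partition loads take. */
final class LoadTimeTracker {
    private final LongSupplier nanoClock;            // injectable clock, e.g. System::nanoTime
    private final AtomicLong count = new AtomicLong();
    private final AtomicLong totalMs = new AtomicLong();
    private final AtomicLong maxMs = new AtomicLong();

    LoadTimeTracker(LongSupplier nanoClock) {
        this.nanoClock = nanoClock;
    }

    /** Runs the load and records its elapsed time, even if the load throws. */
    void timeLoad(Runnable load) {
        long start = nanoClock.getAsLong();
        try {
            load.run();
        } finally {
            record(TimeUnit.NANOSECONDS.toMillis(nanoClock.getAsLong() - start));
        }
    }

    /** Records one completed load, in milliseconds. */
    void record(long elapsedMs) {
        count.incrementAndGet();
        totalMs.addAndGet(elapsedMs);
        maxMs.accumulateAndGet(elapsedMs, Math::max);
    }

    /** Longest load seen so far; a natural value to alert on. */
    long maxLoadTimeMs() {
        return maxMs.get();
    }

    /** Mean load time, or 0.0 if nothing has been recorded yet. */
    double avgLoadTimeMs() {
        long n = count.get();
        return n == 0 ? 0.0 : (double) totalMs.get() / n;
    }
}
```

A coordinator could wrap the loading of a __consumer_offsets or transaction log partition in timeLoad(...) and publish the two getters as gauges, so that an unusually long load shows up in monitoring instead of only as an inactive consumer group.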



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-6263) Expose metric for group metadata loading duration

2019-07-08 Thread Anastasia Vela (JIRA)


 [ https://issues.apache.org/jira/browse/KAFKA-6263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anastasia Vela updated KAFKA-6263:
--
Labels:   (was: needs-kip)

> Expose metric for group metadata loading duration
> -
>
> Key: KAFKA-6263
> URL: https://issues.apache.org/jira/browse/KAFKA-6263
> Project: Kafka
> Issue Type: Improvement
> Reporter: Jason Gustafson
> Assignee: Anastasia Vela
> Priority: Major


