[jira] [Commented] (KAFKA-3473) KIP-237: More Controller Health Metrics
[ https://issues.apache.org/jira/browse/KAFKA-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16481810#comment-16481810 ]

Dong Lin commented on KAFKA-3473:
---------------------------------

Sure. Opened [https://github.com/apache/kafka/pull/5043] to add docs. Thanks for the reference.

> KIP-237: More Controller Health Metrics
> ---------------------------------------
>
>                 Key: KAFKA-3473
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3473
>             Project: Kafka
>          Issue Type: Improvement
>          Components: controller
>    Affects Versions: 1.0.1
>            Reporter: Jiangjie Qin
>            Assignee: Dong Lin
>            Priority: Major
>             Fix For: 2.0.0
>
>
> Currently the controller appends requests to brokers into the controller channel
> manager queue during state transitions, i.e. the state transitions are
> propagated asynchronously. We need to track the request queue time on the
> controller side to see how long the state propagation is delayed after the
> state transition finished on the controller.
> We also want to have metrics to monitor the ControllerEventManager queue size
> and the average time it takes for an event to wait in this queue before being
> processed.
> See
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-237%3A+More+Controller+Health+Metrics
> for more detail.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
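The queue-time idea in the issue description can be illustrated with a minimal sketch: stamp each item when it is enqueued and compute the wait when it is dequeued. This is not Kafka's implementation; `TimedItem`, `QueueTimeTracker`, and the injected `clock` function are hypothetical names introduced only for illustration, and the running average stands in for whatever metrics library a real controller would use.

```scala
import java.util.concurrent.LinkedBlockingQueue

// Hypothetical stand-in for a queued controller request; only the
// enqueue timestamp matters for this sketch.
final case class TimedItem(payload: String, enqueueTimeMs: Long)

// Hypothetical helper: records how long each item waited between put() and take().
// The clock is injected so the behaviour can be checked deterministically.
final class QueueTimeTracker(clock: () => Long) {
  private val queue = new LinkedBlockingQueue[TimedItem]()
  private var count = 0L
  private var totalWaitMs = 0L

  // Stamp the item with the current time as it enters the queue.
  def put(payload: String): Unit = queue.put(TimedItem(payload, clock()))

  // Blocks until an item is available, then records its time in the queue.
  def take(): String = {
    val item = queue.take()
    val waitedMs = clock() - item.enqueueTimeMs
    synchronized {
      count += 1
      totalWaitMs += waitedMs
    }
    item.payload
  }

  // Current backlog, analogous to a queue-size gauge.
  def queueSize: Int = queue.size

  // Mean wait over all items taken so far; 0.0 before the first take().
  def averageQueueTimeMs: Double = synchronized {
    if (count == 0) 0.0 else totalWaitMs.toDouble / count
  }
}
```

In real use the clock would be something like `() => System.currentTimeMillis()`; a growing `queueSize` or `averageQueueTimeMs` would then indicate that state propagation is lagging behind the state transitions.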
[jira] [Commented] (KAFKA-3473) KIP-237: More Controller Health Metrics
[ https://issues.apache.org/jira/browse/KAFKA-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16481754#comment-16481754 ]

James Cheng commented on KAFKA-3473:
------------------------------------

I was thinking of the [http://kafka.apache.org/documentation/#monitoring] section of the kafka site. It has a list of the JMX metrics that are available on the brokers. We can add the new metrics to that list.

I think the file to edit is https://github.com/apache/kafka/blob/trunk/docs/ops.html
[jira] [Commented] (KAFKA-3473) KIP-237: More Controller Health Metrics
[ https://issues.apache.org/jira/browse/KAFKA-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16481742#comment-16481742 ]

Dong Lin commented on KAFKA-3473:
---------------------------------

[~wushujames] Could you tell me which docs you are referring to?
[jira] [Commented] (KAFKA-3473) KIP-237: More Controller Health Metrics
[ https://issues.apache.org/jira/browse/KAFKA-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16481489#comment-16481489 ]

James Cheng commented on KAFKA-3473:
------------------------------------

[~lindong], can you update the docs to include this new metric?
[jira] [Commented] (KAFKA-3473) KIP-237: More Controller Health Metrics
[ https://issues.apache.org/jira/browse/KAFKA-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16475316#comment-16475316 ]

ASF GitHub Bot commented on KAFKA-3473:
---------------------------------------

lindong28 closed pull request #4392: KAFKA-3473; More Controller Health Metrics (KIP-237)
URL: https://github.com/apache/kafka/pull/4392

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/core/src/main/scala/kafka/controller/ControllerChannelManager.scala b/core/src/main/scala/kafka/controller/ControllerChannelManager.scala
index aab4de23606..addd88df3f0 100755
--- a/core/src/main/scala/kafka/controller/ControllerChannelManager.scala
+++ b/core/src/main/scala/kafka/controller/ControllerChannelManager.scala
@@ -19,7 +19,7 @@ package kafka.controller
 import java.net.SocketTimeoutException
 import java.util.concurrent.{BlockingQueue, LinkedBlockingQueue, TimeUnit}
 
-import com.yammer.metrics.core.Gauge
+import com.yammer.metrics.core.{Gauge, Timer}
 import kafka.api._
 import kafka.cluster.Broker
 import kafka.common.KafkaException
@@ -44,6 +44,7 @@ import scala.collection.{Set, mutable}
 
 object ControllerChannelManager {
   val QueueSizeMetricName = "QueueSize"
+  val RequestRateAndQueueTimeMetricName = "RequestRateAndQueueTimeMs"
 }
 
 class ControllerChannelManager(controllerContext: ControllerContext, config: KafkaConfig, time: Time, metrics: Metrics,
@@ -82,7 +83,7 @@ class ControllerChannelManager(controllerContext: ControllerContext, config: Kaf
       val stateInfoOpt = brokerStateInfo.get(brokerId)
       stateInfoOpt match {
         case Some(stateInfo) =>
-          stateInfo.messageQueue.put(QueueItem(apiKey, request, callback))
+          stateInfo.messageQueue.put(QueueItem(apiKey, request, callback, time.milliseconds()))
         case None =>
           warn(s"Not sending request $request to broker $brokerId, since it is offline.")
       }
@@ -151,8 +152,12 @@ class ControllerChannelManager(controllerContext: ControllerContext, config: Kaf
       case Some(name) => s"$name:Controller-${config.brokerId}-to-broker-${broker.id}-send-thread"
     }
 
+    val requestRateAndQueueTimeMetrics = newTimer(
+      RequestRateAndQueueTimeMetricName, TimeUnit.MILLISECONDS, TimeUnit.SECONDS, brokerMetricTags(broker.id)
+    )
+
     val requestThread = new RequestSendThread(config.brokerId, controllerContext, messageQueue, networkClient,
-      brokerNode, config, time, stateChangeLogger, threadName)
+      brokerNode, config, time, requestRateAndQueueTimeMetrics, stateChangeLogger, threadName)
     requestThread.setDaemon(false)
 
     val queueSizeGauge = newGauge(
@@ -160,14 +165,14 @@ class ControllerChannelManager(controllerContext: ControllerContext, config: Kaf
       new Gauge[Int] {
         def value: Int = messageQueue.size
       },
-      queueSizeTags(broker.id)
+      brokerMetricTags(broker.id)
     )
 
-    brokerStateInfo.put(broker.id, new ControllerBrokerStateInfo(networkClient, brokerNode, messageQueue,
-      requestThread, queueSizeGauge))
+    brokerStateInfo.put(broker.id, ControllerBrokerStateInfo(networkClient, brokerNode, messageQueue,
+      requestThread, queueSizeGauge, requestRateAndQueueTimeMetrics))
   }
 
-  private def queueSizeTags(brokerId: Int) = Map("broker-id" -> brokerId.toString)
+  private def brokerMetricTags(brokerId: Int) = Map("broker-id" -> brokerId.toString)
 
   private def removeExistingBroker(brokerState: ControllerBrokerStateInfo) {
     try {
@@ -178,7 +183,8 @@ class ControllerChannelManager(controllerContext: ControllerContext, config: Kaf
       brokerState.requestSendThread.shutdown()
       brokerState.networkClient.close()
       brokerState.messageQueue.clear()
-      removeMetric(QueueSizeMetricName, queueSizeTags(brokerState.brokerNode.id))
+      removeMetric(QueueSizeMetricName, brokerMetricTags(brokerState.brokerNode.id))
+      removeMetric(RequestRateAndQueueTimeMetricName, brokerMetricTags(brokerState.brokerNode.id))
       brokerStateInfo.remove(brokerState.brokerNode.id)
     } catch {
       case e: Throwable => error("Error while removing broker by the controller", e)
@@ -193,7 +199,7 @@ class ControllerChannelManager(controllerContext: ControllerContext, config: Kaf
 }
 
 case class QueueItem(apiKey: ApiKeys, request: AbstractRequest.Builder[_ <: AbstractRequest],
-                     callback: AbstractResponse => Unit)
+                     callback: AbstractResponse => Unit, enqueueTimeMs: Long)
 
 class RequestSendThread(val controllerId: Int,
                         val controllerContext: ControllerContext,
@@ -202,6 +208,7 @@ class RequestSendThread(val controllerId: Int,
                         val brokerNode: Node,
[jira] [Commented] (KAFKA-3473) KIP-237: More Controller Health Metrics
[ https://issues.apache.org/jira/browse/KAFKA-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348441#comment-16348441 ]

Damian Guy commented on KAFKA-3473:
-----------------------------------

[~lindong] [~ijuma] what is the status of the PR? Should this be in 1.1?

> KIP-237: More Controller Health Metrics
> ---------------------------------------
>
>                 Key: KAFKA-3473
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3473
>             Project: Kafka
>          Issue Type: Improvement
>          Components: controller
>    Affects Versions: 1.0.1
>            Reporter: Jiangjie Qin
>            Assignee: Dong Lin
>            Priority: Major
>             Fix For: 1.1.0
>
>
> Currently the controller appends requests to brokers into the controller channel
> manager queue during state transitions, i.e. the state transitions are
> propagated asynchronously. We need to track the request queue time on the
> controller side to see how long the state propagation is delayed after the
> state transition finished on the controller.
> We also want to have metrics to monitor the ControllerEventManager queue size
> and the average time it takes for an event to wait in this queue before being
> processed.
> See
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-237%3A+More+Controller+Health+Metrics
> for more detail.