[jira] [Commented] (KAFKA-3473) KIP-237: More Controller Health Metrics
[ https://issues.apache.org/jira/browse/KAFKA-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16481810#comment-16481810 ]

Dong Lin commented on KAFKA-3473:
---------------------------------

Sure. Opened [https://github.com/apache/kafka/pull/5043] to add docs. Thanks for the reference.

> KIP-237: More Controller Health Metrics
> ---------------------------------------
>
> Key: KAFKA-3473
> URL: https://issues.apache.org/jira/browse/KAFKA-3473
> Project: Kafka
> Issue Type: Improvement
> Components: controller
> Affects Versions: 1.0.1
> Reporter: Jiangjie Qin
> Assignee: Dong Lin
> Priority: Major
> Fix For: 2.0.0
>
> Currently the controller appends the requests to brokers into the controller
> channel manager queue during state transition, i.e. the state transitions are
> propagated asynchronously. We need to track the request queue time on the
> controller side to see how long the state propagation is delayed after the
> state transition finished on the controller.
> We also want to have metrics to monitor the ControllerEventManager queue size
> and the average time it takes for an event to wait in this queue before being
> processed.
> See
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-237%3A+More+Controller+Health+Metrics
> for more detail.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (KAFKA-3473) KIP-237: More Controller Health Metrics
[ https://issues.apache.org/jira/browse/KAFKA-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16481754#comment-16481754 ]

James Cheng commented on KAFKA-3473:
------------------------------------

I was thinking of the [http://kafka.apache.org/documentation/#monitoring] section of the kafka site. It has a list of the JMX metrics that are available on the brokers. We can add the new metrics to that list.

I think the file to edit is https://github.com/apache/kafka/blob/trunk/docs/ops.html
[jira] [Commented] (KAFKA-3473) KIP-237: More Controller Health Metrics
[ https://issues.apache.org/jira/browse/KAFKA-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16481742#comment-16481742 ]

Dong Lin commented on KAFKA-3473:
---------------------------------

[~wushujames] Could you tell me which docs you are referring to?
[jira] [Commented] (KAFKA-3473) KIP-237: More Controller Health Metrics
[ https://issues.apache.org/jira/browse/KAFKA-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16481489#comment-16481489 ]

James Cheng commented on KAFKA-3473:
------------------------------------

[~lindong], can you update the docs to include this new metric?
[jira] [Commented] (KAFKA-3473) KIP-237: More Controller Health Metrics
[ https://issues.apache.org/jira/browse/KAFKA-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475316#comment-16475316 ]

ASF GitHub Bot commented on KAFKA-3473:
---------------------------------------

lindong28 closed pull request #4392: KAFKA-3473; More Controller Health Metrics (KIP-237)
URL: https://github.com/apache/kafka/pull/4392

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/core/src/main/scala/kafka/controller/ControllerChannelManager.scala b/core/src/main/scala/kafka/controller/ControllerChannelManager.scala
index aab4de23606..addd88df3f0 100755
--- a/core/src/main/scala/kafka/controller/ControllerChannelManager.scala
+++ b/core/src/main/scala/kafka/controller/ControllerChannelManager.scala
@@ -19,7 +19,7 @@ package kafka.controller
 import java.net.SocketTimeoutException
 import java.util.concurrent.{BlockingQueue, LinkedBlockingQueue, TimeUnit}
 
-import com.yammer.metrics.core.Gauge
+import com.yammer.metrics.core.{Gauge, Timer}
 import kafka.api._
 import kafka.cluster.Broker
 import kafka.common.KafkaException
@@ -44,6 +44,7 @@ import scala.collection.{Set, mutable}
 
 object ControllerChannelManager {
   val QueueSizeMetricName = "QueueSize"
+  val RequestRateAndQueueTimeMetricName = "RequestRateAndQueueTimeMs"
 }
 
 class ControllerChannelManager(controllerContext: ControllerContext, config: KafkaConfig, time: Time, metrics: Metrics,
@@ -82,7 +83,7 @@ class ControllerChannelManager(controllerContext: ControllerContext, config: Kaf
       val stateInfoOpt = brokerStateInfo.get(brokerId)
       stateInfoOpt match {
         case Some(stateInfo) =>
-          stateInfo.messageQueue.put(QueueItem(apiKey, request, callback))
+          stateInfo.messageQueue.put(QueueItem(apiKey, request, callback, time.milliseconds()))
         case None =>
           warn(s"Not sending request $request to broker $brokerId, since it is offline.")
       }
@@ -151,8 +152,12 @@ class ControllerChannelManager(controllerContext: ControllerContext, config: Kaf
       case Some(name) => s"$name:Controller-${config.brokerId}-to-broker-${broker.id}-send-thread"
     }
 
+    val requestRateAndQueueTimeMetrics = newTimer(
+      RequestRateAndQueueTimeMetricName, TimeUnit.MILLISECONDS, TimeUnit.SECONDS, brokerMetricTags(broker.id)
+    )
+
     val requestThread = new RequestSendThread(config.brokerId, controllerContext, messageQueue, networkClient,
-      brokerNode, config, time, stateChangeLogger, threadName)
+      brokerNode, config, time, requestRateAndQueueTimeMetrics, stateChangeLogger, threadName)
     requestThread.setDaemon(false)
 
     val queueSizeGauge = newGauge(
@@ -160,14 +165,14 @@ class ControllerChannelManager(controllerContext: ControllerContext, config: Kaf
       new Gauge[Int] {
         def value: Int = messageQueue.size
       },
-      queueSizeTags(broker.id)
+      brokerMetricTags(broker.id)
     )
 
-    brokerStateInfo.put(broker.id, new ControllerBrokerStateInfo(networkClient, brokerNode, messageQueue,
-      requestThread, queueSizeGauge))
+    brokerStateInfo.put(broker.id, ControllerBrokerStateInfo(networkClient, brokerNode, messageQueue,
+      requestThread, queueSizeGauge, requestRateAndQueueTimeMetrics))
   }
 
-  private def queueSizeTags(brokerId: Int) = Map("broker-id" -> brokerId.toString)
+  private def brokerMetricTags(brokerId: Int) = Map("broker-id" -> brokerId.toString)
 
   private def removeExistingBroker(brokerState: ControllerBrokerStateInfo) {
     try {
@@ -178,7 +183,8 @@ class ControllerChannelManager(controllerContext: ControllerContext, config: Kaf
       brokerState.requestSendThread.shutdown()
       brokerState.networkClient.close()
       brokerState.messageQueue.clear()
-      removeMetric(QueueSizeMetricName, queueSizeTags(brokerState.brokerNode.id))
+      removeMetric(QueueSizeMetricName, brokerMetricTags(brokerState.brokerNode.id))
+      removeMetric(RequestRateAndQueueTimeMetricName, brokerMetricTags(brokerState.brokerNode.id))
       brokerStateInfo.remove(brokerState.brokerNode.id)
     } catch {
       case e: Throwable => error("Error while removing broker by the controller", e)
@@ -193,7 +199,7 @@ class ControllerChannelManager(controllerContext: ControllerContext, config: Kaf
 }
 
 case class QueueItem(apiKey: ApiKeys, request: AbstractRequest.Builder[_ <: AbstractRequest],
-                     callback: AbstractResponse => Unit)
+                     callback: AbstractResponse => Unit, enqueueTimeMs: Long)
 
 class RequestSendThread(val controllerId: Int,
                         val controllerContext: ControllerContext,
@@ -202,6 +208,7 @@ class RequestSendThread(val controllerId: Int,
                         val brokerNode:
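
The visible part of the diff stamps each QueueItem with an enqueue time and creates a per-broker timer; the hunk that updates the timer is cut off above. The general technique (record the time on enqueue, subtract it from the current time on dequeue) can be sketched as follows. This is only an illustration under stated assumptions: `TimedItem` and `TimedQueue` are hypothetical names that do not exist in Kafka, the clock is passed in as a function so the arithmetic can be checked with a fake clock, and the queue times are kept in a plain buffer rather than in a Yammer Timer.

```scala
import java.util.concurrent.LinkedBlockingQueue
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for the patched QueueItem: a payload plus the
// time (in milliseconds) at which it was placed on the queue.
final case class TimedItem(payload: String, enqueueTimeMs: Long)

// Hypothetical helper, not part of Kafka. `nowMs` supplies the current
// time; injecting it keeps the example deterministic.
// Assumption: a single consumer thread calls take(), so the buffer of
// recorded queue times needs no extra synchronization.
final class TimedQueue(nowMs: () => Long) {
  private val queue = new LinkedBlockingQueue[TimedItem]()
  private val queueTimesMs = ArrayBuffer.empty[Long]

  // Producer side: stamp the item with the enqueue time.
  def put(payload: String): Unit =
    queue.put(TimedItem(payload, nowMs()))

  // Consumer side: block until an item is available, then record how
  // long it waited in the queue before being picked up.
  def take(): String = {
    val item = queue.take()
    queueTimesMs += nowMs() - item.enqueueTimeMs
    item.payload
  }

  // Current queue depth, analogous to the QueueSize gauge.
  def size: Int = queue.size

  def recordedQueueTimesMs: List[Long] = queueTimesMs.toList

  def averageQueueTimeMs: Double =
    if (queueTimesMs.isEmpty) 0.0
    else queueTimesMs.sum.toDouble / queueTimesMs.size
}
```

With a fake clock that reads 1000 at put() and 1250 at take(), the recorded queue time is 250 ms. A real implementation would more likely feed that difference into a metrics timer so that rate and percentiles come for free.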
[jira] [Commented] (KAFKA-3473) KIP-237: More Controller Health Metrics
[ https://issues.apache.org/jira/browse/KAFKA-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16348441#comment-16348441 ]

Damian Guy commented on KAFKA-3473:
-----------------------------------

[~lindong] [~ijuma] what is the status of the PR? Should this be in 1.1?

> KIP-237: More Controller Health Metrics
> ---------------------------------------
>
> Key: KAFKA-3473
> URL: https://issues.apache.org/jira/browse/KAFKA-3473
> Project: Kafka
> Issue Type: Improvement
> Components: controller
> Affects Versions: 1.0.1
> Reporter: Jiangjie Qin
> Assignee: Dong Lin
> Priority: Major
> Fix For: 1.1.0
>
> Currently the controller appends the requests to brokers into the controller
> channel manager queue during state transition, i.e. the state transitions are
> propagated asynchronously. We need to track the request queue time on the
> controller side to see how long the state propagation is delayed after the
> state transition finished on the controller.
> We also want to have metrics to monitor the ControllerEventManager queue size
> and the average time it takes for an event to wait in this queue before being
> processed.
> See
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-237%3A+More+Controller+Health+Metrics
> for more detail.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (KAFKA-3473) KIP-237: More Controller Health Metrics
[ https://issues.apache.org/jira/browse/KAFKA-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16311911#comment-16311911 ]

ASF GitHub Bot commented on KAFKA-3473:
---------------------------------------

lindong28 opened a new pull request #4392: KAFKA-3473; More Controller Health Metrics (KIP-237)
URL: https://github.com/apache/kafka/pull/4392

This patch adds a few metrics that are useful for monitoring controller health. See KIP-237 for more detail.

### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> KIP-237: More Controller Health Metrics
> ---------------------------------------
>
> Key: KAFKA-3473
> URL: https://issues.apache.org/jira/browse/KAFKA-3473
> Project: Kafka
> Issue Type: Improvement
> Components: controller
> Affects Versions: 1.0.1
> Reporter: Jiangjie Qin
> Assignee: Dong Lin
> Fix For: 1.1.0
>
> Currently the controller appends the requests to brokers into the controller
> channel manager queue during state transition, i.e. the state transitions are
> propagated asynchronously. We need to track the request queue time on the
> controller side to see how long the state propagation is delayed after the
> state transition finished on the controller.
> We also want to have metrics to monitor the ControllerEventManager queue size
> and the average time it takes for an event to wait in this queue before being
> processed.
> See
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-237%3A+More+Controller+Health+Metrics
> for more detail.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)