[jira] [Commented] (KAFKA-3473) KIP-237: More Controller Health Metrics

2018-05-19 Thread Dong Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16481810#comment-16481810
 ] 

Dong Lin commented on KAFKA-3473:
-

Sure. Opened [https://github.com/apache/kafka/pull/5043] to add docs. Thanks 
for the reference.

> KIP-237: More Controller Health Metrics
> ---
>
> Key: KAFKA-3473
> URL: https://issues.apache.org/jira/browse/KAFKA-3473
> Project: Kafka
>  Issue Type: Improvement
>  Components: controller
>Affects Versions: 1.0.1
>Reporter: Jiangjie Qin
>Assignee: Dong Lin
>Priority: Major
> Fix For: 2.0.0
>
>
> Currently, the controller appends requests for brokers to the controller 
> channel manager queue during state transitions, i.e., state transitions are 
> propagated asynchronously. We need to track the request queue time on the 
> controller side to see how long state propagation is delayed after the 
> state transition has finished on the controller.
> We also want metrics to monitor the ControllerEventManager queue size 
> and the average time an event waits in this queue before being 
> processed.
> See https://cwiki.apache.org/confluence/display/KAFKA/KIP-237%3A+More+Controller+Health+Metrics 
> for more detail.
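
As a rough illustration of the queue-time tracking described above, here is a minimal sketch in Scala. It is not the code from the pull request: TimedItem, QueueTimeStats and TimedQueue are hypothetical names, the stats class is a simplified stand-in for a metrics timer, and the clock is passed in as a function so the behaviour can be checked deterministically. The idea is only that each item is stamped when it is enqueued and the elapsed time is recorded when it is dequeued.

```scala
import java.util.concurrent.LinkedBlockingQueue

// Hypothetical stand-in for a queue item: a payload plus its enqueue timestamp.
final case class TimedItem[A](payload: A, enqueueTimeMs: Long)

// Hypothetical, simplified stand-in for a metrics timer: it keeps only the
// number of samples and their running total, so a mean can be derived.
final class QueueTimeStats {
  private var count = 0L
  private var totalMs = 0L

  def update(queueTimeMs: Long): Unit = synchronized {
    count += 1
    totalMs += queueTimeMs
  }

  def samples: Long = synchronized { count }

  def meanMs: Double = synchronized {
    if (count == 0) 0.0 else totalMs.toDouble / count
  }
}

// A queue that stamps each item on put and records its waiting time on take.
// nowMs is an injected clock (an assumption of this sketch, for testability).
final class TimedQueue[A](nowMs: () => Long) {
  private val queue = new LinkedBlockingQueue[TimedItem[A]]()
  val stats = new QueueTimeStats

  def put(payload: A): Unit = queue.put(TimedItem(payload, nowMs()))

  // Blocks until an item is available, then records how long it waited.
  def take(): A = {
    val item = queue.take()
    stats.update(nowMs() - item.enqueueTimeMs)
    item.payload
  }

  // Current queue depth, the kind of value a queue-size gauge would report.
  def size: Int = queue.size
}
```

In production code the clock would typically be something like () => System.currentTimeMillis(), and the recorded values would be fed to the metrics library in use rather than to a hand-rolled accumulator.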



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-3473) KIP-237: More Controller Health Metrics

2018-05-19 Thread James Cheng (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16481754#comment-16481754
 ] 

James Cheng commented on KAFKA-3473:


I was thinking of the [http://kafka.apache.org/documentation/#monitoring] 
section of the kafka site. It has a list of the JMX metrics that are available 
on the brokers. We can add the new metrics to that list.

I think the file to edit is 
https://github.com/apache/kafka/blob/trunk/docs/ops.html



[jira] [Commented] (KAFKA-3473) KIP-237: More Controller Health Metrics

2018-05-19 Thread Dong Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16481742#comment-16481742
 ] 

Dong Lin commented on KAFKA-3473:
-

[~wushujames] Could you tell me which docs you are referring to?



[jira] [Commented] (KAFKA-3473) KIP-237: More Controller Health Metrics

2018-05-19 Thread James Cheng (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16481489#comment-16481489
 ] 

James Cheng commented on KAFKA-3473:


[~lindong], can you update the docs to include this new metric?



[jira] [Commented] (KAFKA-3473) KIP-237: More Controller Health Metrics

2018-05-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16475316#comment-16475316
 ] 

ASF GitHub Bot commented on KAFKA-3473:
---

lindong28 closed pull request #4392: KAFKA-3473; More Controller Health Metrics 
(KIP-237)
URL: https://github.com/apache/kafka/pull/4392

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/core/src/main/scala/kafka/controller/ControllerChannelManager.scala b/core/src/main/scala/kafka/controller/ControllerChannelManager.scala
index aab4de23606..addd88df3f0 100755
--- a/core/src/main/scala/kafka/controller/ControllerChannelManager.scala
+++ b/core/src/main/scala/kafka/controller/ControllerChannelManager.scala
@@ -19,7 +19,7 @@ package kafka.controller
 import java.net.SocketTimeoutException
 import java.util.concurrent.{BlockingQueue, LinkedBlockingQueue, TimeUnit}
 
-import com.yammer.metrics.core.Gauge
+import com.yammer.metrics.core.{Gauge, Timer}
 import kafka.api._
 import kafka.cluster.Broker
 import kafka.common.KafkaException
@@ -44,6 +44,7 @@ import scala.collection.{Set, mutable}
 
 object ControllerChannelManager {
   val QueueSizeMetricName = "QueueSize"
+  val RequestRateAndQueueTimeMetricName = "RequestRateAndQueueTimeMs"
 }
 
 class ControllerChannelManager(controllerContext: ControllerContext, config: KafkaConfig, time: Time, metrics: Metrics,
@@ -82,7 +83,7 @@ class ControllerChannelManager(controllerContext: ControllerContext, config: Kaf
   val stateInfoOpt = brokerStateInfo.get(brokerId)
   stateInfoOpt match {
 case Some(stateInfo) =>
-  stateInfo.messageQueue.put(QueueItem(apiKey, request, callback))
+  stateInfo.messageQueue.put(QueueItem(apiKey, request, callback, time.milliseconds()))
 case None =>
   warn(s"Not sending request $request to broker $brokerId, since it is offline.")
   }
@@ -151,8 +152,12 @@ class ControllerChannelManager(controllerContext: ControllerContext, config: Kaf
   case Some(name) => s"$name:Controller-${config.brokerId}-to-broker-${broker.id}-send-thread"
 }
 
+val requestRateAndQueueTimeMetrics = newTimer(
+  RequestRateAndQueueTimeMetricName, TimeUnit.MILLISECONDS, TimeUnit.SECONDS, brokerMetricTags(broker.id)
+)
+
 val requestThread = new RequestSendThread(config.brokerId, controllerContext, messageQueue, networkClient,
-  brokerNode, config, time, stateChangeLogger, threadName)
+  brokerNode, config, time, requestRateAndQueueTimeMetrics, stateChangeLogger, threadName)
 requestThread.setDaemon(false)
 
 val queueSizeGauge = newGauge(
@@ -160,14 +165,14 @@ class ControllerChannelManager(controllerContext: ControllerContext, config: Kaf
   new Gauge[Int] {
 def value: Int = messageQueue.size
   },
-  queueSizeTags(broker.id)
+  brokerMetricTags(broker.id)
 )
 
-brokerStateInfo.put(broker.id, new ControllerBrokerStateInfo(networkClient, brokerNode, messageQueue,
-  requestThread, queueSizeGauge))
+brokerStateInfo.put(broker.id, ControllerBrokerStateInfo(networkClient, brokerNode, messageQueue,
+  requestThread, queueSizeGauge, requestRateAndQueueTimeMetrics))
   }
 
-  private def queueSizeTags(brokerId: Int) = Map("broker-id" -> brokerId.toString)
+  private def brokerMetricTags(brokerId: Int) = Map("broker-id" -> brokerId.toString)
 
   private def removeExistingBroker(brokerState: ControllerBrokerStateInfo) {
 try {
@@ -178,7 +183,8 @@ class ControllerChannelManager(controllerContext: ControllerContext, config: Kaf
   brokerState.requestSendThread.shutdown()
   brokerState.networkClient.close()
   brokerState.messageQueue.clear()
-  removeMetric(QueueSizeMetricName, queueSizeTags(brokerState.brokerNode.id))
+  removeMetric(QueueSizeMetricName, brokerMetricTags(brokerState.brokerNode.id))
+  removeMetric(RequestRateAndQueueTimeMetricName, brokerMetricTags(brokerState.brokerNode.id))
   brokerStateInfo.remove(brokerState.brokerNode.id)
 } catch {
   case e: Throwable => error("Error while removing broker by the controller", e)
@@ -193,7 +199,7 @@ class ControllerChannelManager(controllerContext: ControllerContext, config: Kaf
 }
 
 case class QueueItem(apiKey: ApiKeys, request: AbstractRequest.Builder[_ <: AbstractRequest],
- callback: AbstractResponse => Unit)
+ callback: AbstractResponse => Unit, enqueueTimeMs: Long)
 
 class RequestSendThread(val controllerId: Int,
 val controllerContext: ControllerContext,
@@ -202,6 +208,7 @@ class RequestSendThread(val controllerId: Int,
 val brokerNode: Node,
  

[jira] [Commented] (KAFKA-3473) KIP-237: More Controller Health Metrics

2018-02-01 Thread Damian Guy (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348441#comment-16348441
 ] 

Damian Guy commented on KAFKA-3473:
---

[~lindong] [~ijuma] What is the status of the PR? Should this be in 1.1?

> KIP-237: More Controller Health Metrics
> ---
>
> Key: KAFKA-3473
> URL: https://issues.apache.org/jira/browse/KAFKA-3473
> Project: Kafka
>  Issue Type: Improvement
>  Components: controller
>Affects Versions: 1.0.1
>Reporter: Jiangjie Qin
>Assignee: Dong Lin
>Priority: Major
> Fix For: 1.1.0