[jira] [Commented] (KAFKA-3473) KIP-237: More Controller Health Metrics

2018-05-19 Thread Dong Lin (JIRA)

[ https://issues.apache.org/jira/browse/KAFKA-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16481810#comment-16481810 ]

Dong Lin commented on KAFKA-3473:
-

Sure. Opened [https://github.com/apache/kafka/pull/5043] to add docs. Thanks 
for the reference.

> KIP-237: More Controller Health Metrics
> ---
>
> Key: KAFKA-3473
> URL: https://issues.apache.org/jira/browse/KAFKA-3473
> Project: Kafka
>  Issue Type: Improvement
>  Components: controller
>Affects Versions: 1.0.1
>Reporter: Jiangjie Qin
>Assignee: Dong Lin
>Priority: Major
> Fix For: 2.0.0
>
>
> Currently, the controller appends the requests to brokers into the controller 
> channel manager queue during a state transition, i.e. state transitions are 
> propagated asynchronously. We need to track the request queue time on the 
> controller side to see how long state propagation is delayed after the 
> state transition has finished on the controller.
> We also want metrics to monitor the ControllerEventManager queue size 
> and the average time an event waits in this queue before being 
> processed.
> See 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-237%3A+More+Controller+Health+Metrics
>  for more detail.
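
A minimal Scala sketch of the two event-queue signals described above: a queue-size reading and the average time an event waits before it is processed. This is not Kafka's ControllerEventManager code; `InstrumentedEventQueue`, `QueuedEvent`, and the injected clock are hypothetical names used only to show the idea of stamping each event at enqueue time and measuring at dequeue.

```scala
import java.util.concurrent.LinkedBlockingQueue
import java.util.concurrent.atomic.AtomicLong

// Hypothetical event type: only a label and the time it was enqueued.
final case class QueuedEvent(name: String, enqueueTimeMs: Long)

// Sketch of an event queue exposing a queue-size reading and the average
// time events spent waiting before processing. Not Kafka's implementation.
final class InstrumentedEventQueue(clock: () => Long) {
  private val queue = new LinkedBlockingQueue[QueuedEvent]()
  private val totalQueueTimeMs = new AtomicLong(0L)
  private val processedCount = new AtomicLong(0L)

  // Stamp the event with the current time as it enters the queue.
  def put(name: String): Unit = queue.put(QueuedEvent(name, clock()))

  // Read on demand, the way a gauge would be.
  def queueSize: Int = queue.size

  // Block for the next event, record how long it waited, then handle it.
  def processNext(handler: QueuedEvent => Unit): Unit = {
    val event = queue.take()
    totalQueueTimeMs.addAndGet(clock() - event.enqueueTimeMs)
    processedCount.incrementAndGet()
    handler(event)
  }

  // Mean wait over all processed events; 0.0 before anything is processed.
  def averageQueueTimeMs: Double = {
    val n = processedCount.get()
    if (n == 0L) 0.0 else totalQueueTimeMs.get().toDouble / n
  }
}

object EventQueueDemo {
  def main(args: Array[String]): Unit = {
    var now = 0L
    val q = new InstrumentedEventQueue(() => now)
    q.put("BrokerChange")          // enqueued at t = 0 ms
    now = 10L
    q.put("TopicChange")           // enqueued at t = 10 ms
    println(q.queueSize)           // 2
    now = 30L
    q.processNext(_ => ())         // waited 30 ms
    q.processNext(_ => ())         // waited 20 ms
    println(q.averageQueueTimeMs)  // 25.0
  }
}
```

A running mean is the simplest choice for a sketch; a production metric would more likely use a histogram or timer so that percentiles are available, which is what the diff in the PR below does for the per-broker request queue with a Yammer `Timer`.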



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-3473) KIP-237: More Controller Health Metrics

2018-05-19 Thread James Cheng (JIRA)

[ https://issues.apache.org/jira/browse/KAFKA-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16481754#comment-16481754 ]

James Cheng commented on KAFKA-3473:


I was thinking of the [http://kafka.apache.org/documentation/#monitoring] 
section of the Kafka site. It lists the JMX metrics that are available 
on the brokers. We can add the new metrics to that list.

I think the file to edit is 
https://github.com/apache/kafka/blob/trunk/docs/ops.html
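
As an illustration of locating one of these JMX metrics, the sketch below queries an MBean server for the per-broker request-queue timer added by the PR. It assumes the MBean is named `kafka.controller:type=ControllerChannelManager,name=<metric>,broker-id=<id>` (inferred from the class name and the `broker-id` tag in the diff, not confirmed here), that Scala 2.13+ is used for `scala.jdk.CollectionConverters`, and that it runs inside the broker JVM; from another process you would connect through a `JMXConnector` instead. `ControllerMetricsLister` is a hypothetical name.

```scala
import java.lang.management.ManagementFactory
import javax.management.ObjectName
import scala.jdk.CollectionConverters._

object ControllerMetricsLister {
  // Assumed MBean layout; "*" is a JMX property-value wildcard matching any broker id.
  def channelManagerPattern(metricName: String): ObjectName =
    new ObjectName(s"kafka.controller:type=ControllerChannelManager,name=$metricName,broker-id=*")

  // Matching MBeans registered in this JVM's platform MBean server, sorted by name.
  // Outside a broker JVM the result is simply empty.
  def list(metricName: String): List[ObjectName] = {
    val server = ManagementFactory.getPlatformMBeanServer
    server.queryNames(channelManagerPattern(metricName), null).asScala.toList.sortBy(_.toString)
  }

  def main(args: Array[String]): Unit =
    list("RequestRateAndQueueTimeMs").foreach(println)
}
```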



[jira] [Commented] (KAFKA-3473) KIP-237: More Controller Health Metrics

2018-05-19 Thread Dong Lin (JIRA)

[ https://issues.apache.org/jira/browse/KAFKA-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16481742#comment-16481742 ]

Dong Lin commented on KAFKA-3473:
-

[~wushujames] Could you tell me which docs you are referring to?



[jira] [Commented] (KAFKA-3473) KIP-237: More Controller Health Metrics

2018-05-18 Thread James Cheng (JIRA)

[ https://issues.apache.org/jira/browse/KAFKA-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16481489#comment-16481489 ]

James Cheng commented on KAFKA-3473:


[~lindong], can you update the docs to include this new metric?



[jira] [Commented] (KAFKA-3473) KIP-237: More Controller Health Metrics

2018-05-14 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/KAFKA-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475316#comment-16475316 ]

ASF GitHub Bot commented on KAFKA-3473:
---

lindong28 closed pull request #4392: KAFKA-3473; More Controller Health Metrics 
(KIP-237)
URL: https://github.com/apache/kafka/pull/4392
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:


diff --git a/core/src/main/scala/kafka/controller/ControllerChannelManager.scala b/core/src/main/scala/kafka/controller/ControllerChannelManager.scala
index aab4de23606..addd88df3f0 100755
--- a/core/src/main/scala/kafka/controller/ControllerChannelManager.scala
+++ b/core/src/main/scala/kafka/controller/ControllerChannelManager.scala
@@ -19,7 +19,7 @@ package kafka.controller
 import java.net.SocketTimeoutException
 import java.util.concurrent.{BlockingQueue, LinkedBlockingQueue, TimeUnit}
 
-import com.yammer.metrics.core.Gauge
+import com.yammer.metrics.core.{Gauge, Timer}
 import kafka.api._
 import kafka.cluster.Broker
 import kafka.common.KafkaException
@@ -44,6 +44,7 @@ import scala.collection.{Set, mutable}
 
 object ControllerChannelManager {
   val QueueSizeMetricName = "QueueSize"
+  val RequestRateAndQueueTimeMetricName = "RequestRateAndQueueTimeMs"
 }
 
 class ControllerChannelManager(controllerContext: ControllerContext, config: KafkaConfig, time: Time, metrics: Metrics,
@@ -82,7 +83,7 @@ class ControllerChannelManager(controllerContext: ControllerContext, config: Kaf
   val stateInfoOpt = brokerStateInfo.get(brokerId)
   stateInfoOpt match {
 case Some(stateInfo) =>
-  stateInfo.messageQueue.put(QueueItem(apiKey, request, callback))
+  stateInfo.messageQueue.put(QueueItem(apiKey, request, callback, time.milliseconds()))
 case None =>
   warn(s"Not sending request $request to broker $brokerId, since it is offline.")
   }
@@ -151,8 +152,12 @@ class ControllerChannelManager(controllerContext: ControllerContext, config: Kaf
   case Some(name) => s"$name:Controller-${config.brokerId}-to-broker-${broker.id}-send-thread"
 }
 
+val requestRateAndQueueTimeMetrics = newTimer(
+  RequestRateAndQueueTimeMetricName, TimeUnit.MILLISECONDS, TimeUnit.SECONDS, brokerMetricTags(broker.id)
+)
+
 val requestThread = new RequestSendThread(config.brokerId, controllerContext, messageQueue, networkClient,
-  brokerNode, config, time, stateChangeLogger, threadName)
+  brokerNode, config, time, requestRateAndQueueTimeMetrics, stateChangeLogger, threadName)
 requestThread.setDaemon(false)
 
 val queueSizeGauge = newGauge(
@@ -160,14 +165,14 @@ class ControllerChannelManager(controllerContext: ControllerContext, config: Kaf
   new Gauge[Int] {
 def value: Int = messageQueue.size
   },
-  queueSizeTags(broker.id)
+  brokerMetricTags(broker.id)
 )
 
-brokerStateInfo.put(broker.id, new ControllerBrokerStateInfo(networkClient, brokerNode, messageQueue,
-  requestThread, queueSizeGauge))
+brokerStateInfo.put(broker.id, ControllerBrokerStateInfo(networkClient, brokerNode, messageQueue,
+  requestThread, queueSizeGauge, requestRateAndQueueTimeMetrics))
   }
 
-  private def queueSizeTags(brokerId: Int) = Map("broker-id" -> brokerId.toString)
+  private def brokerMetricTags(brokerId: Int) = Map("broker-id" -> brokerId.toString)
 
   private def removeExistingBroker(brokerState: ControllerBrokerStateInfo) {
 try {
@@ -178,7 +183,8 @@ class ControllerChannelManager(controllerContext: ControllerContext, config: Kaf
   brokerState.requestSendThread.shutdown()
   brokerState.networkClient.close()
   brokerState.messageQueue.clear()
-  removeMetric(QueueSizeMetricName, queueSizeTags(brokerState.brokerNode.id))
+  removeMetric(QueueSizeMetricName, brokerMetricTags(brokerState.brokerNode.id))
+  removeMetric(RequestRateAndQueueTimeMetricName, brokerMetricTags(brokerState.brokerNode.id))
   brokerStateInfo.remove(brokerState.brokerNode.id)
 } catch {
   case e: Throwable => error("Error while removing broker by the controller", e)
@@ -193,7 +199,7 @@ class ControllerChannelManager(controllerContext: ControllerContext, config: Kaf
 }
 
 case class QueueItem(apiKey: ApiKeys, request: AbstractRequest.Builder[_ <: AbstractRequest],
- callback: AbstractResponse => Unit)
+ callback: AbstractResponse => Unit, enqueueTimeMs: Long)
 
 class RequestSendThread(val controllerId: Int,
 val controllerContext: ControllerContext,
@@ -202,6 +208,7 @@ class RequestSendThread(val controllerId: Int,
 val brokerNode:
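
The diff above is cut off before the `RequestSendThread` changes, so the consuming side is not shown. One plausible shape, sketched here under that assumption, is for the send thread to subtract each item's `enqueueTimeMs` from the current time when it dequeues the item and record the result in the timer it was handed. `SimpleTimer`, `TimerSnapshot`, `QueuedRequest`, and `SendLoop` are hypothetical stand-ins (not the Yammer `Timer` API or Kafka classes), kept dependency-free so the sketch runs on its own.

```scala
import java.util.concurrent.LinkedBlockingQueue

// What a rate + latency timer tracks: count (for rate), mean and max wait.
final case class TimerSnapshot(count: Long, meanMs: Double, maxMs: Long)

// Hypothetical, minimal stand-in for a metrics timer.
final class SimpleTimer {
  private var count = 0L
  private var totalMs = 0L
  private var maxMs = 0L

  def update(durationMs: Long): Unit = synchronized {
    count += 1
    totalMs += durationMs
    maxMs = math.max(maxMs, durationMs)
  }

  def snapshot: TimerSnapshot = synchronized {
    TimerSnapshot(count, if (count == 0) 0.0 else totalMs.toDouble / count, maxMs)
  }
}

// Same idea as QueueItem in the diff: the request plus the time it was enqueued.
final case class QueuedRequest(request: String, enqueueTimeMs: Long)

// One iteration of a hypothetical send loop: dequeue, record queue time, send.
final class SendLoop(queue: LinkedBlockingQueue[QueuedRequest],
                     timer: SimpleTimer,
                     clock: () => Long) {
  def doWorkOnce(send: String => Unit): Unit = {
    val item = queue.take()
    timer.update(clock() - item.enqueueTimeMs)
    send(item.request)
  }
}

object SendLoopDemo {
  def main(args: Array[String]): Unit = {
    var now = 100L
    val queue = new LinkedBlockingQueue[QueuedRequest]()
    val timer = new SimpleTimer
    val loop = new SendLoop(queue, timer, () => now)
    queue.put(QueuedRequest("LeaderAndIsr", now))   // enqueued at 100 ms
    now = 105L
    queue.put(QueuedRequest("UpdateMetadata", now)) // enqueued at 105 ms
    now = 140L
    loop.doWorkOnce(r => println(s"sending $r"))    // waited 40 ms
    loop.doWorkOnce(r => println(s"sending $r"))    // waited 35 ms
    println(timer.snapshot)                         // TimerSnapshot(2,37.5,40)
  }
}
```

Stamping the time at enqueue and measuring at dequeue isolates the delay the issue asks about: how long a request sat in the controller-to-broker queue after the state transition produced it, separate from network and broker processing time.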

[jira] [Commented] (KAFKA-3473) KIP-237: More Controller Health Metrics

2018-02-01 Thread Damian Guy (JIRA)

[ https://issues.apache.org/jira/browse/KAFKA-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16348441#comment-16348441 ]

Damian Guy commented on KAFKA-3473:
---

[~lindong] [~ijuma] what is the status of the PR? Should this be in 1.1?

> KIP-237: More Controller Health Metrics
> ---
>
> Key: KAFKA-3473
> URL: https://issues.apache.org/jira/browse/KAFKA-3473
> Project: Kafka
>  Issue Type: Improvement
>  Components: controller
>Affects Versions: 1.0.1
>Reporter: Jiangjie Qin
>Assignee: Dong Lin
>Priority: Major
> Fix For: 1.1.0
>
>
> Currently, the controller appends the requests to brokers into the controller 
> channel manager queue during a state transition, i.e. state transitions are 
> propagated asynchronously. We need to track the request queue time on the 
> controller side to see how long state propagation is delayed after the 
> state transition has finished on the controller.
> We also want metrics to monitor the ControllerEventManager queue size 
> and the average time an event waits in this queue before being 
> processed.
> See 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-237%3A+More+Controller+Health+Metrics
>  for more detail.





[jira] [Commented] (KAFKA-3473) KIP-237: More Controller Health Metrics

2018-01-04 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/KAFKA-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16311911#comment-16311911 ]

ASF GitHub Bot commented on KAFKA-3473:
---

lindong28 opened a new pull request #4392: KAFKA-3473; More Controller Health 
Metrics (KIP-237)
URL: https://github.com/apache/kafka/pull/4392
 
 
   This patch adds a few metrics that are useful for monitoring controller 
health. See KIP-237 for more detail.
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> KIP-237: More Controller Health Metrics
> ---
>
> Key: KAFKA-3473
> URL: https://issues.apache.org/jira/browse/KAFKA-3473
> Project: Kafka
>  Issue Type: Improvement
>  Components: controller
>Affects Versions: 1.0.1
>Reporter: Jiangjie Qin
>Assignee: Dong Lin
> Fix For: 1.1.0
>
>
> Currently, the controller appends the requests to brokers into the controller 
> channel manager queue during a state transition, i.e. state transitions are 
> propagated asynchronously. We need to track the request queue time on the 
> controller side to see how long state propagation is delayed after the 
> state transition has finished on the controller.
> We also want metrics to monitor the ControllerEventManager queue size 
> and the average time an event waits in this queue before being 
> processed.
> See 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-237%3A+More+Controller+Health+Metrics
>  for more detail.


