[jira] [Commented] (KAFKA-10726) How to detect heartbeat failure between broker/zookeeper leader

2020-11-16 Thread Keiichiro Wakasa (Jira)


[ https://issues.apache.org/jira/browse/KAFKA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17233304#comment-17233304 ]

Keiichiro Wakasa commented on KAFKA-10726:
--

[~Jack-Lee] 
Hello Jack, thank you so much for your comment and so sorry for the confusion.

The heartbeat timeout issue itself has already been solved (it was actually caused by 
heavy logrotate activity on the ZK nodes).

*So we are just looking for a way to detect the timeout issue if it occurs again 
in the future.*

> How to detect heartbeat failure between broker/zookeeper leader
> ---
>
> Key: KAFKA-10726
> URL: https://issues.apache.org/jira/browse/KAFKA-10726
> Project: Kafka
>  Issue Type: Bug
>  Components: controller, logging
>Affects Versions: 2.1.1
>Reporter: Keiichiro Wakasa
>Priority: Critical
>
> Hello experts,
> I'm not sure this is the proper place to ask, but I'd appreciate it if you could help 
> us with the following question...
>  
> We have repeatedly suffered from brokers being excluded from the cluster because of 
> heartbeat timeouts between the broker and the ZooKeeper leader.
> This can easily be detected by checking the ephemeral nodes via zkcli.sh, 
> but we'd like to detect it from logs such as server.log/controller.log, since 
> we already have a system that forwards these logs to our own system. 
> Looking at server.log/controller.log, we couldn't find any log entries that 
> indicate the heartbeat timeout. Are there any other logs we can check for 
> heartbeat health?
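
One way to turn the ephemeral-node check mentioned above into a log line that an existing forwarding system can pick up is to compare the broker IDs registered under `/brokers/ids` with the IDs you expect. This is a minimal sketch that assumes the input is the output of `bin/zookeeper-shell.sh <zk-host>:2181 ls /brokers/ids`, with the children printed as a bracketed list such as `[0, 1, 2]`. The connection messages around that list vary between versions. The function names and the expected-ID set are hypothetical.

```python
from __future__ import annotations

import re

# Matches a line consisting only of a bracketed child list, e.g. "[0, 1, 2]" or "[]".
# Assumption: this is how the ZooKeeper shell prints the children of /brokers/ids.
_CHILD_LIST = re.compile(r"^\[([0-9, ]*)\]\s*$", re.MULTILINE)


def parse_broker_ids(ls_output: str) -> set[int]:
    """Return the broker IDs found in `ls /brokers/ids` output (hypothetical helper)."""
    matches = _CHILD_LIST.findall(ls_output)
    if not matches:
        raise ValueError("no child list found; check the ZooKeeper shell output")
    # Use the last bracketed list, ignoring connection/WATCHER noise printed before it.
    return {int(part) for part in matches[-1].split(",") if part.strip()}


def missing_brokers(expected_ids: set[int], ls_output: str) -> set[int]:
    """Broker IDs that should be registered but currently have no ephemeral znode."""
    return expected_ids - parse_broker_ids(ls_output)
```

For example, you could run the shell command periodically, pass its output to `missing_brokers({0, 1, 2}, output)`, and write a warning line to a file your log forwarder already tails whenever the result is non-empty.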



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10726) How to detect heartbeat failure between broker/zookeeper leader

2020-11-16 Thread lqjacklee (Jira)


[ https://issues.apache.org/jira/browse/KAFKA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17233300#comment-17233300 ]

lqjacklee commented on KAFKA-10726:
---

If you are seeing excessive pauses during garbage collection, you can consider 
upgrading your JDK version or garbage collector (or extending the timeout value 
for zookeeper.session.timeout.ms). Additionally, you can tune your Java runtime 
to minimize garbage collection pauses. The engineers at LinkedIn have written about 
optimizing JVM garbage collection in depth. Of course, you can also check the 
Kafka documentation for some recommendations.
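
On extending `zookeeper.session.timeout.ms`: the ZooKeeper server clamps the session timeout a client requests into the range [`minSessionTimeout`, `maxSessionTimeout`]. When those server settings are not configured, they default to 2 and 20 times `tickTime`. The minimal sketch below shows that rule and why raising the broker setting alone may have no effect beyond `20 * tickTime`. The function name is hypothetical.

```python
from __future__ import annotations


def negotiated_session_timeout_ms(
    requested_ms: int,
    tick_time_ms: int,
    min_session_timeout_ms: int | None = None,
    max_session_timeout_ms: int | None = None,
) -> int:
    """Clamp a requested session timeout the way the ZooKeeper server does.

    If minSessionTimeout / maxSessionTimeout are not set on the server,
    they default to 2 * tickTime and 20 * tickTime respectively.
    """
    lower = min_session_timeout_ms if min_session_timeout_ms is not None else 2 * tick_time_ms
    upper = max_session_timeout_ms if max_session_timeout_ms is not None else 20 * tick_time_ms
    return max(lower, min(requested_ms, upper))
```

For example, with `tickTime=2000`, a broker requesting 60000 ms is negotiated down to 40000 ms unless `maxSessionTimeout` is also raised on the ZooKeeper servers.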


Some ZooKeeper metrics that provide more information can help you:



||Name||Description||Metric type||Availability||
|outstanding_requests|Number of requests queued|Resource: Saturation|Four-letter words, AdminServer, JMX|
|avg_latency|Amount of time it takes to respond to a client request (in ms)|Work: Throughput|Four-letter words, AdminServer, JMX|
|num_alive_connections|Number of clients connected to ZooKeeper|Resource: Availability|Four-letter words, AdminServer, JMX|
|followers|Number of active followers|Resource: Availability|Four-letter words, AdminServer|
|pending_syncs|Number of pending syncs from followers|Other|Four-letter words, AdminServer, JMX|
|open_file_descriptor_count|Number of file descriptors in use|Resource: Utilization|Four-letter words, AdminServer|
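
The metrics in the table can be polled with the `mntr` four-letter word, which returns tab-separated, `zk_`-prefixed key/value lines. On ZooKeeper 3.5.3 and later, `mntr` must be allowed via `4lw.commands.whitelist`. Also, `zk_followers` and `zk_pending_syncs` are reported only by the leader. The sketch below shows one way to turn that output into log lines for an existing forwarding pipeline. The thresholds and function names are hypothetical placeholders, not recommended values.

```python
from __future__ import annotations

import socket

# Hypothetical alert thresholds; tune them for your own cluster.
THRESHOLDS = {"zk_outstanding_requests": 10, "zk_avg_latency": 100, "zk_pending_syncs": 0}


def fetch_mntr(host: str, port: int = 2181, timeout_s: float = 5.0) -> str:
    """Send the `mntr` four-letter word and read until the server closes the connection."""
    with socket.create_connection((host, port), timeout=timeout_s) as sock:
        sock.sendall(b"mntr")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mntr(text: str) -> dict[str, str]:
    """Parse tab-separated `key<TAB>value` lines into a dict of strings."""
    metrics: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition("\t")
        if sep:
            metrics[key.strip()] = value.strip()
    return metrics


def breaches(metrics: dict[str, str], thresholds: dict[str, float] = THRESHOLDS) -> list[str]:
    """Return one message per numeric metric that exceeds its threshold."""
    messages = []
    for key, limit in thresholds.items():
        if key not in metrics:
            continue  # e.g. zk_pending_syncs is absent on followers
        try:
            value = float(metrics[key])
        except ValueError:
            continue
        if value > limit:
            messages.append(f"{key}={metrics[key]} exceeds {limit}")
    return messages
```

For example, a cron job could write each entry of `breaches(parse_mntr(fetch_mntr("zk1")))` to a file that the existing log forwarder already collects.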



