Hi, I need to know which nodes of the cluster are alive, as a kind of health check, and we also create some topics with the Java AdminClient API.
For the health check we call adminClient.describeCluster() to list the alive nodes. At first this works, but after some time (always the default timeout in milliseconds) it fails with a "no node found" exception, even though the Kafka server is actually alive. It seems to be caused by an earlier call timing out, after which the Kafka client disconnects, yet that call had actually received its response and succeeded. Can someone help? Below are the detailed logs:

2017-07-19 11:49:50.397 main org.apache.kafka.clients.admin.KafkaAdminClient {} GenericKafkaAdminClient: queueing Call(callName=createTopics, deadlineMs=1500436310397) with a timeout 120000 ms from now.
2017-07-19 11:49:50.398 kafka-admin-client-thread org.apache.kafka.clients.admin.KafkaAdminClient {} GenericKafkaAdminClient: KafkaClient#poll retrieved 0 response(s)
2017-07-19 11:49:50.398 kafka-admin-client-thread org.apache.kafka.clients.admin.KafkaAdminClient {} GenericKafkaAdminClient: assigned Call(callName=createTopics, deadlineMs=1500436310397) to 192.168.0.3:9092 (id: 0 rack: null)
2017-07-19 11:49:50.398 kafka-admin-client-thread org.apache.kafka.clients.admin.KafkaAdminClient {} GenericKafkaAdminClient: sending (type=CreateTopicsRequest, topics={_INTERNAL_SUBSCRIPTION_568cfbce-dda4-4c36-822c-09c446c49b08=(numPartitions=1, replicationFactor=1, replicasAssignments={}, configs={})}, timeout=119999, validateOnly=false) to 192.168.0.3:9092 (id: 0 rack: null). correlationId=4
2017-07-19 11:49:50.398 kafka-admin-client-thread org.apache.kafka.clients.admin.KafkaAdminClient {} GenericKafkaAdminClient: entering KafkaClient#poll(timeout=111824)
2017-07-19 11:49:50.398 kafka-admin-client-thread org.apache.kafka.clients.admin.KafkaAdminClient {} GenericKafkaAdminClient: KafkaClient#poll retrieved 0 response(s)
2017-07-19 11:49:50.398 kafka-admin-client-thread org.apache.kafka.clients.admin.KafkaAdminClient {} GenericKafkaAdminClient: entering KafkaClient#poll(timeout=111824)
2017-07-19 11:49:50.412 kafka-admin-client-thread org.apache.kafka.clients.admin.KafkaAdminClient {} GenericKafkaAdminClient: KafkaClient#poll retrieved 1 response(s)
2017-07-19 11:49:50.413 kafka-admin-client-thread org.apache.kafka.clients.admin.KafkaAdminClient {} GenericKafkaAdminClient: Call(callName=createTopics, deadlineMs=1500436310397) got response org.apache.kafka.common.requests.CreateTopicsResponse@c6f92a
2017-07-19 11:49:50.413 kafka-admin-client-thread org.apache.kafka.clients.admin.KafkaAdminClient {} GenericKafkaAdminClient: entering KafkaClient#poll(timeout=111810)
2017-07-19 11:51:32.908 monitor-schedulers1 org.apache.kafka.clients.admin.KafkaAdminClient {} GenericKafkaAdminClient: queueing Call(callName=listNodes, deadlineMs=1500436412908) with a timeout 120000 ms from now.
2017-07-19 11:51:32.908 kafka-admin-client-thread org.apache.kafka.clients.admin.KafkaAdminClient {} GenericKafkaAdminClient: KafkaClient#poll retrieved 0 response(s)
2017-07-19 11:51:32.908 kafka-admin-client-thread org.apache.kafka.clients.admin.KafkaAdminClient {} GenericKafkaAdminClient: assigned Call(callName=listNodes, deadlineMs=1500436412908) to 192.168.0.3:9092 (id: 0 rack: null)
2017-07-19 11:51:32.908 kafka-admin-client-thread org.apache.kafka.clients.admin.KafkaAdminClient {} GenericKafkaAdminClient: sending (type=MetadataRequest, topics=) to 192.168.0.3:9092 (id: 0 rack: null). correlationId=12
2017-07-19 11:51:32.909 kafka-admin-client-thread org.apache.kafka.clients.admin.KafkaAdminClient {} GenericKafkaAdminClient: entering KafkaClient#poll(timeout=9314)
2017-07-19 11:51:32.909 kafka-admin-client-thread org.apache.kafka.clients.admin.KafkaAdminClient {} GenericKafkaAdminClient: KafkaClient#poll retrieved 0 response(s)
2017-07-19 11:51:32.909 kafka-admin-client-thread org.apache.kafka.clients.admin.KafkaAdminClient {} GenericKafkaAdminClient: entering KafkaClient#poll(timeout=9313)
2017-07-19 11:51:32.909 kafka-admin-client-thread org.apache.kafka.clients.admin.KafkaAdminClient {} GenericKafkaAdminClient: KafkaClient#poll retrieved 1 response(s)
2017-07-19 11:51:32.910 kafka-admin-client-thread org.apache.kafka.clients.admin.KafkaAdminClient {} GenericKafkaAdminClient: Call(callName=listNodes, deadlineMs=1500436412908) got response org.apache.kafka.common.requests.MetadataResponse@40de203e
2017-07-19 11:51:32.910 kafka-admin-client-thread org.apache.kafka.clients.admin.KafkaAdminClient {} GenericKafkaAdminClient: entering KafkaClient#poll(timeout=9313)
2017-07-19 11:51:32.910 monitor-schedulers1 com.monitor.health.HealthCheck {} Kafka status: UP {Alive Nodes=[192.168.0.3]}
2017-07-19 11:51:42.223 kafka-admin-client-thread org.apache.kafka.clients.admin.KafkaAdminClient {} GenericKafkaAdminClient: KafkaClient#poll retrieved 0 response(s)
2017-07-19 11:51:42.223 kafka-admin-client-thread GenericKafkaAdminClient DEBUG DispatcherConnector ECNSHWDP2059 org.apache.kafka.clients.admin.KafkaAdminClient {} GenericKafkaAdminClient: Closing connection to 0 to time out Call(callName=createTopics, deadlineMs=1500436302222)
2017-07-19 11:51:42.223 kafka-admin-client-thread GenericKafkaAdminClient DEBUG DispatcherConnector ECNSHWDP2059 org.apache.kafka.clients.admin.KafkaAdminClient {} GenericKafkaAdminClient: timed out 1 call(s) in flight.
2017-07-19 11:51:42.223 kafka-admin-client-thread org.apache.kafka.clients.admin.KafkaAdminClient {} GenericKafkaAdminClient: entering KafkaClient#poll(timeout=1200000)
2017-07-19 11:52:02.916 monitor-schedulers1 org.apache.kafka.clients.admin.KafkaAdminClient {} GenericKafkaAdminClient: queueing Call(callName=listNodes, deadlineMs=1500436442916) with a timeout 120000 ms from now.
2017-07-19 11:52:02.916 kafka-admin-client-thread org.apache.kafka.clients.admin.KafkaAdminClient {} GenericKafkaAdminClient: KafkaClient#poll retrieved 0 response(s)
2017-07-19 11:52:02.916 kafka-admin-client-thread org.apache.kafka.clients.admin.KafkaAdminClient {} GenericKafkaAdminClient: Closing connection to 0 to time out Call(callName=createTopics, deadlineMs=1500436302222)
2017-07-19 11:52:02.916 kafka-admin-client-thread org.apache.kafka.clients.admin.KafkaAdminClient {} GenericKafkaAdminClient: timed out 1 call(s) in flight.
2017-07-19 11:52:02.924 kafka-admin-client-thread org.apache.kafka.clients.admin.KafkaAdminClient {} Call(callName=listNodes, deadlineMs=1500436442916) failed with non-retriable exception after 1 attempt(s)
java.lang.Exception: BrokerNotAvailableException: Error choosing node for listNodes: no node found.
    at org.apache.kafka.clients.admin.KafkaAdminClient$Call.fail(KafkaAdminClient.java:484) [kafka-clients-0.11.0.0.jar:na]
    at org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.chooseNodeForNewCall(KafkaAdminClient.java:720) [kafka-clients-0.11.0.0.jar:na]
    at org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.chooseNodesForNewCalls(KafkaAdminClient.java:706) [kafka-clients-0.11.0.0.jar:na]
    at org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.run(KafkaAdminClient.java:920) [kafka-clients-0.11.0.0.jar:na]
    at java.lang.Thread.run(Unknown Source) [na:1.8.0_102]
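
To make the deadlineMs values in the logs above easier to compare with the log timestamps, here is a small sketch that converts them. It is not part of the application. It assumes that deadlineMs is epoch milliseconds, that the log timestamps are in UTC+8 (inferred from the first createTopics line, not confirmed), and that every call used the 120000 ms timeout shown in the logs. The class and method names are made up for illustration.

```java
import java.time.Instant;
import java.time.LocalTime;
import java.time.ZoneOffset;

public class DeadlineCheck {
    // Assumption: the log timestamps are written in UTC+8.
    static final ZoneOffset LOG_ZONE = ZoneOffset.ofHours(8);

    // Convert an epoch-millisecond deadline to the wall-clock time used in the logs.
    static LocalTime deadlineAsLogTime(long deadlineMs) {
        return Instant.ofEpochMilli(deadlineMs).atOffset(LOG_ZONE).toLocalTime();
    }

    // Estimate when a call was queued: its deadline minus the call timeout.
    static LocalTime queuedAt(long deadlineMs, long timeoutMs) {
        return deadlineAsLogTime(deadlineMs - timeoutMs);
    }

    public static void main(String[] args) {
        long timeoutMs = 120_000L; // "with a timeout 120000 ms from now" in the logs
        long[] deadlines = {
            1500436310397L, // createTopics call that got a response
            1500436412908L, // first listNodes call
            1500436302222L  // createTopics call reported as timed out
        };
        for (long d : deadlines) {
            System.out.println("deadlineMs=" + d
                    + " expires at " + deadlineAsLogTime(d)
                    + ", queued at about " + queuedAt(d, timeoutMs));
        }
    }
}
```

Under those assumptions, the createTopics call that is timed out (deadlineMs=1500436302222) expires at 11:51:42.222 and would have been queued at about 11:49:42.222. That is roughly 8 seconds before the createTopics call whose response appears at 11:49:50.413, so the timed-out call may be a different, earlier createTopics call that is not in this log excerpt.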