[ 
https://issues.apache.org/jira/browse/KAFKA-7974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin P. McCabe resolved KAFKA-7974.
------------------------------------
       Resolution: Fixed
    Fix Version/s: 2.2.0

> KafkaAdminClient loses worker thread/enters zombie state when initial DNS 
> lookup fails
> --------------------------------------------------------------------------------------
>
>                 Key: KAFKA-7974
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7974
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Nicholas Parker
>            Priority: Major
>             Fix For: 2.2.0
>
>
> Version: kafka-clients-2.1.0
> I have some code that creates a KafkaAdminClient instance and then 
> invokes listTopics(). I was seeing the following stacktrace in the logs, 
> after which the KafkaAdminClient instance became unresponsive:
> {code:java}
> ERROR [kafka-admin-client-thread | adminclient-1] 2019-02-18 01:00:45,597 
> KafkaThread.java:51 - Uncaught exception in thread 'kafka-admin-client-thread 
> | adminclient-1':
> java.lang.IllegalStateException: No entry found for connection 0
>     at 
> org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:330)
>     at 
> org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:134)
>     at 
> org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:921)
>     at org.apache.kafka.clients.NetworkClient.ready(NetworkClient.java:287)
>     at 
> org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.sendEligibleCalls(KafkaAdminClient.java:898)
>     at 
> org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.run(KafkaAdminClient.java:1113)
>     at java.lang.Thread.run(Thread.java:748){code}
> From looking at the code I was able to trace down a possible cause:
>  * NetworkClient.ready() invokes this.initiateConnect() as seen in the above 
> stacktrace
>  * NetworkClient.initiateConnect() invokes 
> ClusterConnectionStates.connecting(), which internally invokes 
> ClientUtils.resolve() to resolve the host when creating an entry for the 
> connection.
>  * If this host lookup fails, an UnknownHostException can be thrown back to 
> NetworkClient.initiateConnect() and the connection entry is not created in 
> ClusterConnectionStates. This exception doesn't get logged, so this is a guess 
> on my part.
>  * NetworkClient.initiateConnect() catches the exception and attempts to call 
> ClusterConnectionStates.disconnected(), which throws an IllegalStateException 
> because no entry had yet been created due to the lookup failure.
>  * This IllegalStateException ends up killing the worker thread and 
> KafkaAdminClient gets stuck, never returning from listTopics().



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
