[ https://issues.apache.org/jira/browse/KAFKA-2182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Parth Brahmbhatt reassigned KAFKA-2182:
---------------------------------------

    Assignee: Parth Brahmbhatt

> zkClient dies if there is any exception while reconnecting
> ----------------------------------------------------------
>
>                 Key: KAFKA-2182
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2182
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8.1
>            Reporter: Igor Maravić
>            Assignee: Parth Brahmbhatt
>            Priority: Critical
>
> We at Spotify have just been hit by a bug related to ZkClient. It brought
> down a whole Kafka cluster.
> Long story short, we restarted a top-of-rack (ToR) switch, and all of the
> brokers in the cluster lost connectivity with the ZooKeeper cluster, which
> was located in another rack. Because of that, all of the brokers' ZooKeeper
> sessions expired.
> Everything would have been fine if we had not also lost the hostname-to-IP
> address mapping for some reason. Since we did lose that mapping, we got an
> UnknownHostException when the brokers tried to reconnect. This exception was
> swallowed, and the brokers never reconnected to ZooKeeper, which in turn made
> our cluster of brokers useless.
> If ZkClient had a "handleSessionEstablishmentError" callback, the exception
> could be caught, and we could simply kill the KafkaServer process and restart
> it cleanly, since this kind of exception is fatal for the Kafka cluster.
> {code}
> 2015-05-05T12:49:01.709+00:00 127.0.0.1 apache-kafka[main-EventThread] INFO  zookeeper.ZooKeeper  - Initiating client connection, connectString=zookeeper1.spotify.net:2181,zookeeper2.spotify.net:2181,zookeeper3.spotify.net:2181/gabobroker-local sessionTimeout=6000 watcher=org.I0Itec.zkclient.ZkClient@7303d690
> 2015-05-05T12:49:01.711+00:00 127.0.0.1 apache-kafka[main-EventThread] ERROR zookeeper.ClientCnxn  - Error while calling watcher
> 2015-05-05T12:49:01.711+00:00 127.0.0.1 java.lang.RuntimeException: Exception while restarting zk client
> 2015-05-05T12:49:01.711+00:00 127.0.0.1 at org.I0Itec.zkclient.ZkClient.processStateChanged(ZkClient.java:462)
> 2015-05-05T12:49:01.711+00:00 127.0.0.1 at org.I0Itec.zkclient.ZkClient.process(ZkClient.java:368)
> 2015-05-05T12:49:01.711+00:00 127.0.0.1 at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
> 2015-05-05T12:49:01.711+00:00 127.0.0.1 at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2015-05-05T12:49:01.711+00:00 127.0.0.1 Caused by: org.I0Itec.zkclient.exception.ZkException: Unable to connect to zookeeper1.spotify.net:2181,zookeeper2.spotify.net:2181,zookeeper3.spotify.net:2181/gabobroker-local
> 2015-05-05T12:49:01.711+00:00 127.0.0.1 at org.I0Itec.zkclient.ZkConnection.connect(ZkConnection.java:66)
> 2015-05-05T12:49:01.711+00:00 127.0.0.1 at org.I0Itec.zkclient.ZkClient.reconnect(ZkClient.java:939)
> 2015-05-05T12:49:01.711+00:00 127.0.0.1 at org.I0Itec.zkclient.ZkClient.processStateChanged(ZkClient.java:458)
> 2015-05-05T12:49:01.711+00:00 127.0.0.1 ... 3 more
> 2015-05-05T12:49:01.712+00:00 127.0.0.1 Caused by: java.net.UnknownHostException: zookeeper1.spotify.net: Name or service not known
> 2015-05-05T12:49:01.712+00:00 127.0.0.1 at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
> 2015-05-05T12:49:01.712+00:00 127.0.0.1 at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:901)
> 2015-05-05T12:49:01.712+00:00 127.0.0.1 at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1293)
> 2015-05-05T12:49:01.712+00:00 127.0.0.1 at java.net.InetAddress.getAllByName0(InetAddress.java:1246)
> 2015-05-05T12:49:01.712+00:00 127.0.0.1 at java.net.InetAddress.getAllByName(InetAddress.java:1162)
> 2015-05-05T12:49:01.712+00:00 127.0.0.1 at java.net.InetAddress.getAllByName(InetAddress.java:1098)
> 2015-05-05T12:49:01.712+00:00 127.0.0.1 at org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:61)
> 2015-05-05T12:49:01.712+00:00 127.0.0.1 at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445)
> 2015-05-05T12:49:01.712+00:00 127.0.0.1 at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:380)
> 2015-05-05T12:49:01.713+00:00 127.0.0.1 at org.I0Itec.zkclient.ZkConnection.connect(ZkConnection.java:64)
> 2015-05-05T12:49:01.713+00:00 127.0.0.1 ... 5 more
> 2015-05-05T12:49:01.713+00:00 127.0.0.1 apache-kafka[ZkClient-EventThread-18-zookeeper1.spotify.net:2181,zookeeper2.spotify.net:2181,zookeeper3.spotify.net:2181/gabobroker-local] ERROR zkclient.ZkEventThread  - Error handling event ZkEvent[Children of /config/changes changed sent to kafka.server.TopicConfigManager$ConfigChangeListener$@17638f6]
> 2015-05-05T12:49:01.713+00:00 127.0.0.1 java.lang.NullPointerException
> 2015-05-05T12:49:01.713+00:00 127.0.0.1 at org.I0Itec.zkclient.ZkConnection.exists(ZkConnection.java:95)
> 2015-05-05T12:49:01.713+00:00 127.0.0.1 at org.I0Itec.zkclient.ZkClient$3.call(ZkClient.java:439)
> 2015-05-05T12:49:01.713+00:00 127.0.0.1 at org.I0Itec.zkclient.ZkClient$3.call(ZkClient.java:436)
> 2015-05-05T12:49:01.713+00:00 127.0.0.1 at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
> 2015-05-05T12:49:01.713+00:00 127.0.0.1 at org.I0Itec.zkclient.ZkClient.exists(ZkClient.java:436)
> 2015-05-05T12:49:01.713+00:00 127.0.0.1 at org.I0Itec.zkclient.ZkClient.exists(ZkClient.java:445)
> 2015-05-05T12:49:01.714+00:00 127.0.0.1 at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:566)
> 2015-05-05T12:49:01.714+00:00 127.0.0.1 at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
> 2015-05-05T12:49:01.714+00:00 127.0.0.1 apache-kafka[main-EventThread] INFO  zookeeper.ClientCnxn  - EventThread shut down
> 2015-05-05T12:49:01.714+00:00 127.0.0.1 apache-kafka[ZkClient-EventThread-18-zookeeper1.spotify.net:2181,zookeeper2.spotify.net:2181,zookeeper3.spotify.net:2181/gabobroker-local] ERROR zkclient.ZkEventThread  - Error handling event ZkEvent[Data of /controller changed sent to kafka.server.ZookeeperLeaderElector$LeaderChangeListener@18360394]
> 2015-05-05T12:49:01.714+00:00 127.0.0.1 java.lang.NullPointerException
> 2015-05-05T12:49:01.714+00:00 127.0.0.1 at org.I0Itec.zkclient.ZkConnection.exists(ZkConnection.java:95)
> 2015-05-05T12:49:01.714+00:00 127.0.0.1 at org.I0Itec.zkclient.ZkClient$3.call(ZkClient.java:439)
> 2015-05-05T12:49:01.714+00:00 127.0.0.1 at org.I0Itec.zkclient.ZkClient$3.call(ZkClient.java:436)
> 2015-05-05T12:49:01.714+00:00 127.0.0.1 at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
> 2015-05-05T12:49:01.714+00:00 127.0.0.1 at org.I0Itec.zkclient.ZkClient.exists(ZkClient.java:436)
> 2015-05-05T12:49:01.714+00:00 127.0.0.1 at org.I0Itec.zkclient.ZkClient$6.run(ZkClient.java:544)
> 2015-05-05T12:49:01.714+00:00 127.0.0.1 at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
> {code}
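In the trace above, the UnknownHostException surfaces inside ZkClient.processStateChanged on the ZooKeeper event thread, where it is only logged ("Error while calling watcher"), and every later ZooKeeper call then fails with a NullPointerException. The Java sketch below illustrates the behaviour the report asks for: when re-establishing the session fails, the cause is handed to a handleSessionEstablishmentError-style callback instead of being swallowed. Only the callback name comes from the report; SessionListener, Connector, ReconnectingClient and SessionErrorDemo are hypothetical names for illustration, not ZkClient's actual API, and the DNS failure is simulated rather than performed.

```java
import java.net.UnknownHostException;
import java.util.concurrent.atomic.AtomicReference;

class SessionErrorDemo {
    public static void main(String[] args) {
        AtomicReference<Throwable> fatal = new AtomicReference<>();
        SessionListener listener = new SessionListener() {
            @Override
            public void handleNewSession() {
                System.out.println("new session established");
            }

            @Override
            public void handleSessionEstablishmentError(Throwable error) {
                // A broker would log this and halt (for example with
                // Runtime.getRuntime().halt(1)) so that a supervisor restarts
                // it cleanly; this demo only records the error.
                fatal.set(error);
            }
        };

        // Simulate the DNS failure from the log without touching the network.
        Connector failing = () -> {
            throw new UnknownHostException("zookeeper1.spotify.net: Name or service not known");
        };
        new ReconnectingClient(failing, listener).onSessionExpired();
        System.out.println("fatal error reported: " + fatal.get());
    }
}

/** Hypothetical listener modelled on the callback proposed in the report. */
interface SessionListener {
    void handleNewSession();

    void handleSessionEstablishmentError(Throwable error);
}

/** Hypothetical stand-in for the code that opens a fresh ZooKeeper session. */
interface Connector {
    void connect() throws Exception;
}

/** Hypothetical client that reports reconnect failures instead of swallowing them. */
final class ReconnectingClient {
    private final Connector connector;
    private final SessionListener listener;

    ReconnectingClient(Connector connector, SessionListener listener) {
        this.connector = connector;
        this.listener = listener;
    }

    /** Called when the old session has expired and a new one must be opened. */
    void onSessionExpired() {
        try {
            connector.connect();
        } catch (Exception e) {
            // Hand the cause to the owner rather than logging it and carrying
            // on with a client that has no live connection.
            listener.handleSessionEstablishmentError(e);
            return;
        }
        listener.handleNewSession();
    }
}
```

Halting on this callback, rather than continuing, matches the reporter's argument: a failed name lookup leaves the broker without a usable session, and a clean restart is easier to reason about than a client that looks alive but fails every call, as the NullPointerException entries in the log show.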



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
