Raoufeh Hashemian created KAFKA-5780: ----------------------------------------
Summary: Long shutdown time when updated to 0.11.0 Key: KAFKA-5780 URL: https://issues.apache.org/jira/browse/KAFKA-5780 Project: Kafka Issue Type: Bug Components: core Affects Versions: 0.11.0.0 Environment: CentOS Linux release 7.3.1611 , Kernel 3.10 Reporter: Raoufeh Hashemian Attachments: broker_shutdown.png When we switched from Kafka 0.10.2 to Kafka 0.11.0 , We faced a problem with stopping the kafka service on a broker node. Our cluster consists of 6 broker nodes. We had an existing topic when switched to Kafka 0.11.0 . Since then, gracefully stoping the service on a Kafka broker node results in the following warning message being repeated every 100 ms in the broker log, and the shutdown takes approximately 45 minutes to complete. {code:java} @40000000599714da1e582e4c [2017-08-18 16:24:48,509] WARN Connection to node 1002 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient) @40000000599714da245483a4 [2017-08-18 16:24:48,609] WARN Connection to node 1002 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient) @40000000599714da2a51177c [2017-08-18 16:24:48,709] WARN Connection to node 1002 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient) {code} Below is the last log lines when the shutdown is complete : {code:java} @4000000059971afd31113dbc [2017-08-18 16:50:59,823] WARN Connection to node 1002 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient) @4000000059971afd361200bc [2017-08-18 16:50:59,907] INFO Shutdown complete. (kafka.log.LogManager) @4000000059971afd36afa04c [2017-08-18 16:50:59,917] INFO Terminate ZkClient event thread. (org.I0Itec.zkclient.ZkEventThread) @4000000059971afd36dd6edc [2017-08-18 16:50:59,920] INFO Session: 0x35d68c9e76702a4 closed (org.apache.zookeeper.ZooKeeper) @4000000059971afd36deca84 [2017-08-18 16:50:59,920] INFO EventThread shut down for session: 0x35d68c9e76702a4 (org.apache.zookeeper.ClientCnxn) @4000000059971afd36f6afb4 [2017-08-18 16:50:59,922] INFO [Kafka Server 1002], shut down completed (kafka.server.KafkaServer) {code} I should note that I stopped the producers before shutting down the broker. If I repeat the process after brining up the service, the shutdown takes less than a minute. However, if I start the producers even for a short time and repeat the process, it will again take around 45 minutes to do a graceful shutdown. Attached files shows the brokers CPU usage during the shutdown period (light blue curve is the node in which the broker service is shutting down). The size of the topic is 2.3 TB per broker. I was wondering if this is an expected new normal in Kafka 0.11.0 or It is a bug or a mis configuration? -- This message was sent by Atlassian JIRA (v6.4.14#64029)