Raoufeh Hashemian created KAFKA-5780:
----------------------------------------

             Summary: Long shutdown time when updated to 0.11.0
                 Key: KAFKA-5780
                 URL: https://issues.apache.org/jira/browse/KAFKA-5780
             Project: Kafka
          Issue Type: Bug
          Components: core
    Affects Versions: 0.11.0.0
         Environment: CentOS Linux release 7.3.1611 , Kernel 3.10
            Reporter: Raoufeh Hashemian
         Attachments: broker_shutdown.png

When we switched from Kafka 0.10.2 to Kafka 0.11.0, we ran into a problem 
stopping the Kafka service on a broker node.

Our cluster consists of 6 broker nodes. We had an existing topic when we 
switched to Kafka 0.11.0. Since then, gracefully stopping the service on a 
Kafka broker node results in the following warning message being repeated 
every 100 ms in the broker log, and the shutdown takes approximately 45 
minutes to complete.

{code:java}
@40000000599714da1e582e4c [2017-08-18 16:24:48,509] WARN Connection to node 1002 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
@40000000599714da245483a4 [2017-08-18 16:24:48,609] WARN Connection to node 1002 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
@40000000599714da2a51177c [2017-08-18 16:24:48,709] WARN Connection to node 1002 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
{code}
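
To confirm the roughly 100 ms retry interval on a longer excerpt of the broker log, a small script along the following lines could be used. This is only a sketch: the function name warning_gaps_ms and the sample string are illustrative, and it assumes every entry carries a bracketed timestamp in the same form as the lines above.

```python
import re
from datetime import datetime, timedelta

# Matches the bracketed timestamp of each entry, e.g. [2017-08-18 16:24:48,509]
TIMESTAMP = re.compile(r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\]")


def warning_gaps_ms(log_text):
    """Return the gaps, in whole milliseconds, between consecutive log entries."""
    stamps = [
        datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
        for match in TIMESTAMP.finditer(log_text)
    ]
    one_ms = timedelta(milliseconds=1)
    # Integer division of two timedeltas yields an int number of milliseconds.
    return [(later - earlier) // one_ms for earlier, later in zip(stamps, stamps[1:])]


# Sample copied from the warnings quoted in this report.
sample = """
@40000000599714da1e582e4c [2017-08-18 16:24:48,509] WARN Connection to node
1002 could not be established. Broker may not be available.
@40000000599714da245483a4 [2017-08-18 16:24:48,609] WARN Connection to node
1002 could not be established. Broker may not be available.
@40000000599714da2a51177c [2017-08-18 16:24:48,709] WARN Connection to node
1002 could not be established. Broker may not be available.
"""

print(warning_gaps_ms(sample))  # prints [100, 100]
```

The script only looks at timestamps, so it does not matter whether the entries are wrapped across several lines.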

Below are the last log lines, written when the shutdown completes:

{code:java}
@4000000059971afd31113dbc [2017-08-18 16:50:59,823] WARN Connection to node 1002 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
@4000000059971afd361200bc [2017-08-18 16:50:59,907] INFO Shutdown complete. (kafka.log.LogManager)
@4000000059971afd36afa04c [2017-08-18 16:50:59,917] INFO Terminate ZkClient event thread. (org.I0Itec.zkclient.ZkEventThread)
@4000000059971afd36dd6edc [2017-08-18 16:50:59,920] INFO Session: 0x35d68c9e76702a4 closed (org.apache.zookeeper.ZooKeeper)
@4000000059971afd36deca84 [2017-08-18 16:50:59,920] INFO EventThread shut down for session: 0x35d68c9e76702a4 (org.apache.zookeeper.ClientCnxn)
@4000000059971afd36f6afb4 [2017-08-18 16:50:59,922] INFO [Kafka Server 1002], shut down completed (kafka.server.KafkaServer)
{code}

I should note that I stopped the producers before shutting down the broker.
If I repeat the process after bringing the service back up, the shutdown takes 
less than a minute. However, if I start the producers even for a short time 
and repeat the process, a graceful shutdown again takes around 45 minutes.

The attached file shows the brokers' CPU usage during the shutdown period (the 
light blue curve is the node on which the broker service is shutting down).
The size of the topic is 2.3 TB per broker.

Is this the expected new normal in Kafka 0.11.0, or is it a bug or a 
misconfiguration?



