[jira] [Commented] (KAFKA-6128) Shutdown script does not do a clean shutdown

[ https://issues.apache.org/jira/browse/KAFKA-6128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16220385#comment-16220385 ]

Alastair Munro commented on KAFKA-6128:
---

We seem to have a broken ZooKeeper. If I test on another setup, everything works.

In summary, Kafka connects to a load-balanced ZooKeeper cluster at zookeeper:2181, and zookeeper.connect points at that address. When a Kafka node is stopped, its ID under /brokers/ids/ is not removed from some of the ZooKeeper nodes. On restart, the broker connects to one of the ZooKeeper nodes where /brokers/ids/ has not been updated and reports that it was not shut down properly.

> Shutdown script does not do a clean shutdown
>
> Key: KAFKA-6128
> URL: https://issues.apache.org/jira/browse/KAFKA-6128
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 0.11.0.1
> Reporter: Alastair Munro
> Priority: Minor
>
> Shutdown script (sending term signal) does not do a clean shutdown.
> We are running kafka in kubernetes/openshift 0.11.0.0. The statefulset kafka
> runs the shutdown script prior to stopping the pod kafka is running on:
> {code}
> lifecycle:
>   preStop:
>     exec:
>       command:
>       - ./bin/kafka-server-stop.sh
> {code}
> This worked perfectly in 0.11.0.0 but doesn't in 0.11.0.1. Also we see the
> same behaviour if we send a TERM signal to the kafka process (same as the
> shutdown script).

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (KAFKA-6128) Shutdown script does not do a clean shutdown

[ https://issues.apache.org/jira/browse/KAFKA-6128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16220293#comment-16220293 ]

Alastair Munro commented on KAFKA-6128:
---

It also does it with 0.11.0.0. It would seem the broker IDs are not being replicated in ZooKeeper.

Kafka connects to zookeeper:2181, which is a load balancer. When the broker terminates, it removes its broker ID on the ZooKeeper node it was connected to, but the change is not replicated to the other two nodes. Because ZooKeeper may scale up and down, we use zookeeper:2181 rather than a list of hosts.

When I test replication in ZooKeeper directly (e.g. creating /test using zkCli.sh), it works fine. So why don't the broker changes replicate? I scaled this down to have only brokers 0 and 1, but some of the zoo nodes don't see the change:

{code}
oc rsh zoo-0 bin/zkCli.sh ls /brokers/ids
[0, 1]

oc rsh zoo-1 bin/zkCli.sh ls /brokers/ids
[0, 1, 2]
{code}

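
The per-node comparison above can be automated once the `ls /brokers/ids` result has been collected from each ZooKeeper pod. The following is only a rough sketch, not part of the original report: the helper names `parse_ids` and `stale_ids` are hypothetical, and it assumes each listing has already been captured as the bracketed text that zkCli.sh prints (for example "[0, 1, 2]"). It reports, for each node, the broker IDs that at least one other node no longer lists.

```python
import ast
from typing import Dict, Set


def parse_ids(listing: str) -> Set[int]:
    """Parse a captured `ls /brokers/ids` result such as "[0, 1, 2]"."""
    # literal_eval safely turns the bracketed text into a Python list.
    return set(ast.literal_eval(listing.strip()))


def stale_ids(views: Dict[str, str]) -> Dict[str, Set[int]]:
    """Map each node to the IDs it lists that some other node does not.

    Assumes `views` holds at least one node; nodes that agree with all
    the others are left out of the result.
    """
    parsed = {node: parse_ids(text) for node, text in views.items()}
    common = set.intersection(*parsed.values())
    return {node: ids - common for node, ids in parsed.items() if ids - common}


if __name__ == "__main__":
    # Listings taken from the zoo-0 and zoo-1 output quoted above.
    views = {"zoo-0": "[0, 1]", "zoo-1": "[0, 1, 2]"}
    print(stale_ids(views))  # {'zoo-1': {2}}
```

An empty result means every node returned the same set of broker IDs. A non-empty one only shows that the views differ; it does not say which node is out of date.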
[jira] [Commented] (KAFKA-6128) Shutdown script does not do a clean shutdown

[ https://issues.apache.org/jira/browse/KAFKA-6128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16220248#comment-16220248 ]

Alastair Munro commented on KAFKA-6128:
---

Kafka says it was not shut down cleanly and shuts down again.
