On Tuesday 10 May 2016 09:29 PM, Radoslaw Gruchalski wrote:
Kafka is expecting the state to be there when the zookeeper comes back. One way 
to protect yourself from what you see happening, is to have a zookeeper quorum. 
Run a cluster of 3 zookeepers, then repeat your exercise.

Kafka will continue to work absolutely fine. Just remember, with 3 ZK 
instances, you can only kill one at a time.

I haven't run this kind of ZooKeeper deployment before, so just curious: did you really mean that only one ZooKeeper instance can be stopped at a time when 3 of them form the cluster? Or would it still work if at least one instance was up and the other 2 crashed/stopped, either at the same time or one at a time?


-Jaikiran
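
For reference on the question above: a ZooKeeper ensemble only serves requests while a strict majority of its servers is up, which is why 3 instances tolerate the loss of one, not two. A minimal sketch of that arithmetic follows; the function names are made up for illustration and are not part of any ZooKeeper API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of an ensemble with the given number of servers."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be down while a majority is still available."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(f"{n} servers: quorum={quorum_size(n)}, can lose {tolerated_failures(n)}")
```

With 3 servers the quorum is 2, so a second simultaneous failure leaves the ensemble unavailable until one of the servers comes back.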


–
Best regards,

Radek Gruchalski

ra...@gruchalski.com
de.linkedin.com/in/radgruchalski

Confidentiality:
This communication is intended for the above-named person and may be 
confidential and/or legally privileged.
If it has come to you in error you must take no action based on it, nor must 
you copy or show it to anyone; please delete/destroy and inform the sender 
immediately.

On May 10, 2016 at 5:56:58 PM, Paolo Patierno (ppatie...@live.com) wrote:

Yes correct ... the new restarted zookeeper instance is completely new ... it 
has no information about previous topics and brokers of course.

Paolo Patierno
Senior Software Engineer (IoT) @ Red Hat
Microsoft MVP on Windows Embedded & IoT
Microsoft Azure Advisor
Twitter : @ppatierno
Linkedin : paolopatierno
Blog : DevExperience

Date: Tue, 10 May 2016 17:55:10 +0200
From: ra...@gruchalski.com
To: users@kafka.apache.org
Subject: RE: Zookeeper dies ... Kafka server unable to connect
Ah, but your restarted container does not have any of the data Kafka recorded previously. Correct?
On May 10, 2016 at 5:54:09 PM, Paolo Patierno (ppatie...@live.com) wrote:

This is what Kubernetes tells me:

Name: zookeeper
Namespace: default
Labels: <none>
Selector: name=zookeeper
Type: ClusterIP
IP: 10.0.0.184
Port: zookeeper 2181/TCP
Endpoints: 172.17.0.4:2181
Session Affinity: None

So the address is always 10.0.0.184. From the log I understand that the crash is related to the zookeeper pod I closed, so the Kafka server lost its connection to it.
From then on there should be attempts to connect to the new zookeeper, which is up and running with the same IP address as the previous one.
Date: Tue, 10 May 2016 17:49:59 +0200
From: ra...@gruchalski.com
To: users@kafka.apache.org
Subject: Re: Zookeeper dies ... Kafka server unable to connect
Are you sure you’re getting the same IP address?
Regarding zookeeper connection being closed, is kubernetes doing a soft 
shutdown of your container? If so, zookeeper is asked politely to stop.
On May 10, 2016 at 5:47:24 PM, Paolo Patierno (ppatie...@live.com) wrote:

Hi all,

experimenting with Kafka on Kubernetes, I get the following error when the Kafka server reconnects.

The cluster has one zookeeper and two Kafka servers. I turn off the zookeeper pod; Kubernetes restarts it and guarantees the same IP address for it, but the Kafka server starts retrying the connection and fails with the following trace:

[2016-05-10 15:40:55,046] WARN Session 0x1549b308dd20002 for server 10.0.0.184/10.0.0.184:2181, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
[2016-05-10 15:40:55,149] INFO zookeeper state changed (Disconnected) 
(org.I0Itec.zkclient.ZkClient)
[2016-05-10 15:40:57,093] INFO Opening socket connection to server 
10.0.0.184/10.0.0.184:2181. Will not attempt to authenticate using SASL 
(unknown error) (org.apache.zookeeper.ClientCnxn)
[2016-05-10 15:40:57,093] INFO Socket connection established to 
10.0.0.184/10.0.0.184:2181, initiating session (org.apache.zookeeper.ClientCnxn)
[2016-05-10 15:40:57,158] INFO Unable to read additional data from server 
sessionid 0x1549b308dd20002, likely server has closed socket, closing socket 
connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2016-05-10 15:40:58,936] INFO Opening socket connection to server 
10.0.0.184/10.0.0.184:2181. Will not attempt to authenticate using SASL 
(unknown error) (org.apache.zookeeper.ClientCnxn)
[2016-05-10 15:40:58,936] INFO Socket connection established to 
10.0.0.184/10.0.0.184:2181, initiating session (org.apache.zookeeper.ClientCnxn)
[2016-05-10 15:40:58,937] INFO Unable to read additional data from server 
sessionid 0x1549b308dd20002, likely server has closed socket, closing socket 
connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2016-05-10 15:41:00,845] INFO Opening socket connection to server 
10.0.0.184/10.0.0.184:2181. Will not attempt to authenticate using SASL 
(unknown error) (org.apache.zookeeper.ClientCnxn)
[2016-05-10 15:41:00,845] INFO Socket connection established to 
10.0.0.184/10.0.0.184:2181, initiating session (org.apache.zookeeper.ClientCnxn)
[2016-05-10 15:41:00,846] INFO Unable to read additional data from server 
sessionid 0x1549b308dd20002, likely server has closed socket, closing socket 
connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2016-05-10 15:41:02,071] INFO Opening socket connection to server 
10.0.0.184/10.0.0.184:2181. Will not attempt to authenticate using SASL 
(unknown error) (org.apache.zookeeper.ClientCnxn)
[2016-05-10 15:41:02,071] INFO Socket connection established to 
10.0.0.184/10.0.0.184:2181, initiating session (org.apache.zookeeper.ClientCnxn)
[2016-05-10 15:41:02,072] INFO Unable to read additional data from server 
sessionid 0x1549b308dd20002, likely server has closed socket, closing socket 
connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2016-05-10 15:41:03,336] INFO Opening socket connection to server 
10.0.0.184/10.0.0.184:2181. Will not attempt to authenticate using SASL 
(unknown error) (org.apache.zookeeper.ClientCnxn)
[2016-05-10 15:41:03,336] INFO Socket connection established to 
10.0.0.184/10.0.0.184:2181, initiating session (org.apache.zookeeper.ClientCnxn)
[2016-05-10 15:41:03,337] INFO Unable to read additional data from server 
sessionid 0x1549b308dd20002, likely server has closed socket, closing socket 
connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2016-05-10 15:41:05,121] INFO Opening socket connection to server 
10.0.0.184/10.0.0.184:2181. Will not attempt to authenticate using SASL 
(unknown error) (org.apache.zookeeper.ClientCnxn)
[2016-05-10 15:41:05,121] INFO Socket connection established to 
10.0.0.184/10.0.0.184:2181, initiating session (org.apache.zookeeper.ClientCnxn)
[2016-05-10 15:41:05,122] INFO Unable to read additional data from server 
sessionid 0x1549b308dd20002, likely server has closed socket, closing socket 
connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)

You can see when the first zookeeper dies and the connection is lost, and all the retries by the Kafka server to connect to the new one (same IP, same port).

Why does the zookeeper server close the connection? (I can see FIN ACK frames in Wireshark.)

Thanks,
Paolo.
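
One detail worth checking in a trace like this is whether every retry carries the same session id, since that would suggest the client keeps trying to resume its old session against the freshly started server. The sketch below counts reconnect attempts and session ids in a pasted log. The helper name `summarize_reconnects` and the embedded sample (two retries copied from the trace above) are illustrative assumptions, not part of Kafka or ZooKeeper tooling.

```python
import re
from collections import Counter

# Matches the session id reported on the "Unable to read additional data" lines.
SESSION_RE = re.compile(r"sessionid (0x[0-9a-f]+)")
# Text the ZooKeeper client logs at the start of each connection attempt.
ATTEMPT_MARKER = "Opening socket connection to server"


def summarize_reconnects(log_text: str):
    """Return (number of connection attempts, Counter of session ids seen)."""
    # Collapse whitespace first so that mail-wrapped log lines are rejoined.
    text = " ".join(log_text.split())
    attempts = text.count(ATTEMPT_MARKER)
    sessions = Counter(SESSION_RE.findall(text))
    return attempts, sessions


SAMPLE = """
[2016-05-10 15:40:57,093] INFO Opening socket connection to server
10.0.0.184/10.0.0.184:2181. Will not attempt to authenticate using SASL
(unknown error) (org.apache.zookeeper.ClientCnxn)
[2016-05-10 15:40:57,158] INFO Unable to read additional data from server
sessionid 0x1549b308dd20002, likely server has closed socket, closing socket
connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2016-05-10 15:40:58,936] INFO Opening socket connection to server
10.0.0.184/10.0.0.184:2181. Will not attempt to authenticate using SASL
(unknown error) (org.apache.zookeeper.ClientCnxn)
[2016-05-10 15:40:58,937] INFO Unable to read additional data from server
sessionid 0x1549b308dd20002, likely server has closed socket, closing socket
connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
"""

if __name__ == "__main__":
    attempts, sessions = summarize_reconnects(SAMPLE)
    print(f"attempts: {attempts}")
    for session_id, count in sessions.items():
        print(f"session {session_id}: closed by server {count} time(s)")
```

Running it over the full trace instead of the two-retry sample only requires passing the complete log text to `summarize_reconnects`.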