Re: zookeeper session issue with 3.5.x version

2020-11-09 Thread vikramark s
Hi Mate,

Thanks for replying. I was able to fix my issue after reading the comments
on the defect below:

https://issues.apache.org/jira/browse/ZOOKEEPER-2164?focusedCommentId=17032546&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17032546


In the server list I was using 0.0.0.0 as the address of the local server.
It was actually described that way in the zookeeper docker image. After
changing it to the host's FQDN, my issue was resolved.
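
For anyone checking their own setup, here is a rough Python sketch that flags
server.N entries still pointing at 0.0.0.0. The hostnames (zoo1.example.com
and so on) and the helper name find_wildcard_servers are made up for
illustration, and the entries assume the 3.5-style
"server.N=host:port:port;clientPort" format. It does not handle IPv6
addresses.

```python
import re

# Matches lines such as "server.1=zoo1:2888:3888;2181".
SERVER_LINE = re.compile(r"^server\.(\d+)=([^:]+):(\d+):(\d+)")


def find_wildcard_servers(cfg_text):
    """Return the ids of server.N entries whose host is 0.0.0.0."""
    flagged = []
    for line in cfg_text.splitlines():
        match = SERVER_LINE.match(line.strip())
        if match and match.group(2) == "0.0.0.0":
            flagged.append(int(match.group(1)))
    return flagged


# Hypothetical config as seen on the first node, before the change.
before = """
server.1=0.0.0.0:2888:3888;2181
server.2=zoo2.example.com:2888:3888;2181
server.3=zoo3.example.com:2888:3888;2181
"""

# The same config after replacing the wildcard with the node's own FQDN.
after = before.replace("0.0.0.0", "zoo1.example.com")

print(find_wildcard_servers(before))
print(find_wildcard_servers(after))
```

The first call reports server 1 and the second reports nothing, which is the
state my config ended up in.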


Thanks,
Vik


Re: zookeeper session issue with 3.5.x version

2020-11-09 Thread Szalay-Bekő Máté
Hello Vik,

This issue reminds me of
https://issues.apache.org/jira/browse/ZOOKEEPER-3940
Can you double-check whether you see the same issue? I think ZOOKEEPER-3940 is
Docker related. Are you using a dockerized ZooKeeper?

If you have a different problem, then I recommend filing a Jira
ticket and attaching debug logs from all three ZooKeeper server processes.
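
When collecting that information, the per-node status text quoted below can
also be compared mechanically. This is only a sketch: it assumes the status is
available as plain "Key: value" lines like the ones in the quoted message, and
the helper names parse_status and zxid_mismatch are invented for this example.

```python
def parse_status(text):
    """Parse 'Key: value' status lines into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def zxid_mismatch(statuses):
    """Return True if the given nodes report different Zxid values."""
    zxids = {int(status["Zxid"], 16) for status in statuses.values()}
    return len(zxids) > 1


# Values taken from the quoted report, after zoo2 was shut down.
zoo1 = "Outstanding: 6\nZxid: 0x10001\nMode: follower\n"
zoo3 = "Outstanding: 0\nZxid: 0x2\nMode: leader\n"

statuses = {"zoo1": parse_status(zoo1), "zoo3": parse_status(zoo3)}
print(zxid_mismatch(statuses))
```

A mismatch by itself does not prove a bug, but including it with the logs
makes the report easier to follow.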

Kind regards,
Mate

On Sat, Nov 7, 2020 at 9:28 PM vikramark s 
wrote:

> Hi,
>
> I am relatively new to ZooKeeper and I am struggling to resolve an issue we
> are experiencing. We recently upgraded ZooKeeper from 3.4.x to 3.5.8 and
> are now seeing issues which we think are related to session sharing among
> nodes.
>
> I was able to recreate the issue with a sample ZooKeeper setup. I am not
> able to set up a new session after taking down the leader in a 3-node
> cluster. The same flow works with ZooKeeper 3.4.14 but not with 3.5.8. I am
> hoping there is some setting I am overlooking, as I can't find anyone
> complaining about this online.
>
> Below are the details:
>
> 3 node cluster. After starting all the zoo nodes:
>
> Zoo1:
>   Zookeeper version: 3.5.8-f439ca583e70862c3068a1f2a7d4d068eec33315, built on 05/04/2020 15:07 GMT
>   Latency min/avg/max: 0/0/0
>   Received: 3
>   Sent: 2
>   Connections: 1
>   Outstanding: 0
>   Zxid: 0x0
>   Mode: follower
>   Node count: 5
>
> Zoo2:
>   Zookeeper version: 3.5.8-f439ca583e70862c3068a1f2a7d4d068eec33315, built on 05/04/2020 15:07 GMT
>   Latency min/avg/max: 0/0/0
>   Received: 3
>   Sent: 2
>   Connections: 1
>   Outstanding: 0
>   Zxid: 0x1
>   Mode: leader
>   Node count: 5
>   Proposal sizes last/min/max: -1/-1/-1
>
> Zoo3:
>   Zookeeper version: 3.5.8-f439ca583e70862c3068a1f2a7d4d068eec33315, built on 05/04/2020 15:07 GMT
>   Latency min/avg/max: 0/0/0
>   Received: 2
>   Sent: 1
>   Connections: 1
>   Outstanding: 0
>   Zxid: 0x1
>   Mode: follower
>   Node count: 5
>
> After starting one session using zkCli.sh on Zoo1 node:
>
>
>
> Zoo1:
>   Zookeeper version: 3.5.8-f439ca583e70862c3068a1f2a7d4d068eec33315, built on 05/04/2020 15:07 GMT
>   Latency min/avg/max: 1/9/23
>   Received: 7
>   Sent: 6
>   Connections: 2
>   Outstanding: 0
>   Zxid: 0x10001
>   Mode: follower
>   Node count: 5
>
> Zoo2:
>   Zookeeper version: 3.5.8-f439ca583e70862c3068a1f2a7d4d068eec33315, built on 05/04/2020 15:07 GMT
>   Latency min/avg/max: 0/0/0
>   Received: 4
>   Sent: 3
>   Connections: 1
>   Outstanding: 0
>   Zxid: 0x10001
>   Mode: leader
>   Node count: 5
>   Proposal sizes last/min/max: 36/36/36
>
> Zoo3:
>   Zookeeper version: 3.5.8-f439ca583e70862c3068a1f2a7d4d068eec33315, built on 05/04/2020 15:07 GMT
>   Latency min/avg/max: 0/0/0
>   Received: 3
>   Sent: 2
>   Connections: 1
>   Outstanding: 0
>   Zxid: 0x10001
>   Mode: follower
>   Node count: 5
>
> *Note: We can see that the Zxid is now consistent across all nodes.*
>
> I then shut down the leader node, zoo2. I can see that zoo3 became the
> leader, but for some reason the Zxid is not the same on zoo1 and zoo3.
>
> I then closed the existing zkCli session and started a new zkCli.sh session
> on the same node (zoo1). The session was not established; the CLI client
> just kept retrying and created many outstanding requests on zoo1. The only
> way to recover now is to shut down all nodes and restart them.
> (Currently, if the leader node goes down, our Kafka cluster stops working.)
>
>
>
> Zoo1:
>   Zookeeper version: 3.5.8-f439ca583e70862c3068a1f2a7d4d068eec33315, built on 05/04/2020 15:07 GMT
>   Latency min/avg/max: 0/0/2
>   Received: 50
>   Sent: 43
>   Connections: 2
>   Outstanding: 6
>   Zxid: 0x10001
>   Mode: follower
>   Node count: 5
>
> Zoo2: down
>
> Zoo3:
>   Zookeeper version: 3.5.8-f439ca583e70862c3068a1f2a7d4d068eec33315, built on 05/04/2020 15:07 GMT
>   Latency min/avg/max: 0/0/0
>   Received: 1
>   Sent: 0
>   Connections: 1
>   Outstanding: 0
>   Zxid: 0x2
>   Mode: leader
>   Node count: 5
>   Proposal sizes last/min/max: -1/-1/-1
>
> *Question: Why is the client not able to establish the session on Zoo1?*
>
> A similar flow with ZooKeeper 3.4.14 works fine. Details below:
>
> Initial setup:
>
> Zoo1:
>   Zookeeper version: 3.4.14-4c25d480e66aadd371de8bd2fd8da255ac140bcf, built on 03/06/2019 16:18 GMT
>   Latency min/avg/max: 0/0/0
>   Received: 1
>   Sent: 0
>   Connections: 1
>   Outstanding: 0
>   Zxid: 0x0
>   Mode: follower
>   Node count: 4
>
> Zoo2:
>   Zookeeper version: 3.4.14-4c25d480e66aadd371de8bd2fd8da255ac140bcf, built on 03/06/2019 16:18 GMT
>   Latency min/avg/max: 0/0/0
>   Received: 1
>   Sent: 0
>   Connections: 1
>   Outstanding: 0
>   Zxid: 0x1
>   Mode: leader
>   Node count: 4
>   Proposal sizes last/min/max: -1/-1/-1
>
> Zoo3:
> Zookeeper version: 3.4.14-4c25d480e66aadd371de8bd2fd8da255ac140bcf, built
> on 03/06/2019 16:18 GMT
>
> Latency min/avg/max: 0/0/0
>