Re: Upgrading existing non-TLS cluster with no downtime
Thanks Mate for the responses. They bring a lot of clarity. I was able to get it working this time without downtime. I don't know what I did wrong the last time, though.

On Mon, 20 Jul 2020 at 12:57, Szalay-Bekő Máté wrote:

> Hi,
>
> I guess this is the part you are referring to:
> https://zookeeper.apache.org/doc/r3.5.8/zookeeperAdmin.html#Upgrading+existing+nonTLS+cluster
> (your link was pointing to the 3.3.2 admin guide, where this chapter was
> missing)
>
> > 1) When I set sslQuorum=true and portUnification=true on the first server,
> > does it go out of the quorum? And when these properties are set in the
> > second server, a new quorum of first and second server is formed and now
> > the third server is out of quorum. When the 3rd server follows suit, it is
> > added back to the quorum.
>
> The "sslQuorum=true and portUnification=true" setting is needed in step 4
> (although the numbering is bad in the markdown...). After step 3 you
> already have a 3-server quorum up with portUnification=true, meaning the
> cluster can handle both TLS/SSL and regular/non-secure connections. So when
> you restart server 1 with sslQuorum=true, it will be able to rejoin the
> quorum, as servers 2 and 3 are capable of handling SSL connections (even if
> they are not using SSL for connection initiation). So ideally, between
> restarting each server with sslQuorum=true, you should always have a full
> 3-node quorum.
>
> > 2) The guideline says to check after restarting every broker that the
> > quorum is healthy, is there any metric to track that?
>
> I send the "stat" command to all nodes to see if everyone is connected to
> the quorum. E.g.:
>
>     echo "stat" | nc localhost 2181
>
> I usually use 4-letter-word commands, but the REST admin API works as well,
> and that is actually the officially recommended way, as the 4-letter words
> are / will be deprecated at some point.
>
> For the admin server see:
> https://zookeeper.apache.org/doc/r3.5.8/zookeeperAdmin.html#sc_adminserver
>
> Kind regards,
> Mate
>
> On Tue, Jul 14, 2020 at 10:52 PM Sankalp Bhatia wrote:
>
> > +users
> >
> > On Tue, 14 Jul 2020 at 21:51, Sankalp Bhatia wrote:
> >
> > > Hi All,
> > >
> > > I am trying to follow the section "Upgrading existing non-TLS cluster
> > > with no downtime" in the zookeeper guide:
> > > https://zookeeper.apache.org/doc/r3.3.2/zookeeperAdmin.html
> > >
> > > I have an ensemble of 3 servers. I have a couple of questions:
> > >
> > > 1) When I set sslQuorum=true and portUnification=true on the first
> > > server, does it go out of the quorum? And when these properties are set
> > > in the second server, a new quorum of first and second server is formed
> > > and now the third server is out of quorum. When the 3rd server follows
> > > suit, it is added back to the quorum.
> > >
> > > If this is the case, what is the use of the port-unification feature
> > > here?
> > >
> > > 2) The guideline says to check after restarting every broker that the
> > > quorum is healthy, is there any metric to track that?
> > >
> > > Thanks,
> > > Sankalp
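
For readers following the thread, the restart order described in the quoted reply can be sketched roughly as below. This is only an illustration, not part of the ZooKeeper tooling: the names STAGES and rolling_plan are made up for this example, and the first stage assumes sslQuorum is left at false while portUnification=true is rolled out. Applying the properties, restarting each server and checking the quorum after each restart are left to the operator.

```python
# Hypothetical sketch of the two rolling restarts discussed in the thread.
# STAGES and rolling_plan are illustrative names, not ZooKeeper APIs.

STAGES = [
    # First rolling restart: accept both TLS and plain quorum connections.
    # Assumption: sslQuorum stays false during this stage.
    {"sslQuorum": "false", "portUnification": "true"},
    # Second rolling restart: each server starts initiating TLS connections.
    {"sslQuorum": "true", "portUnification": "true"},
]


def rolling_plan(servers, stages=STAGES):
    """Yield (stage number, server, properties) in restart order.

    Every server finishes a stage before any server starts the next one,
    so the ensemble never mixes a TLS-only node with a plain-only node.
    """
    for number, props in enumerate(stages, start=1):
        for server in servers:
            yield number, server, dict(props)


if __name__ == "__main__":
    for number, server, props in rolling_plan(["server.1", "server.2", "server.3"]):
        settings = ", ".join(f"{key}={value}" for key, value in props.items())
        print(f"stage {number}: restart {server} with {settings}")
```

The point of the ordering is the one made in the reply: because all three servers already run with portUnification=true before anyone sets sslQuorum=true, a restarted server can rejoin and the quorum stays at three nodes between restarts.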
Re: Strange zoo.cfg.dynamic.next generated via zookeeper Docker image
Hello,

Can you try to change your configs so that they do not use 0.0.0.0 in ZOO_SERVERS? Using 0.0.0.0 has not been a recommended config since 3.5. If the Java process cannot bind (due to some virtual network issue) to the host provided in its config, then you can use the quorumListenOnAllIPs parameter.

So you should have the very same configuration for all nodes in your cluster, like:

    ZOO_CFG_EXTRA="quorumListenOnAllIPs=true"
    ZOO_SERVERS=server.1=x.x.x.1:2888:3888:participant;2181 \
      server.2=x.x.x.2:2888:3888:participant;2181 \
      server.3=x.x.x.3:2888:3888:participant;2181

This should have the effect that every server binds to 0.0.0.0 locally, while all servers still have a consistent view of the server hostnames.

BTW, unfortunately there is no such thing as an "official zookeeper Docker image"; at least, it is not maintained by the Apache ZooKeeper community. (I don't know who is maintaining the image on Docker Hub, https://hub.docker.com/_/zookeeper. It would be nice to ask them to update their examples / documentation.)

Kind regards,
Mate

On Thu, Jul 16, 2020 at 9:27 AM Thilo-Alexander Ginkel wrote:

> Hello again,
>
> just figured out that my rolling restart problems may be caused by
> ZOOKEEPER-3829 (cf. https://github.com/apache/zookeeper/pull/1356),
> so I tried to set reconfigEnabled=true as a workaround, but that fails
> as Zookeeper attempts to bind to x.x.x.1 instead of 0.0.0.0 (config
> still lists 0.0.0.0 for the local node, respectively) during startup
> in that case, so that's apparently not feasible in a Docker
> environment:
>
> 2020-07-16 07:22:20,141 [myid:1] - ERROR
> [ListenerHandler-/x.x.x.1:3888:QuorumCnxManager$Listener$ListenerHandler@1093]
> - Exception while listening
> java.net.BindException: Cannot assign requested address (Bind failed)
>     at java.base/java.net.PlainSocketImpl.socketBind(Native Method)
>     at java.base/java.net.AbstractPlainSocketImpl.bind(Unknown Source)
>     at java.base/java.net.ServerSocket.bind(Unknown Source)
>     at java.base/java.net.ServerSocket.bind(Unknown Source)
>     at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener$ListenerHandler.createNewServerSocket(QuorumCnxManager.java:1134)
>     at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener$ListenerHandler.acceptConnections(QuorumCnxManager.java:1064)
>     at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener$ListenerHandler.run(QuorumCnxManager.java:1033)
>     at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
>     at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>     at java.base/java.lang.Thread.run(Unknown Source)
> 2020-07-16 07:22:21,143 [myid:1] - ERROR
> [ListenerHandler-/x.x.x.1:3888:QuorumCnxManager$Listener$ListenerHandler@1112]
> - Leaving listener thread for address 10.147.254.1:3888 after 3
> errors. Use zookeeper.electionPortBindRetry property to increase retry
> count.
>
> Are there any plans to release 3.6.2 including the above fix?
>
> Regards,
> Thilo
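
One way to keep the server list identical on every node, as suggested in the reply above, is to generate it from a single host list. The helper below is a minimal sketch under stated assumptions: zoo_servers is a made-up name, the ports are the ones shown in the thread, and the space-separated output is assumed to be the form the Docker image's ZOO_SERVERS variable expects.

```python
# Hypothetical helper; not part of ZooKeeper or of the Docker image.

def zoo_servers(hosts, quorum_port=2888, election_port=3888, client_port=2181):
    """Build one server list string that can be reused on every node.

    Rejects 0.0.0.0 because the reply above advises against it; binding on
    all interfaces is handled by quorumListenOnAllIPs=true instead.
    """
    if "0.0.0.0" in hosts:
        raise ValueError("use real addresses and quorumListenOnAllIPs=true instead of 0.0.0.0")
    entries = [
        f"server.{index}={host}:{quorum_port}:{election_port}:participant;{client_port}"
        for index, host in enumerate(hosts, start=1)
    ]
    return " ".join(entries)


if __name__ == "__main__":
    # The x.x.x.N placeholders are copied from the thread, not real addresses.
    print(zoo_servers(["x.x.x.1", "x.x.x.2", "x.x.x.3"]))
```

Because the same string is produced for every node, no node ends up with a different view of its peers than the others have.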
Re: Upgrading existing non-TLS cluster with no downtime
Hi,

I guess this is the part you are referring to:
https://zookeeper.apache.org/doc/r3.5.8/zookeeperAdmin.html#Upgrading+existing+nonTLS+cluster
(your link was pointing to the 3.3.2 admin guide, where this chapter was missing)

> 1) When I set sslQuorum=true and portUnification=true on the first server,
> does it go out of the quorum? And when these properties are set in the
> second server, a new quorum of first and second server is formed and now
> the third server is out of quorum. When the 3rd server follows suit, it is
> added back to the quorum.

The "sslQuorum=true and portUnification=true" setting is needed in step 4 (although the numbering is bad in the markdown...). After step 3 you already have a 3-server quorum up with portUnification=true, meaning the cluster can handle both TLS/SSL and regular/non-secure connections. So when you restart server 1 with sslQuorum=true, it will be able to rejoin the quorum, as servers 2 and 3 are capable of handling SSL connections (even if they are not using SSL for connection initiation). So ideally, between restarting each server with sslQuorum=true, you should always have a full 3-node quorum.

> 2) The guideline says to check after restarting every broker that the
> quorum is healthy, is there any metric to track that?

I send the "stat" command to all nodes to see if everyone is connected to the quorum. E.g.:

    echo "stat" | nc localhost 2181

I usually use 4-letter-word commands, but the REST admin API works as well, and that is actually the officially recommended way, as the 4-letter words are / will be deprecated at some point.

For the admin server see:
https://zookeeper.apache.org/doc/r3.5.8/zookeeperAdmin.html#sc_adminserver

Kind regards,
Mate

On Tue, Jul 14, 2020 at 10:52 PM Sankalp Bhatia wrote:

> +users
>
> On Tue, 14 Jul 2020 at 21:51, Sankalp Bhatia wrote:
>
> > Hi All,
> >
> > I am trying to follow the section "Upgrading existing non-TLS cluster
> > with no downtime" in the zookeeper guide:
> > https://zookeeper.apache.org/doc/r3.3.2/zookeeperAdmin.html
> >
> > I have an ensemble of 3 servers. I have a couple of questions:
> >
> > 1) When I set sslQuorum=true and portUnification=true on the first
> > server, does it go out of the quorum? And when these properties are set
> > in the second server, a new quorum of first and second server is formed
> > and now the third server is out of quorum. When the 3rd server follows
> > suit, it is added back to the quorum.
> >
> > If this is the case, what is the use of the port-unification feature
> > here?
> >
> > 2) The guideline says to check after restarting every broker that the
> > quorum is healthy, is there any metric to track that?
> >
> > Thanks,
> > Sankalp
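
The "stat" check from the reply above can also be scripted so that it runs against every node after each restart. What follows is a rough sketch, not an official tool: the function names are made up, the health rule (every node reports leader or follower, and exactly one is leader) assumes a small ensemble without observers, and depending on the ZooKeeper version the stat command may first have to be enabled through 4lw.commands.whitelist.

```python
import socket


def four_letter_word(host, port=2181, command="stat", timeout=5.0):
    """Send a four-letter-word command and return the server's text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mode(reply):
    """Return the value of the 'Mode:' line, or None if the reply has none."""
    for line in reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def quorum_is_healthy(modes):
    """Assumed rule: all nodes are leader/follower and exactly one is leader."""
    in_quorum = all(mode in ("leader", "follower") for mode in modes)
    return in_quorum and modes.count("leader") == 1


if __name__ == "__main__":
    # Host names are placeholders; replace them with the ensemble members.
    hosts = ["zk1.example.com", "zk2.example.com", "zk3.example.com"]
    modes = [parse_mode(four_letter_word(host)) for host in hosts]
    print(dict(zip(hosts, modes)), "healthy:", quorum_is_healthy(modes))
```

A server that has not rejoined the quorum does not print a Mode line, so parse_mode returns None for it and the check fails, which is the signal to wait before restarting the next server.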
Re: Zookeeper session expiration
Currently our production is running 3.5.5, and it will take time to move to 3.6.1.

When I dug further into this, it seemed to be related to the Netty protocol and its limitations. The system is stable when I fall back to NIO without SSL. As soon as I turn on Netty, we see sessions getting throttled, which in turn sometimes throttles the ping requests from clients too.

I believe we should get an option to configure Netty in such a way that ping commands are never throttled.

Thanks
Srikant Kalani

On Mon, 20 Jul 2020 at 7:02 PM, Szalay-Bekő Máté wrote:

> Hello,
>
> can you reproduce the problem with the latest 3.5 version? I mean 3.5.8.
> There were a few bugfixes recently that can help, e.g.:
> https://issues.apache.org/jira/browse/ZOOKEEPER-3756
> Also you can try to increase some timeout parameters, see
> https://zookeeper.apache.org/doc/r3.5.8/zookeeperAdmin.html#sc_configuration
> (like minSessionTimeout, maxSessionTimeout, syncLimit)
>
> Kind regards,
> Mate
>
> On Mon, Jul 13, 2020 at 5:19 PM Srikant Kalani wrote:
>
> > I am facing a similar issue in my application.
> >
> > Zookeeper Server Version 3.5.5
> >
> > I implemented SSL ( server to server ) in quorum communication.
> >
> > After that ZK client frequently receives session timeouts.
> >
> > When I turned off SSL then application is behaving normally and there
> > are no timeouts.
> >
> > Any thoughts ?
> >
> > Thanks
> > Srikant Kalani
> >
> > --
> > Sent from: http://zookeeper-user.578899.n2.nabble.com/
Re: Zookeeper session expiration
Hello,

can you reproduce the problem with the latest 3.5 version? I mean 3.5.8. There were a few bugfixes recently that can help, e.g.:
https://issues.apache.org/jira/browse/ZOOKEEPER-3756

Also you can try to increase some timeout parameters, see
https://zookeeper.apache.org/doc/r3.5.8/zookeeperAdmin.html#sc_configuration
(like minSessionTimeout, maxSessionTimeout, syncLimit)

Kind regards,
Mate

On Mon, Jul 13, 2020 at 5:19 PM Srikant Kalani wrote:

> I am facing a similar issue in my application.
>
> Zookeeper Server Version 3.5.5
>
> I implemented SSL ( server to server ) in quorum communication.
>
> After that ZK client frequently receives session timeouts.
>
> When I turned off SSL then application is behaving normally and there are
> no timeouts.
>
> Any thoughts ?
>
> Thanks
> Srikant Kalani
>
> --
> Sent from: http://zookeeper-user.578899.n2.nabble.com/
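
To see why raising minSessionTimeout and maxSessionTimeout can matter, it helps to recall that the server does not simply accept the timeout a client asks for: it clamps the request into the range between the two settings, which the admin guide gives as defaulting to 2 and 20 times tickTime. The sketch below only illustrates that arithmetic; the function name is made up, and the tickTime of 2000 ms is an assumed value taken from the sample configuration, not from this thread.

```python
# Illustrative arithmetic only; negotiated_session_timeout is a made-up name.

def negotiated_session_timeout(requested_ms, tick_time_ms=2000,
                               min_session_timeout_ms=None,
                               max_session_timeout_ms=None):
    """Clamp a client's requested session timeout into the server's range.

    When the min/max settings are not configured, the documented defaults
    of 2 * tickTime and 20 * tickTime are used.
    """
    lower = 2 * tick_time_ms if min_session_timeout_ms is None else min_session_timeout_ms
    upper = 20 * tick_time_ms if max_session_timeout_ms is None else max_session_timeout_ms
    return max(lower, min(requested_ms, upper))


if __name__ == "__main__":
    # With tickTime=2000 the default range is 4000..40000 ms.
    for requested in (1000, 30000, 60000):
        print(requested, "->", negotiated_session_timeout(requested))
```

So a client asking for 60 seconds against default settings would still get 40 seconds unless maxSessionTimeout is raised on the servers.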