Re: Strange zoo.cfg.dynamic.next generated via zookeeper Docker image
> It seems Zookeeper is rebinding the client port to the announced IP during the startup sequence.

This is strange... According to the documentation (https://zookeeper.apache.org/doc/r3.6.1/zookeeperReconfig.html):

  A client port of a server is the port on which the server accepts client connection requests. Starting with 3.5.0 the clientPort and clientPortAddress configuration parameters should no longer be used. Instead, this information is now part of the server keyword specification, which becomes as follows:

  server.<positive id> = <address1>:<port1>:<port2>[:role];[<client port address>:]<client port>

So I would expect this to work:

ZOO_CFG_EXTRA="quorumListenOnAllIPs=true"
ZOO_SERVERS=server.1=x.x.x.1:2888:3888:participant;0.0.0.0:2181 \
server.2=x.x.x.2:2888:3888:participant;0.0.0.0:2181 \
server.3=x.x.x.3:2888:3888:participant;0.0.0.0:2181

(although the client port address should default to 0.0.0.0 anyway).

But based on your logs, I think you are right. A reconfig of the clientAddress happens in the code here:
https://github.com/apache/zookeeper/blob/1c41e127537f66842515ccb21fb48f1670003454/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java#L2194

One thing you can try is to switch from NIO to Netty, as the Netty reconfig code contains some extra checks related to 0.0.0.0. You can do that, e.g., by providing the following environment variable to Docker:

JVMFLAGS="-Dzookeeper.serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory"

Also, AFAICS the clientAddress is reconfigured only while a dynamic reconfig is being processed. Are you actually using the dynamic reconfig feature? If not, then disabling it may fix the issue. Dynamic reconfig should be disabled by default in ZooKeeper, although I'm not sure about the Docker image config. Maybe try this:

ZOO_CFG_EXTRA="quorumListenOnAllIPs=true reconfigEnabled=false"

If these don't help, can you share debug logs from one of your containers?
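In the 3.5+ server spec, the client part after ';' has the form [<client port address>:]<client port>, and when the address is omitted it is expected to default to 0.0.0.0. The POSIX shell sketch below extracts that client bind address from a spec string. client_bind_addr is a hypothetical helper (not part of ZooKeeper or the Docker image), and the 0.0.0.0 default is an assumption taken from this discussion rather than from the ZooKeeper source:

```shell
# Hypothetical helper (not part of ZooKeeper or the Docker image):
# print the client bind address of a 3.5+ server spec, i.e. the
# "[<client port address>:]<client port>" part after the first ';'.
# Assumption: a port-only client part means the wildcard 0.0.0.0.
client_bind_addr() {
  spec=$1
  case $spec in
    *\;*) client=${spec#*\;} ;;   # keep everything after the first ';'
    *) echo "no client part in: $spec" >&2; return 1 ;;
  esac
  case $client in
    *:*) echo "${client%:*}" ;;   # "addr:port" -> "addr"
    *) echo "0.0.0.0" ;;          # "port" only -> assumed wildcard default
  esac
}

# Example: both specs below would bind the client port on 0.0.0.0.
# client_bind_addr 'server.1=x.x.x.1:2888:3888:participant;2181'
# client_bind_addr 'server.1=x.x.x.1:2888:3888:participant;0.0.0.0:2181'
```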
Kind regards,
Mate

On Fri, Jul 24, 2020 at 6:15 PM Thilo-Alexander Ginkel wrote:
Re: Strange zoo.cfg.dynamic.next generated via zookeeper Docker image
On Mon, Jul 20, 2020 at 2:29 PM Szalay-Bekő Máté wrote:
> Can you try to change your configs by not using 0.0.0.0 in the ZOO_SERVERS?
> Using 0.0.0.0 is not a recommended config since 3.5. If the java process
> can not bind (due to some virtual network issue) to the host provided in
> it's config, then you can use the quorumListenOnAllIPs parameter.
>
> So you should have the very same configuration for all nodes in your
> cluster, like:
>
> ZOO_CFG_EXTRA="quorumListenOnAllIPs=true"
> ZOO_SERVERS=server.1=x.x.x.1:2888:3888:participant;2181 \
> server.2=x.x.x.2:2888:3888:participant;2181 \
> server.3=x.x.x.3:2888:3888:participant;2181

That works, except that I cannot get the client port (2181) to listen on 0.0.0.0 so that it can be mapped to the outside. Any idea how to achieve that?

-- 8< --
2020-07-24 16:04:59,243 [myid:] - INFO [main:QuorumPeerConfig@456] - clientPortAddress is 0.0.0.0:2181
2020-07-24 16:04:59,415 [myid:2] - INFO [main:NIOServerCnxnFactory@674] - binding to port /0.0.0.0:2181
2020-07-24 16:04:59,483 [myid:2] - INFO [QuorumPeer[myid=2](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):QuorumPeer@1371] - LOOKING
2020-07-24 16:04:59,483 [myid:2] - INFO [QuorumPeer[myid=2](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):FastLeaderElection@944] - New election. My id = 2, proposed zxid=0xb
2020-07-24 16:04:59,531 [myid:2] - INFO [NIOServerCxnFactory.AcceptThread:/0.0.0.0:2181:NIOServerCnxnFactory$AcceptThread@209] - accept thread exitted run method
2020-07-24 16:04:59,531 [myid:2] - INFO [WorkerReceiver[myid=2]:NIOServerCnxnFactory@707] - binding to port /10.147.254.2:2181
2020-07-24 16:04:59,531 [myid:2] - ERROR [WorkerReceiver[myid=2]:NIOServerCnxnFactory@713] - Error reconfiguring client port to /10.147.254.2:2181
-- 8< --

It seems ZooKeeper is rebinding the client port to the announced IP during the startup sequence.

I also tried specifying the bind address in ZOO_SERVERS as well as through clientPortAddress=0.0.0.0, without any luck.
> BTW, unfortunately there is no such thing as "official zookeeper Docker
> image", at least it is not maintained by the Apache ZooKeeper community. (I
> don't know who is maintaining the image on dockerHub
> https://hub.docker.com/_/zookeeper - it would be nice to ask them to update
> their examples / documentation)

I'll open a PR once I get this sorted out. ;-)

Thanks,
Thilo
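The approach quoted above uses the very same ZOO_SERVERS value on every node (with quorumListenOnAllIPs=true), so it can be generated once from a single ordered list of the nodes' real addresses. A minimal sketch, assuming a POSIX shell and assuming the Docker image splits ZOO_SERVERS on whitespace (as the backslash-continued examples in this thread suggest); zoo_servers is a hypothetical helper, not part of ZooKeeper or the image:

```shell
# Hypothetical helper: build one ZOO_SERVERS value that is identical on
# every node, from the nodes' real addresses in myid order (the first
# argument becomes server.1). Ports 2888/3888/2181 and the participant
# role follow the examples in this thread.
zoo_servers() {
  id=1
  out=
  for addr in "$@"; do
    # Append "server.<id>=..." separated by a single space.
    out="${out:+$out }server.$id=$addr:2888:3888:participant;2181"
    id=$((id + 1))
  done
  printf '%s\n' "$out"
}

# Example (same on all nodes; only ZOO_MY_ID differs per node):
# ZOO_CFG_EXTRA="quorumListenOnAllIPs=true"
# ZOO_SERVERS="$(zoo_servers x.x.x.1 x.x.x.2 x.x.x.3)"
```

The point of generating the value rather than editing it per node is that no node ever advertises 0.0.0.0 for itself, so every server shares the same view of its peers' addresses.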
Re: Strange zoo.cfg.dynamic.next generated via zookeeper Docker image
Hello,

Can you try changing your configs so that they don't use 0.0.0.0 in ZOO_SERVERS? Using 0.0.0.0 has not been a recommended config since 3.5. If the Java process cannot bind to the host provided in its config (due to some virtual network issue), then you can use the quorumListenOnAllIPs parameter.

So you should have the very same configuration for all nodes in your cluster, like:

ZOO_CFG_EXTRA="quorumListenOnAllIPs=true"
ZOO_SERVERS=server.1=x.x.x.1:2888:3888:participant;2181 \
server.2=x.x.x.2:2888:3888:participant;2181 \
server.3=x.x.x.3:2888:3888:participant;2181

This should have the effect that every server binds to 0.0.0.0 locally, while all servers still have a consistent view of the server hostnames.

BTW, unfortunately there is no such thing as an "official ZooKeeper Docker image", at least not one maintained by the Apache ZooKeeper community. (I don't know who maintains the image on Docker Hub, https://hub.docker.com/_/zookeeper; it would be nice to ask them to update their examples / documentation.)

Kind regards,
Mate

On Thu, Jul 16, 2020 at 9:27 AM Thilo-Alexander Ginkel wrote:
Re: Strange zoo.cfg.dynamic.next generated via zookeeper Docker image
Hello again,

I just figured out that my rolling restart problems may be caused by ZOOKEEPER-3829 (cf. https://github.com/apache/zookeeper/pull/1356), so I tried setting reconfigEnabled=true as a workaround. That fails, however, because during startup ZooKeeper then attempts to bind to x.x.x.1 instead of 0.0.0.0 (the config still lists 0.0.0.0 for the respective local node), so that approach is apparently not feasible in a Docker environment:

2020-07-16 07:22:20,141 [myid:1] - ERROR [ListenerHandler-/x.x.x.1:3888:QuorumCnxManager$Listener$ListenerHandler@1093] - Exception while listening
java.net.BindException: Cannot assign requested address (Bind failed)
    at java.base/java.net.PlainSocketImpl.socketBind(Native Method)
    at java.base/java.net.AbstractPlainSocketImpl.bind(Unknown Source)
    at java.base/java.net.ServerSocket.bind(Unknown Source)
    at java.base/java.net.ServerSocket.bind(Unknown Source)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener$ListenerHandler.createNewServerSocket(QuorumCnxManager.java:1134)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener$ListenerHandler.acceptConnections(QuorumCnxManager.java:1064)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener$ListenerHandler.run(QuorumCnxManager.java:1033)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.base/java.lang.Thread.run(Unknown Source)
2020-07-16 07:22:21,143 [myid:1] - ERROR [ListenerHandler-/x.x.x.1:3888:QuorumCnxManager$Listener$ListenerHandler@1112] - Leaving listener thread for address 10.147.254.1:3888 after 3 errors. Use zookeeper.electionPortBindRetry property to increase retry count.

Are there any plans to release 3.6.2 including the above fix?

Regards,
Thilo
Strange zoo.cfg.dynamic.next generated via zookeeper Docker image
Hi there,

I am running a three-node ZooKeeper cluster based on the official zookeeper Docker image (currently at v3.6.1). I have been seeing sporadic problems during rolling restarts, where the ensemble often loses its integrity and all nodes have to be stopped and restarted to recover.

The containers are connected to the bridge network, so I need to replace each node's own IP with 0.0.0.0 in the server declaration, as in (for server #1):

ZOO_MY_ID=1
ZOO_SERVERS=server.1=0.0.0.0:2888:3888:participant;2181 \
server.2=x.x.x.2:2888:3888:participant;2181 \
server.3=x.x.x.3:2888:3888:participant;2181

Servers #2 and #3 have their own IPs replaced with 0.0.0.0 in the same way.

From this configuration, ZooKeeper seems to generate the following zoo.cfg.dynamic.next config file (identical on all three servers), which is somewhat surprising, as it contains 0.0.0.0 as the address of server #2:

-- 8< --
server.1=x.x.x.1:2888:3888:participant;0.0.0.0:2181
server.2=0.0.0.0:2888:3888:participant;0.0.0.0:2181
server.3=x.x.x.3:2888:3888:participant;0.0.0.0:2181
version=1f
-- 8< --

Is this how things are supposed to be? Naively, I would have assumed that servers #1 and #3 know the real IP of server #2, not 0.0.0.0... Is there a way to configure an advertised address? If not, what is the recommended setup for operating ZooKeeper within Docker (without having to resort to the host network)?

Thanks,
Thilo
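The generated zoo.cfg.dynamic.next above advertises server #2 at 0.0.0.0, which is not a usable address for the other peers. As a rough diagnostic, the POSIX shell sketch below lists the server entries in a dynamic config file whose quorum address is the wildcard. wildcard_quorum_servers is a hypothetical helper, and it only inspects the address right after '=' (the quorum part), not the client part after ';':

```shell
# Hypothetical diagnostic: print the "server.N" key of every entry in a
# zoo.cfg.dynamic* file whose quorum address (right after '=') is 0.0.0.0.
# The client part after ';' is ignored, since 0.0.0.0 is fine there.
wildcard_quorum_servers() {
  # "|| [ -n "$line" ]" also handles a last line without a trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      server.*=0.0.0.0:*) printf '%s\n' "${line%%=*}" ;;
    esac
  done < "$1"
}

# Example: wildcard_quorum_servers /path/to/zoo.cfg.dynamic.next
```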