Re: Strange zoo.cfg.dynamic.next generated via zookeeper Docker image

2020-07-28 Thread Szalay-Bekő Máté
> It seems Zookeeper is rebinding the client port to the announced IP
> during the startup sequence.

This is strange... According to the documentation (
https://zookeeper.apache.org/doc/r3.6.1/zookeeperReconfig.html):

A client port of a server is the port on which the server accepts client
connection requests. Starting with 3.5.0 the clientPort and
clientPortAddress configuration parameters should no longer be used.
Instead, this information is now part of the server keyword specification,
which becomes as follows:
server.<positive id> = <address1>:<port1>:<port2>[:role];[<client port address>:]<client port>

So I would expect this to work:
ZOO_CFG_EXTRA="quorumListenOnAllIPs=true"
ZOO_SERVERS=server.1=x.x.x.1:2888:3888:participant;0.0.0.0:2181 \
server.2=x.x.x.2:2888:3888:participant;0.0.0.0:2181 \
server.3=x.x.x.3:2888:3888:participant;0.0.0.0:2181

Although the <client port address> should default to 0.0.0.0 anyway.
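
To make the format above concrete, here is a minimal Python sketch that splits a 3.5+ style server line into its parts and applies the documented defaults (role "participant", client port address 0.0.0.0). The names `ServerSpec` and `parse_server_spec` are made up for illustration and are not part of ZooKeeper; the sketch also assumes plain IPv4 addresses or hostnames and does not handle IPv6 literals or multiple addresses per server.

```python
import re
from typing import NamedTuple


class ServerSpec(NamedTuple):
    server_id: int
    address: str
    quorum_port: int
    election_port: int
    role: str
    client_address: str
    client_port: int


# server.<id>=<address>:<port1>:<port2>[:role];[<client port address>:]<client port>
_SPEC = re.compile(
    r"^server\.(?P<id>\d+)="
    r"(?P<addr>[^:;]+):(?P<p1>\d+):(?P<p2>\d+)"
    r"(?::(?P<role>participant|observer))?"
    r";(?:(?P<caddr>[^:;]+):)?(?P<cport>\d+)$"
)


def parse_server_spec(line: str) -> ServerSpec:
    """Parse one server line; raise ValueError if it does not match."""
    m = _SPEC.match(line.strip())
    if m is None:
        raise ValueError(f"not a 3.5+ style server spec: {line!r}")
    return ServerSpec(
        server_id=int(m["id"]),
        address=m["addr"],
        quorum_port=int(m["p1"]),
        election_port=int(m["p2"]),
        role=m["role"] or "participant",         # role is optional
        client_address=m["caddr"] or "0.0.0.0",  # optional, defaults to all interfaces
        client_port=int(m["cport"]),
    )
```

With this, a line ending in `;2181` and a line ending in `;0.0.0.0:2181` should parse to the same client address and port, which is why I would expect both spellings to behave the same.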

But based on your logs I think you are right. A reconfig of the
clientAddress is happening in the code here:
https://github.com/apache/zookeeper/blob/1c41e127537f66842515ccb21fb48f1670003454/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java#L2194

One thing you can try is to switch to Netty instead of NIO, as the Netty
reconfig code contains some extra 0.0.0.0 related checks.
You can do that e.g. by providing the following environment variable for
docker:
JVMFLAGS="-Dzookeeper.serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory"

Also AFAICS the clientAddress is reconfigured only during the processing of
a dynamic reconfig. Are you actually using the dynamic reconfig feature? If
not, then disabling it may fix the issue.
Dynamic reconfig should be disabled by default in ZooKeeper, although I'm
not sure about the Docker image config.
Maybe try this:
ZOO_CFG_EXTRA="quorumListenOnAllIPs=true reconfigEnabled=false"

If these don't help, then can you share debug logs from one of your
containers?

Kind regards,
Mate



Re: Strange zoo.cfg.dynamic.next generated via zookeeper Docker image

2020-07-24 Thread Thilo-Alexander Ginkel
On Mon, Jul 20, 2020 at 2:29 PM Szalay-Bekő Máté
 wrote:
> Can you try to change your configs by not using 0.0.0.0 in the ZOO_SERVERS?
> Using 0.0.0.0 is not a recommended config since 3.5. If the Java process
> cannot bind (due to some virtual network issue) to the host provided in
> its config, then you can use the quorumListenOnAllIPs parameter.
>
> So you should have the very same configuration for all nodes in your
> cluster, like:
>
> ZOO_CFG_EXTRA="quorumListenOnAllIPs=true"
> ZOO_SERVERS=server.1=x.x.x.1:2888:3888:participant;2181 \
>   server.2=x.x.x.2:2888:3888:participant;2181 \
>   server.3=x.x.x.3:2888:3888:participant;2181

That works, except that I cannot get the client port (2181) to listen
on 0.0.0.0 so it can be mapped to the outside. Any idea how to achieve
that?

-- 8< --
2020-07-24 16:04:59,243 [myid:] - INFO  [main:QuorumPeerConfig@456] - clientPortAddress is 0.0.0.0:2181
2020-07-24 16:04:59,415 [myid:2] - INFO  [main:NIOServerCnxnFactory@674] - binding to port /0.0.0.0:2181
2020-07-24 16:04:59,483 [myid:2] - INFO  [QuorumPeer[myid=2](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):QuorumPeer@1371] - LOOKING
2020-07-24 16:04:59,483 [myid:2] - INFO  [QuorumPeer[myid=2](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):FastLeaderElection@944] - New election. My id = 2, proposed zxid=0xb
2020-07-24 16:04:59,531 [myid:2] - INFO  [NIOServerCxnFactory.AcceptThread:/0.0.0.0:2181:NIOServerCnxnFactory$AcceptThread@209] - accept thread exitted run method
2020-07-24 16:04:59,531 [myid:2] - INFO  [WorkerReceiver[myid=2]:NIOServerCnxnFactory@707] - binding to port /10.147.254.2:2181
2020-07-24 16:04:59,531 [myid:2] - ERROR [WorkerReceiver[myid=2]:NIOServerCnxnFactory@713] - Error reconfiguring client port to /10.147.254.2:2181
-- 8< --

It seems Zookeeper is rebinding the client port to the announced IP
during the startup sequence.

I also tried specifying the bind address in ZOO_SERVERS as well as
through clientPortAddress=0.0.0.0, without any luck.

> BTW, unfortunately there is no such thing as "official zookeeper Docker
> image", at least it is not maintained by the Apache ZooKeeper community. (I
> don't know who is maintaining the image on dockerHub
> https://hub.docker.com/_/zookeeper - it would be nice to ask them to update
> their examples / documentation)

I'll open a PR once I get this sorted out. ;-)

Thanks,
Thilo


Re: Strange zoo.cfg.dynamic.next generated via zookeeper Docker image

2020-07-20 Thread Szalay-Bekő Máté
Hello,

Can you try changing your configs so that 0.0.0.0 is not used in ZOO_SERVERS?
Using 0.0.0.0 is not a recommended config since 3.5. If the Java process
cannot bind (due to some virtual network issue) to the host provided in
its config, then you can use the quorumListenOnAllIPs parameter.

So you should have the very same configuration for all nodes in your
cluster, like:

ZOO_CFG_EXTRA="quorumListenOnAllIPs=true"
ZOO_SERVERS=server.1=x.x.x.1:2888:3888:participant;2181 \
  server.2=x.x.x.2:2888:3888:participant;2181 \
  server.3=x.x.x.3:2888:3888:participant;2181

The effect should be that every server binds to 0.0.0.0 locally, while all
servers still share a consistent view of each other's addresses.
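
As a rough illustration of "the very same configuration for all nodes", the following Python sketch builds a single ZOO_SERVERS value from a list of node addresses and refuses the wildcard address. The function name `build_zoo_servers` and the 10.0.0.x addresses are placeholders of mine, not anything provided by ZooKeeper or the Docker image.

```python
def build_zoo_servers(hosts, quorum_port=2888, election_port=3888, client_port=2181):
    """Return one ZOO_SERVERS value meant to be used unchanged on every node."""
    entries = []
    for server_id, host in enumerate(hosts, start=1):
        if host == "0.0.0.0":
            # A wildcard here would be advertised to the other nodes as a peer address.
            raise ValueError(f"server.{server_id}: use the node's real address, not 0.0.0.0")
        entries.append(
            f"server.{server_id}={host}:{quorum_port}:{election_port}:participant;{client_port}"
        )
    return " ".join(entries)


if __name__ == "__main__":
    # Placeholder addresses; substitute the real node IPs or hostnames.
    print(build_zoo_servers(["10.0.0.1", "10.0.0.2", "10.0.0.3"]))
```

Only ZOO_MY_ID would then differ between the containers; the local bind behaviour is left to quorumListenOnAllIPs=true.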

BTW, unfortunately there is no such thing as an "official zookeeper Docker
image", at least not one maintained by the Apache ZooKeeper community. (I
don't know who is maintaining the image on Docker Hub
https://hub.docker.com/_/zookeeper - it would be nice to ask them to update
their examples / documentation.)

Kind regards,
Mate



Re: Strange zoo.cfg.dynamic.next generated via zookeeper Docker image

2020-07-16 Thread Thilo-Alexander Ginkel
Hello again,

I just figured out that my rolling restart problems may be caused by
ZOOKEEPER-3829 (cf. https://github.com/apache/zookeeper/pull/1356),
so I tried setting reconfigEnabled=true as a workaround. That fails,
however, because ZooKeeper then attempts to bind to x.x.x.1 instead of
0.0.0.0 during startup (each node's config still lists 0.0.0.0 for the
local node), so that is apparently not feasible in a Docker environment:

2020-07-16 07:22:20,141 [myid:1] - ERROR [ListenerHandler-/x.x.x.1:3888:QuorumCnxManager$Listener$ListenerHandler@1093] - Exception while listening
java.net.BindException: Cannot assign requested address (Bind failed)
    at java.base/java.net.PlainSocketImpl.socketBind(Native Method)
    at java.base/java.net.AbstractPlainSocketImpl.bind(Unknown Source)
    at java.base/java.net.ServerSocket.bind(Unknown Source)
    at java.base/java.net.ServerSocket.bind(Unknown Source)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener$ListenerHandler.createNewServerSocket(QuorumCnxManager.java:1134)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener$ListenerHandler.acceptConnections(QuorumCnxManager.java:1064)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener$ListenerHandler.run(QuorumCnxManager.java:1033)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.base/java.lang.Thread.run(Unknown Source)
2020-07-16 07:22:21,143 [myid:1] - ERROR [ListenerHandler-/x.x.x.1:3888:QuorumCnxManager$Listener$ListenerHandler@1112] - Leaving listener thread for address 10.147.254.1:3888 after 3 errors. Use zookeeper.electionPortBindRetry property to increase retry count.
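
For reference, the bind failure itself can be reproduced outside of ZooKeeper. The small Python sketch below (helper names `try_bind` and `describe` are my own) tries to bind a TCP socket to a given address and reports the error code. My assumption is that x.x.x.1 is not assigned to any interface inside the container, which would explain why binding to it fails while 0.0.0.0 works.

```python
import errno
import socket


def try_bind(host: str, port: int = 0):
    """Try to bind a TCP socket; return None on success, else the errno."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            # Port 0 lets the OS pick any free port, so only the address matters.
            sock.bind((host, port))
        except OSError as exc:
            return exc.errno
    return None


def describe(host: str) -> str:
    """Return "ok" or the symbolic errno name for a bind attempt on host."""
    code = try_bind(host)
    return "ok" if code is None else errno.errorcode.get(code, str(code))


if __name__ == "__main__":
    # Run inside the container; add the advertised node address to the list.
    for candidate in ("0.0.0.0", "127.0.0.1"):
        print(candidate, describe(candidate))
```

Run inside the container with the advertised address added to the list, I would expect that address to report EADDRNOTAVAIL, the same condition the JVM surfaces as "Cannot assign requested address".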

Are there any plans to release 3.6.2 including the above fix?

Regards,
Thilo


Strange zoo.cfg.dynamic.next generated via zookeeper Docker image

2020-07-15 Thread Thilo-Alexander Ginkel
Hi there,

I am running a three-node ZooKeeper cluster based on the official
zookeeper Docker image (currently at v3.6.1). I have been seeing
sporadic problems during rolling restarts, where the ensemble often
loses its integrity and all nodes have to be stopped and restarted to
recover.

The containers are connected to the bridge network, so I need to
replace each node's own IP with 0.0.0.0 in the server declaration, as
in (for server #1):

ZOO_MY_ID=1
ZOO_SERVERS=server.1=0.0.0.0:2888:3888:participant;2181 \
  server.2=x.x.x.2:2888:3888:participant;2181 \
  server.3=x.x.x.3:2888:3888:participant;2181

Servers #2 and #3 likewise have their own IP replaced with 0.0.0.0 in
their respective configurations.

From this configuration ZooKeeper seems to generate the following
zoo.cfg.dynamic.next config file (identical on all three servers),
which is somewhat surprising as it contains 0.0.0.0 as address of
server #2:

-- 8< --
server.1=x.x.x.1:2888:3888:participant;0.0.0.0:2181
server.2=0.0.0.0:2888:3888:participant;0.0.0.0:2181
server.3=x.x.x.3:2888:3888:participant;0.0.0.0:2181
version=1f
-- 8< --

Is this how things are supposed to be? Naively, I would have assumed
that servers #1 and #3 should know the real IP of server #2 and not
0.0.0.0...
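
To spot this condition quickly, I put together a small Python sketch that scans the contents of a dynamic config file and lists the servers whose quorum address (the part before the ';') is the wildcard. The function name `wildcard_quorum_servers` is just something I made up, and the parsing is deliberately naive (IPv4 addresses or hostnames only).

```python
def wildcard_quorum_servers(dynamic_cfg: str):
    """Return the ids of servers whose quorum address is 0.0.0.0."""
    ids = []
    for line in dynamic_cfg.splitlines():
        line = line.strip()
        if not line.startswith("server."):
            continue  # skips blank lines and e.g. the "version=..." line
        key, _, value = line.partition("=")
        quorum_part = value.split(";", 1)[0]  # drop the client address/port
        host = quorum_part.split(":", 1)[0]   # first field is the address
        if host == "0.0.0.0":
            ids.append(int(key[len("server."):]))
    return ids


if __name__ == "__main__":
    sample = """\
server.1=x.x.x.1:2888:3888:participant;0.0.0.0:2181
server.2=0.0.0.0:2888:3888:participant;0.0.0.0:2181
server.3=x.x.x.3:2888:3888:participant;0.0.0.0:2181
version=1f
"""
    print(wildcard_quorum_servers(sample))
```

For the file shown above this flags only server #2; the 0.0.0.0 after the ';' is the client port address and is not counted.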

Is there a way to configure an advertised address? If not, what is the
recommended setup to operate Zookeeper within Docker (without having
to resort to using the host network)?

Thanks,
Thilo