Re: Upgrading existing non-TLS cluster with no downtime

2020-07-20 Thread Sankalp Bhatia
Thanks, Mate, for the responses. They cleared things up for me. I was able
to get it working this time without downtime, though I don't know what I
did wrong the last time.

On Mon, 20 Jul 2020 at 12:57, Szalay-Bekő Máté 
wrote:

> Hi,
>
> I guess this is the part you are referring:
>
> https://zookeeper.apache.org/doc/r3.5.8/zookeeperAdmin.html#Upgrading+existing+nonTLS+cluster
> (your link was pointing to the 3.3.2 admin guide where this chapter was
> missing)
>
> > 1) When I set sslQuorum=true  and portUnification=true on the first
> > server, does it go out of the quorum? And when these properties are set
> > in the second server, a new quorum of first and second server is formed
> > and now the third server is out of quorum. When the 3rd server follows
> > suit, it is added back to the quorum.
>
> the "sslQuorum=true  and portUnification=true" setting is needed in step 4
> (although the numbering is bad in the markdown...). After step 3 you
> already have a 3 server quorum up with portUnification=true, meaning the
> cluster can handle both TLS/SSL and regular/non-secure connections. So when
> you restart server 1 with sslQuorum=true, then it will be able to re-join
> to the quorum, as server 2 and 3 are capable of handling SSL connections
> (even if they are not using it for connection initiation). So ideally
> between restarting each servers with sslQuorum=true, you always should have
> a 3 node full quorum.
>
> > 2) The guideline says to check after restarting every broker that the
> > quorum is healthy, is there any metric to track that?
>
> I send the "stat" command to all nodes to see if everyone is connected to
> the quorum. E.g.: echo "stat" | nc localhost 2181
> I usually use 4-letter-word commands but the REST admin API works as well,
> and actually that is the officially recommended way, as the 4-letter-words
> are / will be deprecated some time.
> For the admin server see:
> https://zookeeper.apache.org/doc/r3.5.8/zookeeperAdmin.html#sc_adminserver
>
> Kind regards,
> Mate
>
> On Tue, Jul 14, 2020 at 10:52 PM Sankalp Bhatia
> wrote:
>
> > +users
> >
> > On Tue, 14 Jul 2020 at 21:51, Sankalp Bhatia 
> > wrote:
> >
> > > Hi All,
> > >
> > > I am trying to follow the section "Upgrading existing non-TLS cluster
> > > with no downtime" in the zookeeper guide :
> > > https://zookeeper.apache.org/doc/r3.3.2/zookeeperAdmin.html
> > >
> > > I have an ensemble of 3 servers. I have a couple of questions:
> > >
> > > 1) When I set sslQuorum=true  and portUnification=true on the first
> > > server, does it go out of the quorum? And when these properties are set
> > > in the second server, a new quorum of first and second server is formed
> > > and now the third server is out of quorum. When the 3rd server follows
> > > suit, it is added back to the quorum.
> > >
> > > If this is the case, what is the use of a the port-unification feature
> > > here?
> > >
> > > 2) The guideline says to check after restarting every broker that the
> > > quorum is healthy, is there any metric to track that?
> > >
> > > Thanks,
> > > Sankalp
> > >
> > >
> > >
> > >
> >
>


Re: Strange zoo.cfg.dynamic.next generated via zookeeper Docker image

2020-07-20 Thread Szalay-Bekő Máté
Hello,

Can you try changing your configs so that ZOO_SERVERS does not use 0.0.0.0?
Using 0.0.0.0 is not a recommended config since 3.5. If the Java process
cannot bind (due to some virtual network issue) to the host provided in
its config, then you can use the quorumListenOnAllIPs parameter.

So you should have the very same configuration for all nodes in your
cluster, like:

ZOO_CFG_EXTRA="quorumListenOnAllIPs=true"
ZOO_SERVERS=server.1=x.x.x.1:2888:3888:participant;2181 \
  server.2=x.x.x.2:2888:3888:participant;2181 \
  server.3=x.x.x.3:2888:3888:participant;2181

With this, every server should bind to 0.0.0.0 locally, while all nodes
still share a consistent view of the server hostnames.

BTW, unfortunately there is no such thing as an "official ZooKeeper Docker
image"; at least, it is not maintained by the Apache ZooKeeper community. (I
don't know who is maintaining the image on Docker Hub,
https://hub.docker.com/_/zookeeper - it would be nice to ask them to update
their examples / documentation.)

Kind regards,
Mate

On Thu, Jul 16, 2020 at 9:27 AM Thilo-Alexander Ginkel 
wrote:

> Hello again,
>
> just figured out that my rolling restart problems may be caused by
> ZOOKEEPER-3829 (c.f. https://github.com/apache/zookeeper/pull/1356),
> so I tried to set reconfigEnabled=true as a workaround, but that fails
> as Zookeeper attempts to bind to x.x.x.1 instead of 0.0.0.0 (config
> still lists 0.0.0.0 for the local node, respectively) during startup
> in that case, so that's apparently not feasible in a Docker
> environment:
>
> 2020-07-16 07:22:20,141 [myid:1] - ERROR [ListenerHandler-/x.x.x.1:3888:QuorumCnxManager$Listener$ListenerHandler@1093] - Exception while listening
> java.net.BindException: Cannot assign requested address (Bind failed)
>     at java.base/java.net.PlainSocketImpl.socketBind(Native Method)
>     at java.base/java.net.AbstractPlainSocketImpl.bind(Unknown Source)
>     at java.base/java.net.ServerSocket.bind(Unknown Source)
>     at java.base/java.net.ServerSocket.bind(Unknown Source)
>     at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener$ListenerHandler.createNewServerSocket(QuorumCnxManager.java:1134)
>     at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener$ListenerHandler.acceptConnections(QuorumCnxManager.java:1064)
>     at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener$ListenerHandler.run(QuorumCnxManager.java:1033)
>     at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
>     at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>     at java.base/java.lang.Thread.run(Unknown Source)
> 2020-07-16 07:22:21,143 [myid:1] - ERROR [ListenerHandler-/x.x.x.1:3888:QuorumCnxManager$Listener$ListenerHandler@1112] - Leaving listener thread for address 10.147.254.1:3888 after 3 errors. Use zookeeper.electionPortBindRetry property to increase retry count.
>
> Are there any plans to release 3.6.2 including the above fix?
>
> Regards,
> Thilo
>


Re: Upgrading existing non-TLS cluster with no downtime

2020-07-20 Thread Szalay-Bekő Máté
Hi,

I guess this is the part you are referring to:
https://zookeeper.apache.org/doc/r3.5.8/zookeeperAdmin.html#Upgrading+existing+nonTLS+cluster
(your link was pointing to the 3.3.2 admin guide, where this chapter is
missing)

> 1) When I set sslQuorum=true  and portUnification=true on the first
> server, does it go out of the quorum? And when these properties are set
> in the second server, a new quorum of first and second server is formed
> and now the third server is out of quorum. When the 3rd server follows
> suit, it is added back to the quorum.

The "sslQuorum=true  and portUnification=true" setting is needed in step 4
(although the numbering is broken in the markdown...). After step 3 you
already have a 3-server quorum up with portUnification=true, meaning the
cluster can handle both TLS/SSL and regular/non-secure connections. So when
you restart server 1 with sslQuorum=true, it will be able to rejoin the
quorum, as servers 2 and 3 are capable of handling SSL connections (even if
they do not use SSL when initiating connections). So ideally, between
restarting each server with sslQuorum=true, you should always have a full
3-node quorum.
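
To make the reasoning above concrete, here is a small Python sketch of a
simplified model (my own illustration, not ZooKeeper code). It assumes that
sslQuorum decides whether a server initiates quorum connections over TLS,
and that portUnification lets the accepting side take both TLS and plaintext
connections. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuorumTlsConfig:
    ssl_quorum: bool        # mirrors sslQuorum: initiate quorum connections over TLS
    port_unification: bool  # mirrors portUnification: accept TLS and plaintext


def accepts(acceptor, tls):
    """Simplified rule: a unified port accepts anything, otherwise modes must match."""
    if acceptor.port_unification:
        return True
    return acceptor.ssl_quorum == tls


def can_connect(initiator, acceptor):
    """The initiator uses TLS exactly when its own sslQuorum is true."""
    return accepts(acceptor, tls=initiator.ssl_quorum)


def fully_connected(servers):
    """True if every server can open a connection to every other server."""
    return all(
        can_connect(a, b)
        for i, a in enumerate(servers)
        for j, b in enumerate(servers)
        if i != j
    )


plain = QuorumTlsConfig(ssl_quorum=False, port_unification=False)
unified = QuorumTlsConfig(ssl_quorum=False, port_unification=True)
tls_unified = QuorumTlsConfig(ssl_quorum=True, port_unification=True)

# After step 3, restarting server 1 with sslQuorum=true keeps the ensemble connected.
print(fully_connected([tls_unified, unified, unified]))
# Skipping step 3 would not: servers 2 and 3 could not accept TLS from server 1.
print(fully_connected([tls_unified, plain, plain]))
```

In this model the ensemble stays fully connected at every point of the
rolling restart, as long as all servers got portUnification=true first.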

> 2) The guideline says to check after restarting every broker that the
> quorum is healthy, is there any metric to track that?

I send the "stat" command to all nodes to see if everyone is connected to
the quorum, e.g.: echo "stat" | nc localhost 2181
I usually use the 4-letter-word commands, but the REST admin API works as
well, and that is actually the officially recommended way, as the
4-letter-words are (or will be) deprecated at some point.
For the admin server see:
https://zookeeper.apache.org/doc/r3.5.8/zookeeperAdmin.html#sc_adminserver
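
If you want to script the check, a minimal Python sketch of the same idea as
the nc one-liner could look like this. It assumes the reply contains a
"Mode:" line, that the ensemble has no observers, and that the command is
allowed on your servers (depending on the version it may need to be listed
in 4lw.commands.whitelist). The function names and the host names in the
comment are hypothetical.

```python
import socket


def four_letter_word(host, port, cmd="stat", timeout=5.0):
    """Send a four-letter-word command and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the connection was closed by the server
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mode(reply):
    """Return the value of the 'Mode:' line (e.g. leader, follower), or None."""
    for line in reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def quorum_looks_healthy(modes):
    """Expect exactly one leader and every other server to be a follower."""
    return modes.count("leader") == 1 and all(
        m in ("leader", "follower") for m in modes
    )


# Example usage against a running ensemble (host names are placeholders):
#   hosts = ["zk1", "zk2", "zk3"]
#   modes = [parse_mode(four_letter_word(h, 2181)) for h in hosts]
#   print(quorum_looks_healthy(modes))
```

A server that is not serving requests will not report a leader or follower
mode, so it makes the check fail. That is the signal to wait before
restarting the next node.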

Kind regards,
Mate

On Tue, Jul 14, 2020 at 10:52 PM Sankalp Bhatia 
wrote:

> +users
>
> On Tue, 14 Jul 2020 at 21:51, Sankalp Bhatia 
> wrote:
>
> > Hi All,
> >
> > I am trying to follow the section "Upgrading existing non-TLS cluster
> > with no downtime" in the zookeeper guide :
> > https://zookeeper.apache.org/doc/r3.3.2/zookeeperAdmin.html
> >
> > I have an ensemble of 3 servers. I have a couple of questions:
> >
> > 1) When I set sslQuorum=true  and portUnification=true on the first
> > server, does it go out of the quorum? And when these properties are set
> > in the second server, a new quorum of first and second server is formed
> > and now the third server is out of quorum. When the 3rd server follows
> > suit, it is added back to the quorum.
> >
> > If this is the case, what is the use of a the port-unification feature
> > here?
> >
> > 2) The guideline says to check after restarting every broker that the
> > quorum is healthy, is there any metric to track that?
> >
> > Thanks,
> > Sankalp
> >
> >
> >
> >
>


Re: Zookeeper session expiration

2020-07-20 Thread shrikant kalani
Currently our production is running 3.5.5, and it will take time to
move to 3.6.1.

When I dug further into this, it seemed to be related to Netty and its
limitations. The system is stable when I fall back to NIO without SSL.

As soon as I turn on Netty, we see sessions getting throttled, which
in turn sometimes throttles the ping requests from clients too.

I believe we should get an option to configure Netty in such a way that
ping commands are never throttled.

Thanks
Srikant Kalani

On Mon, 20 Jul 2020 at 7:02 PM, Szalay-Bekő Máté 
wrote:

> Hello,
>
> can you reproduce the problem with the latest 3.5 version? I mean 3.5.8.
> There were a few bugfixes recently that can help. e.g.:
> https://issues.apache.org/jira/browse/ZOOKEEPER-3756
> Also you can try to increase some timeout parameters, see
>
> https://zookeeper.apache.org/doc/r3.5.8/zookeeperAdmin.html#sc_configuration
> (like minSessionTimeout, maxSessionTimeout, syncLimit)
>
> Kind regards,
> Mate
>
> On Mon, Jul 13, 2020 at 5:19 PM Srikant Kalani 
> wrote:
>
> > I am facing a similar issue in my application.
> >
> > Zookeeper Server Version 3.5.5
> >
> > I implemented SSL ( server to server ) in quorum communication.
> >
> > After that ZK client frequently receives session timeouts.
> >
> > When I turned off SSL then application is behaving normally and there are
> > no
> > timeouts.
> >
> > Any thoughts ?
> >
> > Thanks
> > Srikant Kalani
> >
> >
> >
> > --
> > Sent from: http://zookeeper-user.578899.n2.nabble.com/
> >
>


Re: Zookeeper session expiration

2020-07-20 Thread Szalay-Bekő Máté
Hello,

Can you reproduce the problem with the latest 3.5 version, i.e. 3.5.8?
There were a few bugfixes recently that may help, e.g.:
https://issues.apache.org/jira/browse/ZOOKEEPER-3756
Also, you can try to increase some timeout parameters, see
https://zookeeper.apache.org/doc/r3.5.8/zookeeperAdmin.html#sc_configuration
(like minSessionTimeout, maxSessionTimeout, syncLimit)
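
As a rough illustration of how the session timeout limits interact, here is
a small Python sketch. It assumes the defaults described in the admin guide
(minSessionTimeout is 2 times tickTime, maxSessionTimeout is 20 times
tickTime) and that the server clamps the timeout requested by the client
into that range. The function name is hypothetical.

```python
def negotiated_session_timeout(requested_ms, tick_time_ms, min_ms=None, max_ms=None):
    """Clamp the client's requested session timeout into the server's limits."""
    # Fall back to the documented defaults when the limits are not configured.
    lower = 2 * tick_time_ms if min_ms is None else min_ms
    upper = 20 * tick_time_ms if max_ms is None else max_ms
    return max(lower, min(requested_ms, upper))


# With tickTime=2000 a client asking for 60 s is capped at 40 s ...
print(negotiated_session_timeout(60000, 2000))                # 40000
# ... unless maxSessionTimeout is raised on the server.
print(negotiated_session_timeout(60000, 2000, max_ms=90000))  # 60000
```

So raising the timeout on the client side alone may have no effect if the
server-side maximum is lower than the requested value.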

Kind regards,
Mate

On Mon, Jul 13, 2020 at 5:19 PM Srikant Kalani 
wrote:

> I am facing a similar issue in my application.
>
> Zookeeper Server Version 3.5.5
>
> I implemented SSL ( server to server ) in quorum communication.
>
> After that ZK client frequently receives session timeouts.
>
> When I turned off SSL then application is behaving normally and there are
> no
> timeouts.
>
> Any thoughts ?
>
> Thanks
> Srikant Kalani
>
>
>
> --
> Sent from: http://zookeeper-user.578899.n2.nabble.com/
>