Hi Team,
I did a POC of the rolling upgrade and found the following:
1. Upgraded ZooKeeper on the 1st node. Traffic kept running fine because the
other 2 nodes were still on the old ZooKeeper.
2. Upgraded our application on the 1st node and didn't find any issue.
3. Upgraded ZooKeeper on the 2nd node, but got the error below, and ZooKeeper
is not taking any requests:
java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:392)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:747)
2020-03-30 14:19:55,587 - WARN [RecvWorker:1:QuorumCnxManager$RecvWorker@765] - Interrupting SendWorker
2020-03-30 14:19:55,588 - ERROR [LearnerHandler-/192.168.44.73:33754:LearnerHandler@562] - Unexpected exception causing shutdown while sock still open
java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:392)
    at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
    at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
    at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
    at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:476)
2020-03-30 14:19:55,588 - WARN [SendWorker:1:QuorumCnxManager$SendWorker@679] - Interrupted while waiting for message on queue
Please let me know whether this is the known issue mentioned in the Apache
ZooKeeper documentation for upgrading from 3.4.5 to 3.5.6, or a different
issue.
Thanks,
-
Kuldeep Singh Budania
Software Architect
On Sun, Mar 29, 2020 at 9:06 AM Alexander Shraer wrote:
> +1 to what Mate said (I wrote the quoted instructions).
>
>
>
> On Tue, Mar 24, 2020 at 7:03 AM Szalay-Bekő Máté <
> szalay.beko.m...@gmail.com>
> wrote:
>
> > Hi Kuldeep,
> >
> > I just want to provide you some background info about our documentation.
> > The reason to upgrade to 3.4.6 first is to avoid the following error:
> >
> > > 2013-01-30 11:32:10,663 [myid:2] - WARN [localhost/127.0.0.1:2784
> > :QuorumCnxManager@349] - Invalid server id: -65536
> >
> > This error is caused by a protocol change in how ZooKeeper server nodes
> > initiate connections for leader election. ZooKeeper 3.5 introduced a
> > protocol version (see ZOOKEEPER-107), and since then the first long value
> > sent in the initial message is not the server ID but the protocol version
> > (-65536). In ZooKeeper 3.4.6 we made the old 3.4 ZooKeepers backward
> > compatible, so they are able to parse both the old and the new protocol
> > format (see ZOOKEEPER-1633). This issue happens only when you need to use
> > old (3.4.0 - 3.4.5) and new (3.5.0+) ZooKeeper servers together in the
> > same cluster. During a rolling upgrade it is usual to have old and new
> > ZooKeepers present together.
> >
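To make the paragraph above concrete, here is a minimal Java sketch of the two initial-message layouts as Mate describes them. It is not the actual QuorumCnxManager code: the class and method names are made up for illustration, the only value taken from the thread is -65536, and the fields that follow the server ID in the new format are left out. It only shows why a parser that knows just the old layout ends up treating -65536 as a server ID.

```java
import java.nio.ByteBuffer;

public class InitialMessageSketch {
    // Value quoted in the thread: 3.5 servers send this as the first long.
    static final long PROTOCOL_VERSION = -65536L;

    // Old layout (assumed): the first long is the sender's server ID.
    static byte[] oldFormat(long sid) {
        return ByteBuffer.allocate(Long.BYTES).putLong(sid).array();
    }

    // New layout (simplified): protocol version first, then the server ID.
    static byte[] newFormat(long sid) {
        return ByteBuffer.allocate(2 * Long.BYTES)
                .putLong(PROTOCOL_VERSION)
                .putLong(sid)
                .array();
    }

    // A parser that only knows the old layout takes the first long as the ID.
    static long parseOldOnly(byte[] msg) {
        return ByteBuffer.wrap(msg).getLong();
    }

    // A tolerant parser: a negative first long is a protocol version,
    // and the real server ID follows it.
    static long parseTolerant(byte[] msg) {
        ByteBuffer buf = ByteBuffer.wrap(msg);
        long first = buf.getLong();
        if (first >= 0) {
            return first; // old layout
        }
        if (first != PROTOCOL_VERSION) {
            throw new IllegalArgumentException("Unknown protocol version: " + first);
        }
        return buf.getLong(); // new layout
    }

    public static void main(String[] args) {
        System.out.println(parseOldOnly(newFormat(3)));  // -65536
        System.out.println(parseTolerant(newFormat(3))); // 3
        System.out.println(parseTolerant(oldFormat(3))); // 3
    }
}
```

The first printed value is the same number that appears in the "Invalid server id: -65536" warning quoted above.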
> > The fact that you haven't seen any issues might be caused by the order of
> > the servers. In ZooKeeper the connection initiation between the servers
> > during the leader election follows a specific rule. As far as I remember,
> > the server with the larger ID always 'wins the challenge', so it is
> > possible that the old server didn't need to parse any initial message (if
> > it had the largest ID), and this is why you haven't seen the issue. Also,
> > having 2 nodes up out of a 3-node cluster still keeps the cluster working
> > (so you should also check if all the servers are part of the quorum).
> >
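The two points in the paragraph above can be sketched in a few lines of Java. This assumes the rule exactly as Mate recalls it (the server with the larger ID initiates, so the server with the smaller ID is the one that has to parse the initial message) and the usual strict-majority quorum. The class and method names are hypothetical and nothing here is taken from the ZooKeeper source.

```java
public class QuorumSketch {
    // A strict majority of the ensemble must be up for the cluster to serve.
    static boolean hasMajority(int serversUp, int ensembleSize) {
        return serversUp > ensembleSize / 2;
    }

    // Assumed rule: the larger ID "wins" and initiates the connection,
    // so only the side with the smaller ID parses the initial message.
    static boolean mustParseInitialMessage(long selfId, long peerId) {
        return selfId < peerId;
    }

    // An old (3.4.0 - 3.4.5) server hits the new format only if at least
    // one already-upgraded server has a larger ID than it does.
    static boolean oldServerExposed(long oldSid, long[] upgradedSids) {
        for (long sid : upgradedSids) {
            if (mustParseInitialMessage(oldSid, sid)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasMajority(2, 3));                      // true
        System.out.println(hasMajority(1, 3));                      // false
        System.out.println(oldServerExposed(3, new long[] {1, 2})); // false
        System.out.println(oldServerExposed(1, new long[] {2}));    // true
    }
}
```

Under these assumptions, an old server holding the largest ID would never see the new format, and a 3-node cluster would keep serving with only 2 healthy nodes, which is why a clean-looking test does not prove that every server joined the quorum.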
> > I agree with Enrico and Norbert: the safest and most stable way is to
> > upgrade first to 3.4.latest, then go to 3.5.latest. Still, if you don't
> > see this specific issue (e.g. no "Invalid server id" in the log files),
> > and all three servers can handle traffic, then maybe you don't need to
> > upgrade first to 3.4.latest; it is your decision. You should definitely
> > test it first, as suggested by the others.
> >
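Checking the log files for that message is easy to script. Below is a small Java sketch that counts lines containing "Invalid server id". The class name is hypothetical and the log file path is whatever you pass on the command line; the only string taken from the thread is the marker text itself.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class InvalidServerIdScan {
    // Marker text from the warning quoted earlier in the thread.
    static final String MARKER = "Invalid server id";

    // Counts the log lines that contain the marker.
    static long countInvalidServerId(List<String> logLines) {
        return logLines.stream().filter(line -> line.contains(MARKER)).count();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: InvalidServerIdScan <zookeeper-log-file>");
            return;
        }
        // Reads the whole file into memory; fine for a quick check on a test log.
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        System.out.println(MARKER + " occurrences: " + countInvalidServerId(lines));
    }
}
```

A count of zero on every server only says this particular symptom was not logged; it still makes sense to confirm that all three servers are in the quorum.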
> > Kind regards,
> > Mate
> >
> > On Tue, Mar 24, 2020 at 12:29 PM Norbert Kalmar
> > wrote:
> >
> > > Hi,
> > >
> > > That guide is for upgrading to 3.5.0, which was an alpha version. A lot
> > > has changed since then for the first stable release, 3.5.5, and a few
> > > more things after that; rolling upgrade issues in particular have been
> > > fixed for 3.5.6.
> > > This is a more up-to-date guide:
> > > https://cwiki.apache.org/confluence/display/ZOOKEEPER/Upgrade+FAQ
> > >
> > > If you have done your testing (with a prod snapshot!), then you can
> > > skip the 3.4 latest upgrade, but keep in mind we make our
> > > recommendations for a reason. There were issues reported and/or found
> > > during testing. Some are fixed in 3.5.6, and some only happen under
> > > certain conditions (IOException: No snapshot found - mentioned in the
> > > guide, fixed in 3.5.6).
> > >
> > > So it is up to you, but I would still recommend doing a 3.4 upgrade
> > > first, if it's feasible.
> > >
> > > Regards,
> > > Norbert
> > >
> > > On Tue, Mar 24,