These exceptions can mean many things... I think they can even be normal
during a rolling restart (as some connections get broken in this case).

However, I have already seen cases where exceptions like these killed the
receiver or sender threads in QuorumCnxManager / Leader Election in such a
way that they were not able to recover, so the node was unable to connect
to any quorum until it was restarted. I remember seeing this in 3.4 too.

Do you see these exceptions in the second server (the one which you just
upgraded in step 3)?
Is this issue reproducible?

What tickTime and initLimit do you use? Maybe the server just requires
more time to sync?
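
For reference, here is a rough sketch of how the two settings combine:
initLimit is counted in ticks, so the window a follower gets to connect to
and sync with the leader is roughly tickTime * initLimit milliseconds. The
class name and the numbers below are placeholders for illustration only,
not values taken from your configuration.

```java
final class SyncTimeout {
    // Approximate time (ms) a follower has to connect to and sync with the leader.
    static long syncTimeoutMs(int tickTimeMs, int initLimitTicks) {
        return (long) tickTimeMs * initLimitTicks;
    }

    public static void main(String[] args) {
        int tickTime = 2000; // hypothetical zoo.cfg value, in milliseconds
        int initLimit = 10;  // hypothetical zoo.cfg value, in ticks
        System.out.println("sync window: " + syncTimeoutMs(tickTime, initLimit) + " ms");
    }
}
```

If the snapshot to transfer is large, raising initLimit widens this window.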

I would need more logs to really see what happened. Can you create a Jira
issue and upload the logs and also the ZooKeeper configs? I am happy to
take a closer look.
(If you need to re-run the test to collect the logs, then enabling DEBUG
logs would be great. The INFO level logs are usually enough for these
problems, but one can never know...)

Kind regards,
Mate


On Fri, Apr 3, 2020 at 10:05 AM kuldeep singh <kuldeep.sing...@gmail.com>
wrote:

> Hi Team,
>
> I have done some POC on rolling upgrade and found below result.
>
>
>    1. On 1st node upgrade zookeeper . Traffic was running fine because 2
>    nodes are already on old zookeeper.
>    2. On 1st node upgrade our application and didn’t find any issue
>    3. On 2nd node upgrade zookeeper but got below error and zookeeper is
>    not taking any requests
>    4.
>
> java.io.EOFException
>
>         at java.io.DataInputStream.readInt(DataInputStream.java:392)
>
>         at
>
> org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:747)
>
> 2020-03-30 14:19:55,587 - WARN
> [RecvWorker:1:QuorumCnxManager$RecvWorker@765] - Interrupting SendWorker
>
> 2020-03-30 14:19:55,588 - ERROR [LearnerHandler-/192.168.44.73:33754
> :LearnerHandler@562] - Unexpected exception causing shutdown while sock
> still open
>
> java.io.EOFException
>
>         at java.io.DataInputStream.readInt(DataInputStream.java:392)
>
>         at
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>
>         at
>
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
>
>         at
> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
>
>         at
>
> org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:476)
>
> 2020-03-30 14:19:55,588 - WARN
> [SendWorker:1:QuorumCnxManager$SendWorker@679] - Interrupted while waiting
> for message on queue
>
> Please let me know is this the known issue or this is different issue which
> is mention in Apache zookeeper documentation when upgrading from 3.4.5 to
> 3.5.6
>
> Thanks,
> ---------------------
> Kuldeep Singh Budania
> Software Architect
>
>
>
> On Sun, Mar 29, 2020 at 9:06 AM Alexander Shraer <shra...@gmail.com>
> wrote:
>
> > +1 to what Mate said (I wrote the quoted instructions).
> >
> >
> >
> > On Tue, Mar 24, 2020 at 7:03 AM Szalay-Bekő Máté <
> > szalay.beko.m...@gmail.com>
> > wrote:
> >
> > > Hi Kuldeep,
> > >
> > > I just want to provide you some background info about our
> documentation.
> > > The reason to upgrade to 3.4.6 first is to avoid the following error:
> > >
> > > > 2013-01-30 11:32:10,663 [myid:2] - WARN [localhost/127.0.0.1:2784
> > > :QuorumCnxManager@349] - Invalid server id: -65536
> > >
> > > This error comes because of the protocol changes between ZooKeeper
> server
> > > nodes during connection initiation for leader election. In ZooKeeper
> 3.5
> > a
> > > protocol version was introduced (see ZOOKEEPER-107) and since that time
> > the
> > > > first long value sent in the initial message is not the server ID but
> the
> > > protocol version (-65536). In ZooKeeper 3.4.6 we made the old 3.4
> > > ZooKeepers backward compatible, so they are able to parse both the old
> > and
> > > the new protocol format (see ZOOKEEPER-1633). This issue happens only
> > when
> > > you need to use old (3.4.0 - 3.4.5) and new (3.5.0+) ZooKeeper servers
> > > together in the same cluster. During a rolling upgrade, this is usually
> > the
> > > case to have old and new ZooKeepers present together.
> > >
> > > The fact that you haven't seen any issues might be caused by the order
> of
> > > the servers. In ZooKeeper the connection initiation between the servers
> > > during the leader election follows a specific rule. As far as I
> remember
> > > always the server with the larger ID 'wins the challenge', so it is
> > > possible, that the old server didn't need to parse any initial message
> > (if
> > > it had the largest ID) and this is why you haven't seen the issue. Also
> > > having 2 nodes up from the 3 nodes cluster still makes the cluster work
> > (so
> > > you should also check if all the servers are part of the quorum).
> > >
> > > I agree with Enrico and Norbert, the safest and most stable way is
> > upgrade
> > > first to 3.4.latest, then go to 3.5.latest. Still, if you don't see
> that
> > > you would hit this specific issue (e.g. no "Invalid server id" in the
> log
> > > files), and all the three servers can handle traffic, then maybe you
> > don't
> > > need to upgrade first to 3.4.latest, it is your decision. Definitely
> you
> > > should test it first, as suggested by the others.
> > >
> > > Kind regards,
> > > Mate
> > >
> > > On Tue, Mar 24, 2020 at 12:29 PM Norbert Kalmar
> > > <nkal...@cloudera.com.invalid> wrote:
> > >
> > > > Hi,
> > > >
> > > > That guide is to upgrade to 3.5.0, which was an alpha version. A lot
> > has
> > > > changed for the first stable release of 3.5.5 and then a few more,
> even
> > > > rolling upgrade issues have been fixed for 3.5.6.
> > > > This is a more up-to-date guide:
> > > > https://cwiki.apache.org/confluence/display/ZOOKEEPER/Upgrade+FAQ
> > > >
> > > > If you have done your testing (with prod snapshot!), then you can
> skip
> > > 3.4
> > > > latest upgrade, but keep in mind we do our recommendations for a
> > reason.
> > > > There were issues reported and/or found during testing. Some are
> fixed
> > > with
> > > > 3.5.6, some only happens if certain conditions stand (IOException: No
> > > > snapshot found - mentioned in the guide, fixed in 3.5.6).
> > > >
> > > > So it is up to you, I would still recommend to do an 3.4 upgrade
> first,
> > > if
> > > > it's feasible.
> > > >
> > > > Regards,
> > > > Norbert
> > > >
> > > > On Tue, Mar 24, 2020 at 11:45 AM kuldeep singh <
> > > kuldeep.sing...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > Current Zookeeper version :- 3.4.5
> > > > > Upgraded version                :- 3.5.6
> > > > >
> > > > > We are not going with 3.5.7. Our final decision is zookeeper
> version
> > is
> > > > > 3.5.6
> > > > > as per your reply first we need to move latest version of 3.4.x,
> like
> > > > below
> > > > >
> > > > > 3.4.5 -> 3.4.14 -> 3.5.6 (Correct me if I am wrong here)
> > > > >
> > > > > But if We are not facing any problem that i have shared you that we
> > > have
> > > > > set up of 3 node cluster where 2 node are on 3.5.6 version and 1
> node
> > > on
> > > > > 3.4.5, Everything is running fine and didn't get any issue, So what
> > > other
> > > > > problem we can face if we directly move to 3.5.6
> > > > >
> > > > > Thanks,
> > > > > ---------------------
> > > > > Kuldeep Singh Budania
> > > > > Software Architect
> > > > >
> > > > >
> > > > > On Tue, Mar 24, 2020 at 3:58 PM Enrico Olivelli <
> eolive...@gmail.com
> > >
> > > > > wrote:
> > > > >
> > > > > > Hi
> > > > > > You have to upgrade to latest 3.4.x Zookeeper then you will
> upgrade
> > > to
> > > > > > 3.5.7.
> > > > > > All should run well without issues
> > > > > >
> > > > > >
> > > > > > Enrico
> > > > > >
> > > > > > Il Mar 24 Mar 2020, 10:18 kuldeep singh <
> kuldeep.sing...@gmail.com
> > >
> > > ha
> > > > > > scritto:
> > > > > >
> > > > > > > Hi Team,
> > > > > > >
> > > > > > > We are upgrading zookeeper from 3.4.5 to 3.5.6. I have set up 3
> > > node
> > > > > > > cluster where 2 node are on 3.5.6 version and 1 node on 3.4.5.
> > > > > > >
> > > > > > > Everything is running fine and didn't get any issue on my
> system.
> > > > > > >
> > > > > > > but I found something on apache site  that first we need to
> > upgrade
> > > > on
> > > > > > > 3.4.6 than we can upgrade to 3.5.6. So is it mandatory  to go
> on
> > > > 3.4.6
> > > > > > > first.
> > > > > > >
> > > > > > > *Upgrading to 3.5.0*
> > > > > > >
> > > > > > > Upgrading a running ZooKeeper ensemble to 3.5.0 should be done
> > only
> > > > > after
> > > > > > > upgrading your ensemble to the 3.4.6 release. Note that this is
> > > only
> > > > > > > necessary for rolling upgrades (if you're fine with shutting
> down
> > > the
> > > > > > > system completely, you don't have to go through 3.4.6). If you
> > > > attempt
> > > > > a
> > > > > > > rolling upgrade without going through 3.4.6 (for example from
> > > 3.4.5),
> > > > > you
> > > > > > > may get the following error:
> > > > > > >
> > > > > > > 2013-01-30 11:32:10,663 [myid:2] - INFO [localhost/
> > 127.0.0.1:2784
> > > > > > > :QuorumCnxManager$Listener@498] - Received connection request
> /
> > > > > > > 127.0.0.1:60876
> > > > > > >
> > > > > > > 2013-01-30 11:32:10,663 [myid:2] - WARN [localhost/
> > 127.0.0.1:2784
> > > > > > > :QuorumCnxManager@349] - Invalid server id: -65536
> > > > > > >
> > > > > > > During a rolling upgrade, each server is taken down in turn and
> > > > > rebooted
> > > > > > > with the new 3.5.0 binaries. Before starting the server with
> > 3.5.0
> > > > > > > binaries, we highly recommend updating the configuration file
> so
> > > that
> > > > > all
> > > > > > > server statements "server.x=..." contain client ports (see the
> > > > section
> > > > > > > Specifying
> > > > > > > the client port). As explained earlier you may leave the
> > > > configuration
> > > > > > in a
> > > > > > > single file, as well as leave the clientPort/clientPortAddress
> > > > > statements
> > > > > > > (although if you specify client ports in the new format, these
> > > > > statements
> > > > > > > are now redundant).
> > > > > > >
> > > > > > > Could you please let me know about this case. Appreciate if
> > respond
> > > > > soon.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > ---------------------
> > > > > > > Kuldeep Singh Budania
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
