Hi,

Here are the issues we encountered so far upgrading to 3.5.5 from 3.4:
https://cwiki.apache.org/confluence/display/ZOOKEEPER/Upgrade+FAQ
One is "no snapshot taken yet"; the other is that four-letter words need to be whitelisted. As Enrico mentioned, we have seen nothing similar to your case so far.

As for running a mixed 3.5 and 3.4 quorum, I'm afraid it will not work. Starting with 3.5 there is a check on PROTOCOL_VERSION. 3.4 did not have this protocol version, so when the nodes try to communicate, an exception will be thrown. Also, keeping the quorum protocol backward compatible is not a goal, so chances are it would not work even without the check.

Regards,
Norbert

On Thu, Oct 3, 2019 at 12:09 AM Enrico Olivelli <eolive...@gmail.com> wrote:

> On Wed, Oct 2, 2019, 22:52 Jerry Hebert <jerry.heb...@gmail.com> wrote:
>
> > Hi Enrico,
> >
> > The nodes that restarted did not have any errors in their logs; they
> > seemed to simply restart successfully, so I think your hunch about the
> > external system is probably correct.
> >
> > Could you comment on my second question above regarding cross-version
> > migration, or should I make a new thread?
> >
>
> I am not aware of any issue with an upgrade from 3.4 to 3.5 similar to
> your case. It is expected to work.
>
> Enrico
>
> > > Are you saying that a 3.5.5 node can synchronize with a 3.4.11 ensemble?
> > > I wasn't sure if that would work or not. E.g., maybe I could bring up
> > > the new 3.5.5 ensemble and temporarily form a 10-node ensemble (five
> > > 3.4.11 nodes, five 3.5.5 nodes), let them sync and then kill off the
> > > old 3.4.11 boxes?
> >
> > Thanks!
> > Jerry
> >
> > On Wed, Oct 2, 2019 at 1:12 PM Enrico Olivelli <eolive...@gmail.com> wrote:
> >
> > > Any particular error/stacktrace in the logs?
> > > If it is ZooKeeper that is killing itself, it should log it; otherwise
> > > it is some other external system. I am sorry, I don't know Exhibitor.
> > >
> > > Hope that helps
> > > Enrico
> > >
> > > On Wed, Oct 2, 2019, 21:40 Jerry Hebert <jerry.heb...@gmail.com> wrote:
> > >
> > > > Hi Jörn,
> > > >
> > > > No, this was a very intermittent issue.
> > > > We've been running this ensemble for about four years now and have
> > > > never seen this problem, so it seems to be super heisenbuggy. Our
> > > > upgrade process will be more involved than what you described (we're
> > > > switching networks, instance types and underlying automation, and
> > > > removing Exhibitor), but I'm glad you asked because I have a question
> > > > about that too. :)
> > > >
> > > > Are you saying that a 3.5.5 node can synchronize with a 3.4.11
> > > > ensemble? I wasn't sure if that would work or not. E.g., maybe I could
> > > > bring up the new 3.5.5 ensemble and temporarily form a 10-node
> > > > ensemble (five 3.4.11 nodes, five 3.5.5 nodes), let them sync and then
> > > > kill off the old 3.4.11 boxes?
> > > >
> > > > Thanks,
> > > > Jerry
> > > >
> > > > On Wed, Oct 2, 2019 at 12:29 PM Jörn Franke <jornfra...@gmail.com> wrote:
> > > >
> > > > > Have you tried to stop the node, delete the data and log directory,
> > > > > upgrade to 3.5.5, start the node and wait until it is synchronized?
> > > > >
> > > > > > On 2 Oct 2019 at 20:14, Jerry Hebert <jerry.heb...@gmail.com> wrote:
> > > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > My first post here! I'm hoping you all might be able to offer some
> > > > > > guidance or redirect me to an existing ticket. We have a five-node
> > > > > > ensemble on 3.4.11 that we're currently in the process of upgrading
> > > > > > to 3.5.5. We recently saw some bizarre behavior in our ensemble that
> > > > > > I was hoping to find some sort of pre-existing ticket or discussion
> > > > > > about, but I was having difficulty finding hits for this in Jira.
> > > > > >
> > > > > > The behavior that we saw from our metrics is that one of our nodes
> > > > > > (not sure if it was a follower or a leader) started to demonstrate
> > > > > > instability (high CPU, high RAM) and it crashed.
> > > > > > Not a big deal, but as soon as it crashed, the other four nodes all
> > > > > > immediately restarted, resulting in a short outage. One node
> > > > > > crashing should never cause an ensemble restart, of course, so I
> > > > > > assumed that this must be a bug in ZK. The nodes that restarted had
> > > > > > no indication of errors in their logs; they just simply restarted.
> > > > > > Does this sound familiar to any of you?
> > > > > >
> > > > > > Also, we are using Exhibitor on that ensemble, so it's also possible
> > > > > > that the restart was caused by Exhibitor.
> > > > > >
> > > > > > My hope is that this issue will be behind us once the 3.5.5 upgrade
> > > > > > is complete, but I'd ideally like to find some concrete evidence of
> > > > > > this.
> > > > > >
> > > > > > Thanks!
> > > > > > Jerry
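Jörn's suggested per-node procedure ends with "wait until it is synchronized", and Norbert points out that on 3.5 four-letter words must be whitelisted. The Python below is a minimal sketch of one way to script that wait, not anything from the thread or an official ZooKeeper tool. It assumes the node answers the `srvr` four-letter word (i.e. `srvr` is in the node's `4lw.commands.whitelist`) and that the reply consists of `Key: value` lines such as `Zookeeper version: ...` and `Mode: follower`. The helpers `send_four_letter_word`, `parse_srvr` and `looks_synced` are hypothetical names, and treating "expected version plus leader/follower mode" as "synchronized" is a heuristic assumption.

```python
import socket


def send_four_letter_word(host: str, port: int, cmd: str = "srvr",
                          timeout: float = 5.0) -> str:
    """Hypothetical helper: send a four-letter word and return the raw reply.

    Assumes the server writes its answer and then closes the connection,
    so we read until EOF.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_srvr(reply: str) -> dict:
    """Hypothetical helper: turn 'Key: value' lines into a dict.

    Only the first ':' on each line is used as the separator, so values
    that themselves contain ':' (e.g. build timestamps) stay intact.
    Lines without ':' are ignored.
    """
    info = {}
    for line in reply.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def looks_synced(info: dict, expected_version_prefix: str = "3.5.5") -> bool:
    """Heuristic (assumption): the node reports the expected version and is
    serving as a leader or follower. Anything else, including a refusal
    because the command is not whitelisted, counts as not ready."""
    version = info.get("Zookeeper version", "")
    return (version.startswith(expected_version_prefix)
            and info.get("Mode") in ("leader", "follower"))
```

In a rolling upgrade you might poll `looks_synced(parse_srvr(send_four_letter_word(host, 2181)))` for the freshly restarted node before moving on to the next one; `2181` is only the common default client port, so substitute your own. Keeping the network call separate from the parsing makes the readiness rule easy to adjust or test offline.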