This is a really useful discussion; I really appreciate it! I'm not too worried about the restarts that I saw, and they are totally unrelated to the upgrade. The upgrade is only relevant insofar as I was seeking confidence that I would not see the issue once upgraded to 3.5.5, but I'm inclined to believe the restarts were due to Exhibitor.
Whether or not I can create a mixed-version ensemble is a far more important question to me, since I'm currently trying to devise an upgrade strategy that avoids taking downtime.

Thanks,
Jerry

On Thu, Oct 3, 2019 at 6:59 AM Enrico Olivelli <eolive...@gmail.com> wrote:

> I think it is possible to perform a rolling upgrade from 3.4; all of my
> customers migrated one year ago, and without any issue (reported to my
> team).
>
> Norbert, where did you find that information?
>
> Btw, I would like to set up tests about backward compatibility,
> server-to-server and client-to-server.
>
> Enrico
>
> On Thu, Oct 3, 2019 at 3:16 PM Jörn Franke <jornfra...@gmail.com>
> wrote:
>
> > I tried only from 3.4.14, and there it was possible. I recommend first
> > upgrading to the latest 3.4 version and then to 3.5.
> >
> > > On Oct 2, 2019, at 9:40 PM, Jerry Hebert <jerry.heb...@gmail.com> wrote:
> > >
> > > Hi Jörn,
> > >
> > > No, this was a very intermittent issue. We've been running this ensemble
> > > for about four years now and have never seen this problem, so it seems
> > > to be super heisenbuggy. Our upgrade process will be more involved than
> > > what you described (we're switching networks, instance types, and
> > > underlying automation, and removing Exhibitor), but I'm glad you asked,
> > > because I have a question about that too. :)
> > >
> > > Are you saying that a 3.5.5 node can synchronize with a 3.4.11 ensemble?
> > > I wasn't sure if that would work or not. E.g., maybe I could bring up
> > > the new 3.5.5 nodes and temporarily form a 10-node ensemble (five 3.4.11
> > > nodes, five 3.5.5 nodes), let them sync, and then kill off the old
> > > 3.4.11 boxes?
> > >
> > > Thanks,
> > > Jerry
> > >
> > >> On Wed, Oct 2, 2019 at 12:29 PM Jörn Franke <jornfra...@gmail.com> wrote:
> > >>
> > >> Have you tried stopping the node, deleting the data and log
> > >> directories, upgrading to 3.5.5, starting the node, and waiting until
> > >> it is synchronized?
> > >>
> > >>>> On Oct 2, 2019, at 8:14 PM, Jerry Hebert <jerry.heb...@gmail.com> wrote:
> > >>>
> > >>> Hi all,
> > >>>
> > >>> My first post here! I'm hoping you all might be able to offer some
> > >>> guidance or redirect me to an existing ticket. We have a five-node
> > >>> ensemble on 3.4.11 that we're currently in the process of upgrading
> > >>> to 3.5.5. We recently saw some bizarre behavior in our ensemble that
> > >>> I was hoping to find some sort of pre-existing ticket or discussion
> > >>> about, but I was having difficulty finding hits for this in Jira.
> > >>>
> > >>> The behavior that we saw from our metrics is that one of our nodes
> > >>> (not sure if it was a follower or the leader) started to demonstrate
> > >>> instability (high CPU, high RAM) and crashed. Not a big deal, but as
> > >>> soon as it crashed, the other four nodes all immediately restarted,
> > >>> resulting in a short outage. One node crashing should never cause an
> > >>> ensemble restart, of course, so I assumed that this must be a bug in
> > >>> ZK. The nodes that restarted had no indication of errors in their
> > >>> logs; they just restarted. Does this sound familiar to any of you?
> > >>>
> > >>> Also, we are using Exhibitor on that ensemble, so it's also possible
> > >>> that the restart was caused by Exhibitor.
> > >>>
> > >>> My hope is that this issue will be behind us once the 3.5.5 upgrade
> > >>> is complete, but I'd ideally like to find some concrete evidence of
> > >>> this.
> > >>>
> > >>> Thanks!
> > >>> Jerry
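On the mixed-version question: with a rolling upgrade, the upgraded 3.5.5 node keeps the same five-member server list as the 3.4.11 ensemble, so it rejoins the existing quorum rather than forming a new one. A hypothetical zoo.cfg for the node being upgraded might look like this (hostnames and paths are placeholders, not from the thread):

```
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
# Server list is unchanged from the 3.4.11 ensemble; 3.5 still accepts
# this classic static config with a separate clientPort line.
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888
server.4=zk4.example.com:2888:3888
server.5=zk5.example.com:2888:3888
```

This is the per-node rolling approach the other posters describe, as opposed to temporarily growing to a 10-node ensemble, which would itself require reconfiguring and restarting every 3.4 node.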
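For what it's worth, the stop/wipe/upgrade/resync step Jörn describes could be sketched roughly like this for a single node. Everything here is an assumption, not from the thread: a systemd unit named `zookeeper`, data under `/var/lib/zookeeper`, and a hypothetical `install_zookeeper_355` helper that lays down the 3.5.5 binaries.

```shell
#!/usr/bin/env sh
# Rough sketch of one rolling-upgrade step; paths and unit names are
# placeholders and will differ per deployment.

ZK_HOST=localhost
ZK_PORT=2181

# Succeeds when `srvr` output (read from stdin) shows the node serving
# as leader or follower, i.e. it has rejoined and synced with the quorum.
zk_in_quorum() {
  grep -Eq '^Mode: (leader|follower)'
}

upgrade_one_node() {
  systemctl stop zookeeper               # 1. stop the old 3.4.11 node
  rm -rf /var/lib/zookeeper/version-2    # 2. wipe data; it resyncs from the leader
  install_zookeeper_355                  # 3. hypothetical: install 3.5.5 binaries
  systemctl start zookeeper              # 4. start the upgraded node
  # 5. poll the four-letter-word command `srvr` until the node is back
  #    in quorum before touching the next server
  until echo srvr | nc "$ZK_HOST" "$ZK_PORT" | zk_in_quorum; do
    sleep 2
  done
}
```

Running this one server at a time, and waiting for quorum membership before moving on, keeps a majority of the five nodes up throughout the upgrade.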