Hello,

do you mean the ZOOKEEPER-1810 patch? That one alone doesn't solve the problem. On the other hand, the problem doesn't always happen, so after a rolling start it might get solved. We need 1818 as well, but it is easier to go step by step and get 1810 into trunk first. I hope this gets some attention as soon as 3.4.6 is out.
Regards,

German.


On Wed, Mar 5, 2014 at 2:17 AM, Deepak Jagtap <[email protected]> wrote:

> Hi,
>
> Please ignore the previous comment, I used wrong jar file and hence rolling
> upgrade failed.
> After applying patch for bug on zookeeper-3.5.0.1562289
> revision, rolling upgrade went fine.
>
> I have patched in house zookeeper version, but it would be convenient if we
> apply patch on trunk and use the latest trunk.
> Please advise if I can apply the patch on the trunk and test it for you.
>
> Thanks & Regards,
> Deepak
>
>
> On Tue, Mar 4, 2014 at 12:09 PM, Deepak Jagtap <[email protected]> wrote:
>
> > Hi German,
> >
> > I tried applying patch for 1805 but problem still persists.
> > Following are the notification messages logged repeatedly by the node
> > which fails to join the quorum:
> >
> >
> > 2014-03-04 20:00:54,398 [myid:2] - INFO
> > [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@837] -
> > Notification time out: 51200
> > 2014-03-04 20:00:54,400 [myid:2] - INFO
> > [WorkerReceiver[myid=2]:FastLeaderElection@605] - Notification: 2
> > (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x0
> > (n.peerEPoch), LOOKING (my state)1 (n.config version)
> > 2014-03-04 20:00:54,401 [myid:2] - INFO
> > [WorkerReceiver[myid=2]:FastLeaderElection@605] - Notification: 3
> > (n.leader), 0x100003e84 (n.zxid), 0x2 (n.round), FOLLOWING (n.state), 1
> > (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)1 (n.config version)
> > 2014-03-04 20:00:54,403 [myid:2] - INFO
> > [WorkerReceiver[myid=2]:FastLeaderElection@605] - Notification: 3
> > (n.leader), 0x100003e84 (n.zxid), 0xffffffffffffffff (n.round), LEADING
> > (n.state), 3 (n.sid), 0x2 (n.peerEPoch), LOOKING (my state)1 (n.config
> > version)
> >
> >
> >
> > Patch for 1732 is already included in the trunk.
> >
> >
> > Thanks & Regards,
> > Deepak
> >
> >
> > On Fri, Feb 28, 2014 at 2:58 PM, Deepak Jagtap <[email protected]> wrote:
> >
> >> Hi Flavio, German,
> >>
> >> Since this fix is critical for zookeeper rolling upgrade is it ok if I
> >> apply this patch to 3.5.0 trunk?
> >> Is it straightforward to apply this patch to trunk?
> >>
> >> Thanks & Regards,
> >> Deepak
> >>
> >>
> >> On Wed, Feb 26, 2014 at 11:46 AM, Deepak Jagtap <[email protected]> wrote:
> >>
> >>> Thanks German!
> >>> Just wondering is there any chance that this patch may be applied to
> >>> trunk in near future?
> >>> If it's fine with you guys, I would be more than happy to apply the
> >>> fixes (from 3.4.5) to trunk and test them.
> >>>
> >>> Thanks & Regards,
> >>> Deepak
> >>>
> >>>
> >>> On Wed, Feb 26, 2014 at 1:29 AM, German Blanco <[email protected]> wrote:
> >>>
> >>>> Hello Deepak,
> >>>>
> >>>> due to ZOOKEEPER-1732 and then ZOOKEEPER-1805, there are some cases in
> >>>> which an ensemble can be formed so that it doesn't allow any other
> >>>> zookeeper server to join.
> >>>> This has been fixed in branch 3.4, but it hasn't been fixed in trunk
> >>>> yet.
> >>>> Check if the Notifications sent around contain different values for the
> >>>> vote in the members of the ensemble.
> >>>> If you force a new election (e.g. by killing the leader) I guess
> >>>> everything should work normally, but don't take my word for it.
> >>>> Flavio should know more about this.
> >>>>
> >>>> Cheers,
> >>>>
> >>>> German.
> >>>>
> >>>>
> >>>> On Wed, Feb 26, 2014 at 4:04 AM, Deepak Jagtap <[email protected]> wrote:
> >>>>
> >>>> > Hi,
> >>>> >
> >>>> > I am replacing one of the zookeeper servers in a 3 node quorum.
> >>>> > Initially all zookeeper servers were running 3.5.0.1515976 version.
> >>>> > I successfully replaced Node3 with newer version 3.5.0.1551730.
> >>>> > When I am trying to replace Node2 with the same zookeeper version,
> >>>> > I couldn't start zookeeper server on Node2 as it is continuously
> >>>> > stuck in leader election loop printing following messages:
> >>>> >
> >>>> > 2014-02-26 02:45:23,709 [myid:3] - INFO
> >>>> > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@837] -
> >>>> > Notification time out: 60000
> >>>> > 2014-02-26 02:45:23,710 [myid:3] - INFO
> >>>> > [WorkerSender[myid=3]:QuorumCnxManager@195] - Have smaller server
> >>>> > identifier, so dropping the connection: (5, 3)
> >>>> > 2014-02-26 02:45:23,712 [myid:3] - INFO
> >>>> > [WorkerReceiver[myid=3]:FastLeaderElection@605] - Notification: 3
> >>>> > (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 3
> >>>> > (n.sid), 0x0 (n.peerEPoch), LOOKING (my state)1 (n.config version)
> >>>> >
> >>>> >
> >>>> > Network connections and configuration of the node being upgraded are
> >>>> > fine.
> >>>> > The other 2 nodes in the quorum are fine and serving the request.
> >>>> >
> >>>> > Any idea what might be causing this?
> >>>> >
> >>>> > Thanks & Regards,
> >>>> > Deepak
> >>>> >
> >>>>
> >>>
> >>>
> >>
> >
>
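
For anyone who wants to try the check suggested in the quoted thread (whether the notifications from the established members carry different vote values), here is a rough Python sketch that pulls the vote fields out of the `FastLeaderElection` notification lines. It is only an illustration: the names `Vote`, `latest_votes` and `established_votes_differ` are made up for this example, the regular expression is written against the log format quoted above (with each log entry on a single line), and which fields have to match for a server to be able to join is an assumption here, not something stated in the thread.

```python
import re
from typing import Dict, Iterable, NamedTuple


class Vote(NamedTuple):
    leader: int
    zxid: int
    election_round: int
    peer_epoch: int
    state: str


_HEX = r"0x[0-9a-fA-F]+"

# Matches the notification format quoted in this thread; other ZooKeeper
# versions may log a different layout.
NOTIFICATION = re.compile(
    r"Notification:\s+(?P<leader>\d+)\s+\(n\.leader\),"
    r"\s+(?P<zxid>" + _HEX + r")\s+\(n\.zxid\),"
    r"\s+(?P<round>" + _HEX + r")\s+\(n\.round\),"
    r"\s+(?P<state>\w+)\s+\(n\.state\),"
    r"\s+(?P<sid>\d+)\s+\(n\.sid\),"
    r"\s+(?P<epoch>" + _HEX + r")\s+\(n\.peerEPoch\)"
)


def latest_votes(lines: Iterable[str]) -> Dict[int, Vote]:
    """Return the most recent vote seen from each sender (n.sid)."""
    votes: Dict[int, Vote] = {}
    for line in lines:
        m = NOTIFICATION.search(line)
        if m is None:
            continue  # not a notification line (e.g. "Notification time out")
        votes[int(m.group("sid"))] = Vote(
            leader=int(m.group("leader")),
            zxid=int(m.group("zxid"), 16),
            election_round=int(m.group("round"), 16),
            peer_epoch=int(m.group("epoch"), 16),
            state=m.group("state"),
        )
    return votes


def established_votes_differ(votes: Dict[int, Vote]) -> bool:
    """True if servers that are not LOOKING report different vote values.

    Assumption: all four fields are compared; the thread does not say
    which ones the election code actually requires to agree.
    """
    seen = {
        (v.leader, v.zxid, v.election_round, v.peer_epoch)
        for v in votes.values()
        if v.state != "LOOKING"
    }
    return len(seen) > 1


# The three notifications quoted above, each joined back onto one line.
SAMPLE = [
    "2014-03-04 20:00:54,400 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@605] - "
    "Notification: 2 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), "
    "0x0 (n.peerEPoch), LOOKING (my state)1 (n.config version)",
    "2014-03-04 20:00:54,401 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@605] - "
    "Notification: 3 (n.leader), 0x100003e84 (n.zxid), 0x2 (n.round), FOLLOWING (n.state), 1 (n.sid), "
    "0x1 (n.peerEPoch), LOOKING (my state)1 (n.config version)",
    "2014-03-04 20:00:54,403 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@605] - "
    "Notification: 3 (n.leader), 0x100003e84 (n.zxid), 0xffffffffffffffff (n.round), LEADING (n.state), 3 (n.sid), "
    "0x2 (n.peerEPoch), LOOKING (my state)1 (n.config version)",
]

if __name__ == "__main__":
    votes = latest_votes(SAMPLE)
    for sid, vote in sorted(votes.items()):
        print(sid, vote)
    print("established votes differ:", established_votes_differ(votes))
```

On the three quoted lines, servers 1 and 3 both name server 3 as leader with the same zxid, but they report different n.round and n.peerEPoch values, so the check flags them as differing.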
