Hi German, I tried applying the patch for ZOOKEEPER-1805, but the problem still persists. The following notification messages are logged repeatedly by the node that fails to join the quorum:
2014-03-04 20:00:54,398 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@837] - Notification time out: 51200
2014-03-04 20:00:54,400 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@605] - Notification: 2 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x0 (n.peerEPoch), LOOKING (my state)1 (n.config version)
2014-03-04 20:00:54,401 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@605] - Notification: 3 (n.leader), 0x100003e84 (n.zxid), 0x2 (n.round), FOLLOWING (n.state), 1 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)1 (n.config version)
2014-03-04 20:00:54,403 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@605] - Notification: 3 (n.leader), 0x100003e84 (n.zxid), 0xffffffffffffffff (n.round), LEADING (n.state), 3 (n.sid), 0x2 (n.peerEPoch), LOOKING (my state)1 (n.config version)

The patch for ZOOKEEPER-1732 is already included in the trunk.

Thanks & Regards,
Deepak

On Fri, Feb 28, 2014 at 2:58 PM, Deepak Jagtap <[email protected]> wrote:

> Hi Flavio, German,
>
> Since this fix is critical for ZooKeeper rolling upgrades, is it OK if I
> apply this patch to the 3.5.0 trunk?
> Is it straightforward to apply this patch to trunk?
>
> Thanks & Regards,
> Deepak
>
>
> On Wed, Feb 26, 2014 at 11:46 AM, Deepak Jagtap <[email protected]> wrote:
>
>> Thanks German!
>> Just wondering, is there any chance that this patch may be applied to
>> trunk in the near future?
>> If it's fine with you guys, I would be more than happy to apply the fixes
>> (from 3.4.5) to trunk and test them.
>>
>> Thanks & Regards,
>> Deepak
>>
>>
>> On Wed, Feb 26, 2014 at 1:29 AM, German Blanco <[email protected]> wrote:
>>
>>> Hello Deepak,
>>>
>>> Due to ZOOKEEPER-1732 and then ZOOKEEPER-1805, there are some cases in
>>> which an ensemble can be formed so that it doesn't allow any other
>>> ZooKeeper server to join.
>>> This has been fixed in branch 3.4, but it hasn't been fixed in trunk yet.
>>> Check if the Notifications sent around contain different values for the
>>> vote in the members of the ensemble.
>>> If you force a new election (e.g. by killing the leader), I guess
>>> everything should work normally, but don't take my word for it.
>>> Flavio should know more about this.
>>>
>>> Cheers,
>>>
>>> German.
>>>
>>>
>>> On Wed, Feb 26, 2014 at 4:04 AM, Deepak Jagtap <[email protected]> wrote:
>>>
>>> > Hi,
>>> >
>>> > I am replacing one of the ZooKeeper servers in a 3-node quorum.
>>> > Initially, all ZooKeeper servers were running version 3.5.0.1515976.
>>> > I successfully replaced Node3 with the newer version, 3.5.0.1551730.
>>> > When I try to replace Node2 with the same ZooKeeper version, I can't
>>> > start the ZooKeeper server on Node2, as it is continuously stuck in a
>>> > leader election loop, printing the following messages:
>>> >
>>> > 2014-02-26 02:45:23,709 [myid:3] - INFO
>>> > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@837] -
>>> > Notification time out: 60000
>>> > 2014-02-26 02:45:23,710 [myid:3] - INFO
>>> > [WorkerSender[myid=3]:QuorumCnxManager@195] - Have smaller server
>>> > identifier, so dropping the connection: (5, 3)
>>> > 2014-02-26 02:45:23,712 [myid:3] - INFO
>>> > [WorkerReceiver[myid=3]:FastLeaderElection@605] - Notification: 3
>>> > (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 3 (n.sid),
>>> > 0x0 (n.peerEPoch), LOOKING (my state)1 (n.config version)
>>> >
>>> > The network connections and configuration of the node being upgraded
>>> > are fine.
>>> > The other two nodes in the quorum are fine and serving requests.
>>> >
>>> > Any idea what might be causing this?
>>> >
>>> > Thanks & Regards,
>>> > Deepak
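German's suggestion ("check if the Notifications sent around contain different values for the vote") can be scripted instead of eyeballed. The following standalone Python sketch is my own illustrative helper, not part of ZooKeeper: the regex and function names are assumptions matched against the Notification lines quoted in this thread. It groups the announced (n.leader, n.zxid) votes by sender sid; a node stuck LOOKING typically shows its own self-vote disagreeing with the pair the rest of the ensemble agrees on, as in the logs above where sid 2 votes for itself while sids 1 and 3 both vote for leader 3.

```python
import re
from collections import defaultdict

# Matches the FastLeaderElection "Notification:" lines quoted in this thread.
NOTIFICATION_RE = re.compile(
    r"Notification: "
    r"(?P<leader>\d+) \(n\.leader\), "
    r"(?P<zxid>0x[0-9a-f]+) \(n\.zxid\), "
    r"(?P<round>0x[0-9a-f]+) \(n\.round\), "
    r"(?P<state>\w+) \(n\.state\), "
    r"(?P<sid>\d+) \(n\.sid\)"
)

def votes_by_sender(log_lines):
    """Map sender sid -> set of (leader, zxid) votes it announced."""
    votes = defaultdict(set)
    for line in log_lines:
        m = NOTIFICATION_RE.search(line)
        if m:
            votes[int(m.group("sid"))].add(
                (int(m.group("leader")), int(m.group("zxid"), 16)))
    return dict(votes)

# The three Notification lines from the top of this thread.
sample = [
    "2014-03-04 20:00:54,400 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@605] - Notification: 2 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x0 (n.peerEPoch), LOOKING (my state)1 (n.config version)",
    "2014-03-04 20:00:54,401 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@605] - Notification: 3 (n.leader), 0x100003e84 (n.zxid), 0x2 (n.round), FOLLOWING (n.state), 1 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)1 (n.config version)",
    "2014-03-04 20:00:54,403 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@605] - Notification: 3 (n.leader), 0x100003e84 (n.zxid), 0xffffffffffffffff (n.round), LEADING (n.state), 3 (n.sid), 0x2 (n.peerEPoch), LOOKING (my state)1 (n.config version)",
]

for sid, vs in sorted(votes_by_sender(sample).items()):
    # sid 2 votes for itself; sids 1 and 3 agree on (leader 3, zxid 0x100003e84)
    print(sid, sorted(vs))
```

Running this over a few seconds of each server's log makes a split vote, or a stale round like the 0xffffffffffffffff above, immediately visible.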
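One side note on the "Have smaller server identifier, so dropping the connection: (5, 3)" line in the oldest message: on its own this is expected behaviour, not an error. As I understand QuorumCnxManager's tie-break, when two servers race to open election-port connections to each other, only the connection initiated by the higher-sid server is kept; the lower-sid initiator drops its attempt and waits for the peer to connect back. A minimal sketch of that rule (my own illustrative function under that assumption, not ZooKeeper's actual code):

```python
def initiator_keeps_connection(initiator_sid: int, target_sid: int) -> bool:
    """Election-connection tie-break (sketch): an outgoing connection
    survives only when the initiator's server id is larger than the
    target's; otherwise the initiator drops it, logging
    'Have smaller server identifier, so dropping the connection:
    (target, initiator)'."""
    return initiator_sid > target_sid

# The '(5, 3)' log line above: server 3 initiated a connection to server 5
# and, having the smaller id, dropped it.
print(initiator_keeps_connection(3, 5))  # prints False
```

So that line only indicates a problem if server 5 never connects back, in which case it is worth checking server 5's logs and the election-port reachability from 5 to 3.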
