StandaloneDisabledTest.startSingleServerTest seems to be failing from the same issue. We should fix this soon.
https://issues.apache.org/jira/browse/ZOOKEEPER-1870

On Mon, Mar 10, 2014 at 5:33 PM, Deepak Jagtap <[email protected]> wrote:

> Hello,
>
> Another query regarding 1805.
> I am observing that the zookeeper rolling upgrade always succeeds when I
> apply the 1805 patch.
> When I apply both the 1810 and 1805 patches, the rolling upgrade fails due
> to the issue mentioned earlier.
>
> Please advise if it's fine to use only patch 1805 for the trunk?
>
> Thanks & Regards,
> Deepak
>
>
> On Mon, Mar 10, 2014 at 3:11 PM, Deepak Jagtap <[email protected]> wrote:
>
>> Hi German,
>>
>> I have applied patches 1810 and 1805 against trunk revision 1574686 (the
>> most recent revision against which the 1810 patch build succeeded).
>> But I am observing the following error in the zookeeper log on the new
>> node joining the quorum:
>>
>> 2014-03-10 21:11:25,126 [myid:1] - INFO
>> [WorkerSender[myid=1]:QuorumCnxManager@195] - Have smaller server
>> identifier, so dropping the connection: (3, 1)
>> 2014-03-10 21:11:25,127 [myid:1] - INFO
>> [/169.254.44.1:3888:QuorumCnxManager$Listener@540] - Received connection
>> request /169.254.44.3:51507
>> 2014-03-10 21:11:25,193 [myid:1] - ERROR
>> [WorkerReceiver[myid=1]:NIOServerCnxnFactory$1@92] - Thread
>> Thread[WorkerReceiver[myid=1],5,main] died
>> java.lang.OutOfMemoryError: Java heap space
>>     at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerReceiver.run(FastLeaderElection.java:273)
>>     at java.lang.Thread.run(Unknown Source)
>>
>> Followed by these messages getting printed repeatedly:
>>
>> 2014-03-10 21:11:25,328 [myid:1] - INFO
>> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@900] -
>> Notification time out: 400
>> 2014-03-10 21:11:25,729 [myid:1] - INFO
>> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@900] -
>> Notification time out: 800
>> 2014-03-10 21:11:26,530 [myid:1] - INFO
>> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@900] -
>> Notification time out: 1600
>> 2014-03-10 21:11:28,131 [myid:1] - INFO
>> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@900] -
>> Notification time out: 3200
>> 2014-03-10 21:11:31,332 [myid:1] - INFO
>> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@900] -
>> Notification time out: 6400
>>
>> Thanks & Regards,
>> Deepak
>>
>>
>> On Wed, Mar 5, 2014 at 11:50 AM, Deepak Jagtap <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I have applied only the 1805 patch, not 1810.
>>> And the upgrade is from 3.5.0.1458648 to 3.5.0.1562289 (not from 3.4.5).
>>> It was failing very consistently in our environment, and after the 1805
>>> patch it went smoothly.
>>>
>>> Regards,
>>> Deepak
>>>
>>>
>>> On Wed, Mar 5, 2014 at 7:36 AM, German Blanco
>>> <[email protected]> wrote:
>>>
>>>> Hello,
>>>>
>>>> do you mean the ZOOKEEPER-1810 patch?
>>>> That one alone doesn't solve the problem. On the other hand, the problem
>>>> doesn't always happen, so after a rolling start it might get solved.
>>>> We need 1818 as well, but it is easier to go step by step and get 1810
>>>> into trunk first.
>>>> I hope that as soon as 3.4.6 is out this might get some attention.
>>>>
>>>> Regards,
>>>>
>>>> German.
>>>>
>>>>
>>>> On Wed, Mar 5, 2014 at 2:17 AM, Deepak Jagtap <[email protected]> wrote:
>>>>
>>>> > Hi,
>>>> >
>>>> > Please ignore the previous comment; I used the wrong jar file and
>>>> > hence the rolling upgrade failed.
>>>> > After applying the patch for the bug on the zookeeper-3.5.0.1562289
>>>> > revision, the rolling upgrade went fine.
>>>> >
>>>> > I have patched our in-house zookeeper version, but it would be
>>>> > convenient if we could apply the patch on trunk and use the latest
>>>> > trunk.
>>>> > Please advise if I can apply the patch on the trunk and test it for
>>>> > you.
>>>> >
>>>> > Thanks & Regards,
>>>> > Deepak
>>>> >
>>>> >
>>>> > On Tue, Mar 4, 2014 at 12:09 PM, Deepak Jagtap
>>>> > <[email protected]> wrote:
>>>> >
>>>> > > Hi German,
>>>> > >
>>>> > > I tried applying the patch for 1805 but the problem still persists.
>>>> > > Following are the notification messages logged repeatedly by the
>>>> > > node which fails to join the quorum:
>>>> > >
>>>> > > 2014-03-04 20:00:54,398 [myid:2] - INFO
>>>> > > [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@837] -
>>>> > > Notification time out: 51200
>>>> > > 2014-03-04 20:00:54,400 [myid:2] - INFO
>>>> > > [WorkerReceiver[myid=2]:FastLeaderElection@605] - Notification: 2
>>>> > > (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 2
>>>> > > (n.sid), 0x0 (n.peerEPoch), LOOKING (my state)1 (n.config version)
>>>> > > 2014-03-04 20:00:54,401 [myid:2] - INFO
>>>> > > [WorkerReceiver[myid=2]:FastLeaderElection@605] - Notification: 3
>>>> > > (n.leader), 0x100003e84 (n.zxid), 0x2 (n.round), FOLLOWING
>>>> > > (n.state), 1 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)1
>>>> > > (n.config version)
>>>> > > 2014-03-04 20:00:54,403 [myid:2] - INFO
>>>> > > [WorkerReceiver[myid=2]:FastLeaderElection@605] - Notification: 3
>>>> > > (n.leader), 0x100003e84 (n.zxid), 0xffffffffffffffff (n.round),
>>>> > > LEADING (n.state), 3 (n.sid), 0x2 (n.peerEPoch), LOOKING (my state)1
>>>> > > (n.config version)
>>>> > >
>>>> > > The patch for 1732 is already included in the trunk.
>>>> > >
>>>> > > Thanks & Regards,
>>>> > > Deepak
>>>> > >
>>>> > >
>>>> > > On Fri, Feb 28, 2014 at 2:58 PM, Deepak Jagtap
>>>> > > <[email protected]> wrote:
>>>> > >
>>>> > >> Hi Flavio, German,
>>>> > >>
>>>> > >> Since this fix is critical for the zookeeper rolling upgrade, is it
>>>> > >> ok if I apply this patch to the 3.5.0 trunk?
>>>> > >> Is it straightforward to apply this patch to trunk?
>>>> > >>
>>>> > >> Thanks & Regards,
>>>> > >> Deepak
>>>> > >>
>>>> > >>
>>>> > >> On Wed, Feb 26, 2014 at 11:46 AM, Deepak Jagtap
>>>> > >> <[email protected]> wrote:
>>>> > >>
>>>> > >>> Thanks German!
>>>> > >>> Just wondering, is there any chance that this patch may be
>>>> > >>> applied to trunk in the near future?
>>>> > >>> If it's fine with you guys, I would be more than happy to apply
>>>> > >>> the fixes (from 3.4.5) to trunk and test them.
>>>> > >>>
>>>> > >>> Thanks & Regards,
>>>> > >>> Deepak
>>>> > >>>
>>>> > >>>
>>>> > >>> On Wed, Feb 26, 2014 at 1:29 AM, German Blanco
>>>> > >>> <[email protected]> wrote:
>>>> > >>>
>>>> > >>>> Hello Deepak,
>>>> > >>>>
>>>> > >>>> due to ZOOKEEPER-1732 and then ZOOKEEPER-1805, there are some
>>>> > >>>> cases in which an ensemble can be formed so that it doesn't
>>>> > >>>> allow any other zookeeper server to join.
>>>> > >>>> This has been fixed in branch 3.4, but it hasn't been fixed in
>>>> > >>>> trunk yet.
>>>> > >>>> Check if the Notifications sent around contain different values
>>>> > >>>> for the vote in the members of the ensemble.
>>>> > >>>> If you force a new election (e.g. by killing the leader) I guess
>>>> > >>>> everything should work normally, but don't take my word for it.
>>>> > >>>> Flavio should know more about this.
>>>> > >>>>
>>>> > >>>> Cheers,
>>>> > >>>>
>>>> > >>>> German.
>>>> > >>>>
>>>> > >>>>
>>>> > >>>> On Wed, Feb 26, 2014 at 4:04 AM, Deepak Jagtap
>>>> > >>>> <[email protected]> wrote:
>>>> > >>>>
>>>> > >>>> > Hi,
>>>> > >>>> >
>>>> > >>>> > I am replacing one of the zookeeper servers in a 3-node
>>>> > >>>> > quorum.
>>>> > >>>> > Initially all zookeeper servers were running version
>>>> > >>>> > 3.5.0.1515976.
>>>> > >>>> > I successfully replaced Node3 with the newer version
>>>> > >>>> > 3.5.0.1551730.
>>>> > >>>> > When I tried to replace Node2 with the same zookeeper version,
>>>> > >>>> > I couldn't start the zookeeper server on Node2, as it is
>>>> > >>>> > continuously stuck in a leader election loop, printing the
>>>> > >>>> > following messages:
>>>> > >>>> >
>>>> > >>>> > 2014-02-26 02:45:23,709 [myid:3] - INFO
>>>> > >>>> > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@837]
>>>> > >>>> > - Notification time out: 60000
>>>> > >>>> > 2014-02-26 02:45:23,710 [myid:3] - INFO
>>>> > >>>> > [WorkerSender[myid=3]:QuorumCnxManager@195] - Have smaller
>>>> > >>>> > server identifier, so dropping the connection: (5, 3)
>>>> > >>>> > 2014-02-26 02:45:23,712 [myid:3] - INFO
>>>> > >>>> > [WorkerReceiver[myid=3]:FastLeaderElection@605] -
>>>> > >>>> > Notification: 3 (n.leader), 0x0 (n.zxid), 0x1 (n.round),
>>>> > >>>> > LOOKING (n.state), 3 (n.sid), 0x0 (n.peerEPoch), LOOKING
>>>> > >>>> > (my state)1 (n.config version)
>>>> > >>>> >
>>>> > >>>> > Network connections and configuration of the node being
>>>> > >>>> > upgraded are fine.
>>>> > >>>> > The other 2 nodes in the quorum are fine and serving requests.
>>>> > >>>> >
>>>> > >>>> > Any idea what might be causing this?
>>>> > >>>> >
>>>> > >>>> > Thanks & Regards,
>>>> > >>>> > Deepak
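For context on the repeated "Notification time out" lines quoted throughout the thread: the election's receive loop doubles its poll timeout after each round that produces no usable notification, capped at 60000 ms, which matches the 400 → 800 → … → 51200 → 60000 progression seen in the logs. A minimal sketch of that backoff, with illustrative names and an assumed 200 ms starting value (not ZooKeeper's actual identifiers):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the exponential backoff behind the "Notification time out"
// log lines: each time the receive poll comes back empty, the wait is
// doubled, but never past a fixed cap. Names and the starting value are
// illustrative assumptions.
public class ElectionBackoffSketch {

    // Returns the successive timeout values that would be logged after
    // each failed poll, starting from `initial` and capped at `cap`.
    static List<Integer> backoffSequence(int initial, int cap, int rounds) {
        List<Integer> logged = new ArrayList<>();
        int timeout = initial;
        for (int i = 0; i < rounds; i++) {
            timeout = Math.min(timeout * 2, cap); // double, capped
            logged.add(timeout);
        }
        return logged;
    }

    public static void main(String[] args) {
        // With an assumed 200 ms start and a 60000 ms cap, this yields the
        // values seen in the logs: 400, 800, 1600, ..., 51200, 60000.
        for (int t : backoffSequence(200, 60000, 10)) {
            System.out.println("Notification time out: " + t);
        }
    }
}
```

The cap explains why a node stuck in election, like Node2 above, settles into printing "Notification time out: 60000" once per minute.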
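One plausible reading of the OutOfMemoryError in WorkerReceiver quoted above, and of the class of mixed-version problems the 1805/1810-era patches address, is a receiver interpreting bytes from a peer on a different wire format as a message length and allocating a buffer that large. A defensive-read sketch of the general technique; the class name, method, and cap are illustrative assumptions, not ZooKeeper's actual code:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Sketch of a defensive read for a length-prefixed message: validate the
// length before allocating, so garbage from a protocol mismatch raises an
// IOException instead of an OutOfMemoryError. All names and the cap are
// illustrative assumptions.
public class SafeMessageRead {
    static final int MAX_MSG_LEN = 512 * 1024; // assumed sanity cap

    static byte[] readMessage(DataInputStream in) throws IOException {
        int len = in.readInt();
        if (len <= 0 || len > MAX_MSG_LEN) {
            // Reject instead of allocating a bogus-sized buffer.
            throw new IOException("unreasonable message length: " + len);
        }
        byte[] buf = new byte[len];
        in.readFully(buf);
        return buf;
    }

    public static void main(String[] args) throws IOException {
        // A well-formed message: 4-byte length prefix 3, then the payload.
        byte[] wire = {0, 0, 0, 3, 'a', 'b', 'c'};
        byte[] msg = readMessage(
                new DataInputStream(new ByteArrayInputStream(wire)));
        System.out.println(new String(msg)); // prints "abc"

        // A bogus length prefix (0x7fffffff) is rejected up front.
        byte[] bad = {0x7f, (byte) 0xff, (byte) 0xff, (byte) 0xff};
        try {
            readMessage(new DataInputStream(new ByteArrayInputStream(bad)));
        } catch (IOException expected) {
            System.out.println("rejected: " + expected.getMessage());
        }
    }
}
```

Without such a check, a single malformed frame kills the receiver thread, which fits the "Thread ... died" log line followed by the endless notification timeouts.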
