Hello Michi,

I observed the following while testing the patch for 1805 against trunk revision 1574686. I ran "ant -Djavac.args="-Xlint -Xmaxwarns 1000" clean test tar" against trunk revision 1574686, and the build failed because StandaloneDisabledTest failed.
After applying 1805 against 1574686, the build failed with the following tests failing:

1. StandaloneDisabledTest
2. QuorumTest

When I run only QuorumTest against this (1574686 + the 1805 patch), it succeeds (using "ant -Dtestcase=QuorumTest test").

Please advise: should I assume the build is successful except for StandaloneDisabledTest?

Thanks & Regards,
Deepak

On Mon, Mar 10, 2014 at 6:11 PM, Deepak Jagtap <[email protected]> wrote:

> Thanks Michi!
>
>
> On Mon, Mar 10, 2014 at 5:40 PM, Michi Mutsuzaki <[email protected]> wrote:
>
>> StandaloneDisabledTest.startSingleServerTest seems to be failing from
>> the same issue. We should fix this soon.
>>
>> https://issues.apache.org/jira/browse/ZOOKEEPER-1870
>>
>> On Mon, Mar 10, 2014 at 5:33 PM, Deepak Jagtap <[email protected]> wrote:
>> > Hello,
>> >
>> > Another query regarding 1805.
>> > I am observing that the zookeeper rolling upgrade always succeeds when I
>> > apply only the 1805 patch.
>> > When I apply both the 1810 and 1805 patches, the rolling upgrade fails
>> > due to the issue mentioned earlier.
>> >
>> > Please advise if it's fine to use only patch 1805 for the trunk.
>> >
>> > Thanks & Regards,
>> > Deepak
>> >
>> >
>> > On Mon, Mar 10, 2014 at 3:11 PM, Deepak Jagtap <[email protected]> wrote:
>> >
>> >> Hi German,
>> >>
>> >> I have applied patches 1810 and 1805 against trunk revision 1574686
>> >> (the recent revision against which the 1810 patch build succeeded).
>> >> But I am observing the following error in the zookeeper log on the new
>> >> node joining the quorum:
>> >>
>> >> 2014-03-10 21:11:25,126 [myid:1] - INFO
>> >> [WorkerSender[myid=1]:QuorumCnxManager@195] - Have smaller server
>> >> identifier, so dropping the connection: (3, 1)
>> >> 2014-03-10 21:11:25,127 [myid:1] - INFO [/169.254.44.1:3888
>> >> :QuorumCnxManager$Listener@540] - Received connection request
>> >> /169.254.44.3:51507
>> >> 2014-03-10 21:11:25,193 [myid:1] - ERROR
>> >> [WorkerReceiver[myid=1]:NIOServerCnxnFactory$1@92] - Thread
>> >> Thread[WorkerReceiver[myid=1],5,main] died
>> >> java.lang.OutOfMemoryError: Java heap space
>> >>     at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerReceiver.run(FastLeaderElection.java:273)
>> >>     at java.lang.Thread.run(Unknown Source)
>> >>
>> >> Followed by these messages getting printed repeatedly:
>> >>
>> >> 2014-03-10 21:11:25,328 [myid:1] - INFO
>> >> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@900] -
>> >> Notification time out: 400
>> >> 2014-03-10 21:11:25,729 [myid:1] - INFO
>> >> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@900] -
>> >> Notification time out: 800
>> >> 2014-03-10 21:11:26,530 [myid:1] - INFO
>> >> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@900] -
>> >> Notification time out: 1600
>> >> 2014-03-10 21:11:28,131 [myid:1] - INFO
>> >> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@900] -
>> >> Notification time out: 3200
>> >> 2014-03-10 21:11:31,332 [myid:1] - INFO
>> >> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@900] -
>> >> Notification time out: 6400
>> >>
>> >> Thanks & Regards,
>> >> Deepak
>> >>
>> >>
>> >> On Wed, Mar 5, 2014 at 11:50 AM, Deepak Jagtap <[email protected]> wrote:
>> >>
>> >>> Hi,
>> >>>
>> >>> I have applied only the 1805 patch, not 1810.
>> >>> And the upgrade is from 3.5.0.1458648 to 3.5.0.1562289 (not from 3.4.5).
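The "Notification time out" values in the log above double on each retry (400, 800, 1600, 3200, 6400, ...) until they reach a cap; that is the election's receive-timeout backoff, which a node cycles through as long as it gets no usable notification. A minimal sketch of that doubling, with assumed class and constant names (not ZooKeeper's actual code):

```java
// Hypothetical sketch of the notification-timeout backoff seen in the
// logs above; names and constants are illustrative assumptions.
public class NotificationBackoff {
    static final int INITIAL_TIMEOUT_MS = 200;  // assumed starting wait
    static final int MAX_TIMEOUT_MS = 60000;    // cap seen later in this thread

    // Double the wait after each poll that times out, up to the cap.
    static int nextTimeout(int currentMs) {
        return Math.min(currentMs * 2, MAX_TIMEOUT_MS);
    }

    public static void main(String[] args) {
        int t = INITIAL_TIMEOUT_MS;
        for (int i = 0; i < 5; i++) {
            t = nextTimeout(t);
            // Mirrors the 400, 800, 1600, 3200, 6400 sequence from the log.
            System.out.println("Notification time out: " + t);
        }
    }
}
```

Under this model, a server that never receives a valid vote keeps backing off until it sits at the 60000 ms cap, which matches the repeated timeout lines reported elsewhere in this thread.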
>> >>> It was failing very consistently in our environment, and after the
>> >>> 1805 patch it went smoothly.
>> >>>
>> >>> Regards,
>> >>> Deepak
>> >>>
>> >>>
>> >>> On Wed, Mar 5, 2014 at 7:36 AM, German Blanco <[email protected]> wrote:
>> >>>
>> >>>> Hello,
>> >>>>
>> >>>> Do you mean the ZOOKEEPER-1810 patch?
>> >>>> That one alone doesn't solve the problem. On the other hand, the
>> >>>> problem doesn't always happen, so after a rolling start it might get
>> >>>> solved.
>> >>>> We need 1818 as well, but it is easier to go step by step and get
>> >>>> 1810 into trunk first.
>> >>>> I hope that as soon as 3.4.6 is out this might get some attention.
>> >>>>
>> >>>> Regards,
>> >>>>
>> >>>> German.
>> >>>>
>> >>>>
>> >>>> On Wed, Mar 5, 2014 at 2:17 AM, Deepak Jagtap <[email protected]> wrote:
>> >>>>
>> >>>> > Hi,
>> >>>> >
>> >>>> > Please ignore the previous comment; I used the wrong jar file and
>> >>>> > hence the rolling upgrade failed.
>> >>>> > After applying the patch for the bug on the zookeeper-3.5.0.1562289
>> >>>> > revision, the rolling upgrade went fine.
>> >>>> >
>> >>>> > I have patched our in-house zookeeper version, but it would be
>> >>>> > convenient if we could apply the patch on trunk and use the latest
>> >>>> > trunk.
>> >>>> > Please advise if I can apply the patch on the trunk and test it
>> >>>> > for you.
>> >>>> >
>> >>>> > Thanks & Regards,
>> >>>> > Deepak
>> >>>> >
>> >>>> >
>> >>>> > On Tue, Mar 4, 2014 at 12:09 PM, Deepak Jagtap <[email protected]> wrote:
>> >>>> >
>> >>>> > > Hi German,
>> >>>> > >
>> >>>> > > I tried applying the patch for 1805, but the problem still persists.
>> >>>> > > The following notification messages are logged repeatedly by the
>> >>>> > > node which fails to join the quorum:
>> >>>> > >
>> >>>> > > 2014-03-04 20:00:54,398 [myid:2] - INFO
>> >>>> > > [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@837] -
>> >>>> > > Notification time out: 51200
>> >>>> > > 2014-03-04 20:00:54,400 [myid:2] - INFO
>> >>>> > > [WorkerReceiver[myid=2]:FastLeaderElection@605] - Notification: 2
>> >>>> > > (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 2
>> >>>> > > (n.sid), 0x0 (n.peerEPoch), LOOKING (my state)1 (n.config version)
>> >>>> > > 2014-03-04 20:00:54,401 [myid:2] - INFO
>> >>>> > > [WorkerReceiver[myid=2]:FastLeaderElection@605] - Notification: 3
>> >>>> > > (n.leader), 0x100003e84 (n.zxid), 0x2 (n.round), FOLLOWING
>> >>>> > > (n.state), 1 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)1
>> >>>> > > (n.config version)
>> >>>> > > 2014-03-04 20:00:54,403 [myid:2] - INFO
>> >>>> > > [WorkerReceiver[myid=2]:FastLeaderElection@605] - Notification: 3
>> >>>> > > (n.leader), 0x100003e84 (n.zxid), 0xffffffffffffffff (n.round),
>> >>>> > > LEADING (n.state), 3 (n.sid), 0x2 (n.peerEPoch), LOOKING (my
>> >>>> > > state)1 (n.config version)
>> >>>> > >
>> >>>> > > The patch for 1732 is already included in the trunk.
>> >>>> > >
>> >>>> > > Thanks & Regards,
>> >>>> > > Deepak
>> >>>> > >
>> >>>> > >
>> >>>> > > On Fri, Feb 28, 2014 at 2:58 PM, Deepak Jagtap <[email protected]> wrote:
>> >>>> > >
>> >>>> > >> Hi Flavio, German,
>> >>>> > >>
>> >>>> > >> Since this fix is critical for zookeeper rolling upgrade, is it
>> >>>> > >> ok if I apply this patch to the 3.5.0 trunk?
>> >>>> > >> Is it straightforward to apply this patch to trunk?
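The Notification lines above carry the fields each peer compares when deciding whether to adopt another server's vote. A hedged sketch of the usual ordering (election epoch first, then zxid, then server id), simplified from what FastLeaderElection's vote comparison does; the method name and signature here are illustrative, not the trunk code:

```java
// Simplified illustration of vote ordering in leader election:
// a proposed vote wins if it has a higher epoch, or the same epoch
// and a higher zxid, or the same epoch and zxid and a higher sid.
public class VoteOrder {
    static boolean proposedWins(long newId, long newZxid, long newEpoch,
                                long curId, long curZxid, long curEpoch) {
        return (newEpoch > curEpoch)
            || (newEpoch == curEpoch && newZxid > curZxid)
            || (newEpoch == curEpoch && newZxid == curZxid && newId > curId);
    }

    public static void main(String[] args) {
        // A vote for server 3 with zxid 0x100003e84 (as in the log above)
        // beats a zero-zxid vote for server 2 at the same epoch.
        System.out.println(proposedWins(3, 0x100003e84L, 0x1, 2, 0x0, 0x1));
    }
}
```

Under this ordering, every healthy member should converge on the same vote; German's suggestion to check whether members are sending different vote values is a way to spot the ensemble failing to converge.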
>> >>>> > >>
>> >>>> > >> Thanks & Regards,
>> >>>> > >> Deepak
>> >>>> > >>
>> >>>> > >>
>> >>>> > >> On Wed, Feb 26, 2014 at 11:46 AM, Deepak Jagtap <[email protected]> wrote:
>> >>>> > >>
>> >>>> > >>> Thanks German!
>> >>>> > >>> Just wondering, is there any chance that this patch may be
>> >>>> > >>> applied to trunk in the near future?
>> >>>> > >>> If it's fine with you guys, I would be more than happy to apply
>> >>>> > >>> the fixes (from 3.4.5) to trunk and test them.
>> >>>> > >>>
>> >>>> > >>> Thanks & Regards,
>> >>>> > >>> Deepak
>> >>>> > >>>
>> >>>> > >>>
>> >>>> > >>> On Wed, Feb 26, 2014 at 1:29 AM, German Blanco <[email protected]> wrote:
>> >>>> > >>>
>> >>>> > >>>> Hello Deepak,
>> >>>> > >>>>
>> >>>> > >>>> Due to ZOOKEEPER-1732 and then ZOOKEEPER-1805, there are some
>> >>>> > >>>> cases in which an ensemble can be formed so that it doesn't
>> >>>> > >>>> allow any other zookeeper server to join.
>> >>>> > >>>> This has been fixed in branch 3.4, but it hasn't been fixed in
>> >>>> > >>>> trunk yet.
>> >>>> > >>>> Check if the Notifications sent around contain different
>> >>>> > >>>> values for the vote in the members of the ensemble.
>> >>>> > >>>> If you force a new election (e.g. by killing the leader), I
>> >>>> > >>>> guess everything should work normally, but don't take my word
>> >>>> > >>>> for it.
>> >>>> > >>>> Flavio should know more about this.
>> >>>> > >>>>
>> >>>> > >>>> Cheers,
>> >>>> > >>>>
>> >>>> > >>>> German.
>> >>>> > >>>>
>> >>>> > >>>>
>> >>>> > >>>> On Wed, Feb 26, 2014 at 4:04 AM, Deepak Jagtap <[email protected]> wrote:
>> >>>> > >>>>
>> >>>> > >>>> > Hi,
>> >>>> > >>>> >
>> >>>> > >>>> > I am replacing one of the zookeeper servers in a 3 node
>> >>>> > >>>> > quorum.
>> >>>> > >>>> > Initially all zookeeper servers were running the
>> >>>> > >>>> > 3.5.0.1515976 version.
>> >>>> > >>>> > I successfully replaced Node3 with the newer version
>> >>>> > >>>> > 3.5.0.1551730.
>> >>>> > >>>> > When I try to replace Node2 with the same zookeeper version,
>> >>>> > >>>> > I can't start the zookeeper server on Node2, as it is
>> >>>> > >>>> > continuously stuck in a leader election loop printing the
>> >>>> > >>>> > following messages:
>> >>>> > >>>> >
>> >>>> > >>>> > 2014-02-26 02:45:23,709 [myid:3] - INFO
>> >>>> > >>>> > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@837] -
>> >>>> > >>>> > Notification time out: 60000
>> >>>> > >>>> > 2014-02-26 02:45:23,710 [myid:3] - INFO
>> >>>> > >>>> > [WorkerSender[myid=3]:QuorumCnxManager@195] - Have smaller
>> >>>> > >>>> > server identifier, so dropping the connection: (5, 3)
>> >>>> > >>>> > 2014-02-26 02:45:23,712 [myid:3] - INFO
>> >>>> > >>>> > [WorkerReceiver[myid=3]:FastLeaderElection@605] - Notification: 3
>> >>>> > >>>> > (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 3
>> >>>> > >>>> > (n.sid), 0x0 (n.peerEPoch), LOOKING (my state)1 (n.config version)
>> >>>> > >>>> >
>> >>>> > >>>> > The network connections and configuration of the node being
>> >>>> > >>>> > upgraded are fine.
>> >>>> > >>>> > The other 2 nodes in the quorum are fine and serving
>> >>>> > >>>> > requests.
>> >>>> > >>>> >
>> >>>> > >>>> > Any idea what might be causing this?
>> >>>> > >>>> >
>> >>>> > >>>> > Thanks & Regards,
>> >>>> > >>>> > Deepak
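The "Have smaller server identifier, so dropping the connection" lines that recur in the logs throughout this thread come from the election transport's tie-breaking rule: between any pair of servers only one connection survives, namely the one initiated by the server with the larger id. A minimal sketch of that rule (illustrative names, not the actual QuorumCnxManager code):

```java
// Illustrative sketch of the pairwise connection rule behind
// "Have smaller server identifier, so dropping the connection".
public class ConnectionRule {
    // A server keeps a connection it initiated only if its id is
    // larger than the remote server's id; otherwise it drops the
    // socket and waits for the higher-id peer to connect back.
    static boolean keepInitiatedConnection(long mySid, long remoteSid) {
        return mySid > remoteSid;
    }

    public static void main(String[] args) {
        // Matches the earlier log "(3, 1)" on myid:1 - server 1 connecting
        // out to server 3 drops its own connection because 1 < 3.
        System.out.println(keepInitiatedConnection(1, 3));
    }
}
```

These dropped-connection messages are therefore normal during election; on their own they do not explain a node being stuck, which is why the thread focuses on the vote contents instead.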
