Thanks Michi!
On Mon, Mar 10, 2014 at 5:40 PM, Michi Mutsuzaki <[email protected]> wrote:
> StandaloneDisabledTest.startSingleServerTest seems to be failing from
> the same issue. We should fix this soon.
>
> https://issues.apache.org/jira/browse/ZOOKEEPER-1870
>
> On Mon, Mar 10, 2014 at 5:33 PM, Deepak Jagtap <[email protected]> wrote:
> > Hello,
> >
> > Another query regarding 1805.
> > I am observing that the zookeeper rolling upgrade always succeeds when I apply
> > 1805 patch.
> > When I apply both 1810 and 1805 patch rolling upgrade fails due to an
> > issue mentioned earlier.
> >
> > Please advise, if it's fine to use only patch 1805 for the trunk?
> >
> > Thanks & Regards,
> > Deepak
> >
> >
> > On Mon, Mar 10, 2014 at 3:11 PM, Deepak Jagtap <[email protected]> wrote:
> >
> >> Hi German,
> >>
> >> I have applied patch 1810 and 1805 against trunk revision 1574686 (recent
> >> revision against which 1810 patch build succeeded).
> >> But observing following error in the zookeeper log on the new node joining
> >> quorum:
> >>
> >> 2014-03-10 21:11:25,126 [myid:1] - INFO [WorkerSender[myid=1]:QuorumCnxManager@195] - Have smaller server identifier, so dropping the connection: (3, 1)
> >> 2014-03-10 21:11:25,127 [myid:1] - INFO [/169.254.44.1:3888:QuorumCnxManager$Listener@540] - Received connection request /169.254.44.3:51507
> >> 2014-03-10 21:11:25,193 [myid:1] - ERROR [WorkerReceiver[myid=1]:NIOServerCnxnFactory$1@92] - Thread Thread[WorkerReceiver[myid=1],5,main] died
> >> java.lang.OutOfMemoryError: Java heap space
> >> at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerReceiver.run(FastLeaderElection.java:273)
> >> at java.lang.Thread.run(Unknown Source)
> >>
> >> Followed by these messages getting printed repeatedly:
> >> 2014-03-10 21:11:25,328 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@900] - Notification time out: 400
> >> 2014-03-10 21:11:25,729 [myid:1] - INFO
> >> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@900] - Notification time out: 800
> >> 2014-03-10 21:11:26,530 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@900] - Notification time out: 1600
> >> 2014-03-10 21:11:28,131 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@900] - Notification time out: 3200
> >> 2014-03-10 21:11:31,332 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@900] - Notification time out: 6400
> >>
> >> Thanks & Regards,
> >> Deepak
> >>
> >>
> >> On Wed, Mar 5, 2014 at 11:50 AM, Deepak Jagtap <[email protected]> wrote:
> >>
> >>> Hi,
> >>>
> >>> I have applied only 1805 patch, not 1810.
> >>> And upgrade is from 3.5.0.1458648 to 3.5.0.1562289 (not from 3.4.5).
> >>> It was failing very consistently in our environment, and after 1805 patch
> >>> it went smoothly.
> >>>
> >>> Regards,
> >>> Deepak
> >>>
> >>>
> >>> On Wed, Mar 5, 2014 at 7:36 AM, German Blanco <[email protected]> wrote:
> >>>
> >>>> Hello,
> >>>>
> >>>> do you mean ZOOKEEPER-1810 patch?
> >>>> That one alone doesn't solve the problem. On the other hand, the problem
> >>>> doesn't happen always, so after a rolling start it might get solved.
> >>>> We need 1818 as well, but it is easier to go step by step and get 1810 in
> >>>> trunk first.
> >>>> I hope that as soon as 3.4.6 is out this might get some attention.
> >>>>
> >>>> Regards,
> >>>>
> >>>> German.
> >>>>
> >>>>
> >>>> On Wed, Mar 5, 2014 at 2:17 AM, Deepak Jagtap <[email protected]> wrote:
> >>>>
> >>>> > Hi,
> >>>> >
> >>>> > Please ignore the previous comment, I used wrong jar file and hence rolling
> >>>> > upgrade failed.
> >>>> > After applying patch for bug on zookeeper-3.5.0.1562289
> >>>> > revision, rolling upgrade went fine.
> >>>> >
> >>>> > I have patched in house zookeeper version, but it would be convenient if we
> >>>> > apply patch on trunk and use the latest trunk.
> >>>> > Please advise if I can apply the patch on the trunk and test it for you.
> >>>> >
> >>>> > Thanks & Regards,
> >>>> > Deepak
> >>>> >
> >>>> >
> >>>> > On Tue, Mar 4, 2014 at 12:09 PM, Deepak Jagtap <[email protected]> wrote:
> >>>> >
> >>>> > > Hi German,
> >>>> > >
> >>>> > > I tried applying patch for 1805 but problem still persists.
> >>>> > > Following are the notification messages logged repeatedly by the node
> >>>> > > which fails to join the quorum:
> >>>> > >
> >>>> > > 2014-03-04 20:00:54,398 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@837] - Notification time out: 51200
> >>>> > > 2014-03-04 20:00:54,400 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@605] - Notification: 2 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x0 (n.peerEPoch), LOOKING (my state)1 (n.config version)
> >>>> > > 2014-03-04 20:00:54,401 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@605] - Notification: 3 (n.leader), 0x100003e84 (n.zxid), 0x2 (n.round), FOLLOWING (n.state), 1 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)1 (n.config version)
> >>>> > > 2014-03-04 20:00:54,403 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@605] - Notification: 3 (n.leader), 0x100003e84 (n.zxid), 0xffffffffffffffff (n.round), LEADING (n.state), 3 (n.sid), 0x2 (n.peerEPoch), LOOKING (my state)1 (n.config version)
> >>>> > >
> >>>> > > Patch for 1732 is already included in the trunk.
> >>>> > >
> >>>> > > Thanks & Regards,
> >>>> > > Deepak
> >>>> > >
> >>>> > >
> >>>> > > On Fri, Feb 28, 2014 at 2:58 PM, Deepak Jagtap <[email protected]> wrote:
> >>>> > >
> >>>> > >> Hi Flavio, German,
> >>>> > >>
> >>>> > >> Since this fix is critical for zookeeper rolling upgrade is it ok if I
> >>>> > >> apply this patch to 3.5.0 trunk?
> >>>> > >> Is it straightforward to apply this patch to trunk?
> >>>> > >>
> >>>> > >> Thanks & Regards,
> >>>> > >> Deepak
> >>>> > >>
> >>>> > >>
> >>>> > >> On Wed, Feb 26, 2014 at 11:46 AM, Deepak Jagtap <[email protected]> wrote:
> >>>> > >>
> >>>> > >>> Thanks German!
> >>>> > >>> Just wondering is there any chance that this patch may be applied to
> >>>> > >>> trunk in near future?
> >>>> > >>> If it's fine with you guys, I would be more than happy to apply the
> >>>> > >>> fixes (from 3.4.5) to trunk and test them.
> >>>> > >>>
> >>>> > >>> Thanks & Regards,
> >>>> > >>> Deepak
> >>>> > >>>
> >>>> > >>>
> >>>> > >>> On Wed, Feb 26, 2014 at 1:29 AM, German Blanco <[email protected]> wrote:
> >>>> > >>>
> >>>> > >>>> Hello Deepak,
> >>>> > >>>>
> >>>> > >>>> due to ZOOKEEPER-1732 and then ZOOKEEPER-1805, there are some cases in
> >>>> > >>>> which an ensemble can be formed so that it doesn't allow any other
> >>>> > >>>> zookeeper server to join.
> >>>> > >>>> This has been fixed in branch 3.4, but it hasn't been fixed in trunk
> >>>> > >>>> yet.
> >>>> > >>>> Check if the Notifications sent around contain different values for the
> >>>> > >>>> vote in the members of the ensemble.
> >>>> > >>>> If you force a new election (e.g. by killing the leader) I guess everything
> >>>> > >>>> should work normally, but don't take my word for it.
> >>>> > >>>> Flavio should know more about this.
> >>>> > >>>>
> >>>> > >>>> Cheers,
> >>>> > >>>>
> >>>> > >>>> German.
> >>>> > >>>>
> >>>> > >>>>
> >>>> > >>>> On Wed, Feb 26, 2014 at 4:04 AM, Deepak Jagtap <[email protected]> wrote:
> >>>> > >>>>
> >>>> > >>>> > Hi,
> >>>> > >>>> >
> >>>> > >>>> > I am replacing one of the zookeeper servers in a 3 node quorum.
> >>>> > >>>> > Initially all zookeeper servers were running 3.5.0.1515976 version.
> >>>> > >>>> > I successfully replaced Node3 with newer version 3.5.0.1551730.
> >>>> > >>>> > When I try to replace Node2 with the same zookeeper version,
> >>>> > >>>> > I can't start the zookeeper server on Node2, as it is continuously stuck in
> >>>> > >>>> > a leader election loop printing the following messages:
> >>>> > >>>> >
> >>>> > >>>> > 2014-02-26 02:45:23,709 [myid:3] - INFO [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@837] - Notification time out: 60000
> >>>> > >>>> > 2014-02-26 02:45:23,710 [myid:3] - INFO [WorkerSender[myid=3]:QuorumCnxManager@195] - Have smaller server identifier, so dropping the connection: (5, 3)
> >>>> > >>>> > 2014-02-26 02:45:23,712 [myid:3] - INFO [WorkerReceiver[myid=3]:FastLeaderElection@605] - Notification: 3 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 3 (n.sid), 0x0 (n.peerEPoch), LOOKING (my state)1 (n.config version)
> >>>> > >>>> >
> >>>> > >>>> > Network connections and configuration of the node being upgraded are fine.
> >>>> > >>>> > The other 2 nodes in the quorum are fine and serving the request.
> >>>> > >>>> >
> >>>> > >>>> > Any idea what might be causing this?
> >>>> > >>>> >
> >>>> > >>>> > Thanks & Regards,
> >>>> > >>>> > Deepak
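
A note for anyone reading the logs in this thread: the "Notification time out" values quoted above (400, 800, 1600, 3200, 6400, later 51200 and 60000) look like a doubling backoff with a ceiling. Below is a minimal Python sketch of that pattern. The starting wait of 200 ms and the 60000 ms cap are assumptions inferred from the quoted log lines, not values taken from the ZooKeeper source, and the function name is hypothetical.

```python
def notification_timeouts(initial_ms=200, cap_ms=60000, rounds=10):
    """Return the timeout values that would be logged after each expiry.

    Assumption: the wait doubles on every expiry and is clamped at cap_ms.
    initial_ms and cap_ms are guesses based on the log lines in the thread.
    """
    timeout = initial_ms
    logged = []
    for _ in range(rounds):
        # Double the wait, but never exceed the ceiling.
        timeout = min(timeout * 2, cap_ms)
        logged.append(timeout)
    return logged


if __name__ == "__main__":
    for value in notification_timeouts():
        print("Notification time out:", value)
```

If the pattern holds, a node that keeps logging 60000 has been looking for a leader for several minutes without settling, which matches the stuck election described in the first message of the thread.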
