Hi Debraj,

The fix should be in all 3.4 versions from 3.4.6 onward, including 3.4.13. Can you see an ‘updatingEpoch’ file in /var/lib/zookeeper/version-2? Also, what is ‘currentEpoch.tmp’? I’m not sure if it relates to ZooKeeper.
Would you please share the full startup logs of the failing node?

Regards,
Andor

> On 2019. Aug 23., at 18:53, Debraj Manna <[email protected]> wrote:
>
> Can someone answer my query below?
>
> I am getting confused after going through ZOOKEEPER-1653
> <https://issues.apache.org/jira/browse/ZOOKEEPER-1653> and ZOOKEEPER-2354
> <https://issues.apache.org/jira/browse/ZOOKEEPER-2354> . The issues say it
> is fixed in 3.4.6 but exists in 3.5.x. But I am seeing the issue in 3.4.13
> also. Can someone let me know if the issue is present in 3.4.13 also?
>
>
> On Wed 21 Aug, 2019, 12:35 PM Debraj Manna, <[email protected]>
> wrote:
>
>> With the other two zookeeper servers running I stopped the zookeeper in
>> the broken node and then deleted all the contents inside
>> /var/lib/zookeeper/version-2
>> and started the zookeeper back on the node. It is running fine now and got
>> all the data from the other servers.
>>
>> I am getting confused after going through ZOOKEEPER-1653
>> <https://issues.apache.org/jira/browse/ZOOKEEPER-1653> and ZOOKEEPER-2354
>> <https://issues.apache.org/jira/browse/ZOOKEEPER-2354> . The issues say
>> it is fixed in 3.4.6 but exists in 3.5.x. But I am seeing the issue in
>> 3.4.13 also. Can someone let me know if the issue is present in 3.4.13 also?
>>
>>
>>
>> On Wed, Aug 21, 2019 at 8:54 AM Debraj Manna <[email protected]>
>> wrote:
>>
>>> Thanks for replying.
>>>
>>> What is the recommended way to remove a node and delete all data from it
>>> and make it start fresh?
>>>
>>> On Wed 21 Aug, 2019, 12:58 AM Enrico Olivelli, <[email protected]>
>>> wrote:
>>>
>>>> Hello,
>>>> Sorry for so late reply.
>>>> If you have 3 servers you can nuke the broken one and make it start from
>>>> scratch, it will join the cluster and then recover data from the other
>>>> servers
>>>>
>>>> Try it in a staging env, not in production
>>>>
>>>> Enrico
>>>>
>>>> On Tue 20 Aug 2019, 20:30 Debraj Manna <[email protected]>
>>>> wrote:
>>>>
>>>>> The same has been asked in stackoverflow
>>>>> <https://stackoverflow.com/questions/57574298/zookeeper-error-the-current-epoch-is-older-than-the-last-zxid>
>>>>> also. But no response there also.
>>>>>
>>>>> Anyone any thoughts on this one?
>>>>>
>>>>> On Tue, Aug 20, 2019 at 4:43 PM Debraj Manna <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Posted wrong Jira link. I meant
>>>>>> https://issues.apache.org/jira/browse/ZOOKEEPER-2354. Can someone let me
>>>>>> know what is the recommended way to recover the node?
>>>>>>
>>>>>> support@platform2:/var/lib/zookeeper/version-2$ sudo cat acceptedEpoch
>>>>>> 8support@platform2:/var/lib/zookeeper/version-2$ sudo cat currentEpoch
>>>>>> 7support@platform2:/var/lib/zookeeper/version-2$ sudo cat currentEpoch.tmp
>>>>>> 8support@platform2
>>>>>>
>>>>>> On Tue, Aug 20, 2019 at 3:14 PM Debraj Manna <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi
>>>>>>>
>>>>>>> I am using a zookeeper ensemble of 3 nodes running 3.4.13. Sometimes
>>>>>>> after reboot of machine zookeeper is not starting and I am seeing the
>>>>>>> below errors in logs.
>>>>>>>
>>>>>>> I have seen https://issues.apache.org/jira/browse/ZOOKEEPER-1653 . Can
>>>>>>> someone let me know if this is fixed in 3.4.13 or not as I can see the
>>>>>>> issue still open? Also can someone suggest what is the recommended way
>>>>>>> to recover the set-up ?
>>>>>>>
>>>>>>> 2019-08-19 04:18:36,906 [myid:2] - ERROR [main:QuorumPeer@692] - Unable to load database on disk
>>>>>>> java.io.IOException: The current epoch, 7, is older than the last zxid, 34359738370
>>>>>>> at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:674)
>>>>>>> at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:635)
>>>>>>> at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:170)
>>>>>>> at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:114)
>>>>>>> at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:81)
>>>>>>> 2019-08-19 04:18:36,908 [myid:2] - ERROR [main:QuorumPeerMain@92] - Unexpected exception, exiting abnormally
>>>>>>> java.lang.RuntimeException: Unable to run quorum server
>>>>>>> at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:693)
>>>>>>> at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:635)
>>>>>>> at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:170)
>>>>>>> at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:114)
>>>>>>> at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:81)
>>>>>>> Caused by: java.io.IOException: The current epoch, 7, is older than the last zxid, 34359738370
>>>>>>> at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:674)
>>>>>>> ... 4 more
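
For reference, the two numbers in the exception can be compared by hand. A ZooKeeper zxid is a 64-bit value whose high 32 bits hold the epoch and whose low 32 bits hold a per-epoch counter, so 34359738370 decodes to epoch 8, counter 2, which is newer than the 7 read from currentEpoch. The Python below is only an illustrative sketch of that arithmetic, not ZooKeeper's own code; the names split_zxid and epoch_is_stale are made up for this example.

```python
from typing import Tuple

# A zxid packs the epoch into the high 32 bits and a counter into the low 32 bits.
EPOCH_SHIFT = 32
COUNTER_MASK = 0xFFFFFFFF


def split_zxid(zxid: int) -> Tuple[int, int]:
    """Return (epoch, counter) for a 64-bit zxid. Hypothetical helper."""
    return zxid >> EPOCH_SHIFT, zxid & COUNTER_MASK


def epoch_is_stale(current_epoch: int, last_zxid: int) -> bool:
    """True when the epoch embedded in last_zxid is newer than current_epoch,
    which is the situation the exception message describes. Hypothetical helper."""
    zxid_epoch, _ = split_zxid(last_zxid)
    return zxid_epoch > current_epoch


if __name__ == "__main__":
    # Values taken from the log and the `cat currentEpoch` output in this thread.
    epoch, counter = split_zxid(34359738370)
    print("zxid epoch:", epoch, "counter:", counter)
    print("currentEpoch stale:", epoch_is_stale(7, 34359738370))
```

With the values from this thread, the zxid epoch (8) matches acceptedEpoch and currentEpoch.tmp, while currentEpoch is still at 7.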
