No, I don't see the updatingEpoch file in /var/lib/zookeeper/version-2. I started ZooKeeper after adding set -x to /usr/bin/zookeeper-server, and I can see that it is started with 3.4.13, as shown below. The complete logs are in the gist below:
https://gist.github.com/debraj-manna/509ec3d497016c4a249ee2b8dace05d9

nohup java -Dzookeeper.datadir.autocreate=false -Dzookeeper.log.dir=/var/log/zookeeper -Dzookeeper.root.logger=INFO,ROLLINGFILE -cp '/usr/lib/zookeeper/bin/../build/classes:/usr/lib/zookeeper/bin/../build/lib/*.jar:/usr/lib/zookeeper/bin/../lib/slf4j-log4j12.jar:/usr/lib/zookeeper/bin/../lib/slf4j-log4j12-1.7.5.jar:/usr/lib/zookeeper/bin/../lib/slf4j-api-1.7.5.jar:/usr/lib/zookeeper/bin/../lib/netty-3.10.5.Final.jar:/usr/lib/zookeeper/bin/../lib/log4j-1.2.16.jar:/usr/lib/zookeeper/bin/../lib/jline-2.11.jar:/usr/lib/zookeeper/bin/../zookeeper-3.4.13.jar:/usr/lib/zookeeper/bin/../src/java/lib/*.jar:/etc/zookeeper/conf::/etc/zookeeper/conf:/usr/lib/zookeeper/*:/usr/lib/zookeeper/lib/*' -Dzookeeper.log.threshold=INFO -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.local.only=false org.apache.zookeeper.server.quorum.QuorumPeerMain /etc/zookeeper/conf/zoo.cfg
+ sleep 1
+ echo STARTED
STARTED

The content of zookeeper.log is placed in the below gist after the start
https://gist.github.com/debraj-manna/9800c5bef32837c62bdfb324c0589ad6

Let me know if you need any more logs.

On Mon, Aug 26, 2019 at 9:21 PM Andor Molnar <[email protected]> wrote:

> I confirmed that the fix is included in 3.4.13. That’s why I asked if you
> can see ‘updatingEpoch’ file in the data folder.
>
> I don’t think the issue is not related, but I want to make sure that
> you’re running the right version by verifying the beginning of ZK logs.
>
> Andor
>
> > On 2019. Aug 26., at 13:43, Debraj Manna <[email protected]> wrote:
> >
> > Below is the content of currentEpoch.tmp
> >
> > support@platform2:/var/lib/zookeeper/version-2$ sudo cat acceptedEpoch
> > 8support@platform2:/var/lib/zookeeper/version-2$ sudo cat currentEpoch
> > 7support@platform2:/var/lib/zookeeper/version-2$ sudo cat currentEpoch.tmp
> > 8support@platform2
> >
> > Starting zookeeper logs are rolled over as the issue was there for some
> > time.
> > Will the current log with the node in this state help? Btw why do you
> > think this issue may not be related to zookeeper?
> >
> > On Mon, Aug 26, 2019 at 4:56 PM Andor Molnar <[email protected]> wrote:
> >
> >> Hi Debraj,
> >>
> >> The fix should be in all 3.4 versions from 3.4.6 onward, including 3.4.13.
> >> Can you see ‘updatingEpoch’ file in /var/lib/zookeeper/version-2 ?
> >> Also what is ‘currentEpoch.tmp’ ? I’m not sure if it relates to ZooKeeper.
> >>
> >> Would you please share full startup logs of the failing node?
> >>
> >> Regards,
> >> Andor
> >>
> >>> On 2019. Aug 23., at 18:53, Debraj Manna <[email protected]> wrote:
> >>>
> >>> Can someone answer by below query?
> >>>
> >>> I am getting confused after going through ZOOKEEPER-1653
> >>> <https://issues.apache.org/jira/browse/ZOOKEEPER-1653> and ZOOKEEPER-2354
> >>> <https://issues.apache.org/jira/browse/ZOOKEEPER-2354> . The issues say it
> >>> is fixed in 3.4.6 but exists in 3.5.x. But I am seeing the issue in 3.4.13
> >>> also. Can someone let me know if the issue is present in 3.4.13 also?
> >>>
> >>> On Wed 21 Aug, 2019, 12:35 PM Debraj Manna, <[email protected]>
> >>> wrote:
> >>>
> >>>> With the other two zookeeper servers running I stopped the zookeeper in
> >>>> the broken node and the deleted all the contents inside
> >>>> /var/lib/zookeeper/version-2 and started the zookeeper back on the node.
> >>>> It is running fine now and got all the data from the other servers.
> >>>>
> >>>> I am getting confused after going through ZOOKEEPER-1653
> >>>> <https://issues.apache.org/jira/browse/ZOOKEEPER-1653> and ZOOKEEPER-2354
> >>>> <https://issues.apache.org/jira/browse/ZOOKEEPER-2354> . The issues say
> >>>> it is fixed in 3.4.6 but exists in 3.5.x. But I am seeing the issue in
> >>>> 3.4.13 also. Can someone let me know if the issue is present in 3.4.13 also?
> >>>>
> >>>> On Wed, Aug 21, 2019 at 8:54 AM Debraj Manna <[email protected]>
> >>>> wrote:
> >>>>
> >>>>> Thanks for replying.
> >>>>>
> >>>>> What is the recommended way to remove a node and delete all data from it
> >>>>> and make it start fresh?
> >>>>>
> >>>>> On Wed 21 Aug, 2019, 12:58 AM Enrico Olivelli, <[email protected]>
> >>>>> wrote:
> >>>>>
> >>>>>> Hello,
> >>>>>> Sorry for so late reply.
> >>>>>> If you have 3 servers you can nuke the broken one and make it start from
> >>>>>> scratch, it will join the cluster and then recover data from the other
> >>>>>> servers
> >>>>>>
> >>>>>> Try it in a staging env, not in production
> >>>>>>
> >>>>>> Enrico
> >>>>>>
> >>>>>> On Tue, 20 Aug 2019, 20:30 Debraj Manna <[email protected]>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> The same has been asked in stackoverflow
> >>>>>>> <https://stackoverflow.com/questions/57574298/zookeeper-error-the-current-epoch-is-older-than-the-last-zxid>
> >>>>>>> also. But no response there also.
> >>>>>>>
> >>>>>>> Anyone any thoughts on this one?
> >>>>>>>
> >>>>>>> On Tue, Aug 20, 2019 at 4:43 PM Debraj Manna <[email protected]>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Posted wrong Jira link. I meant
> >>>>>>>> https://issues.apache.org/jira/browse/ZOOKEEPER-2354. Can someone let me
> >>>>>>>> know what is the recommended way to recover the node?
> >>>>>>>>
> >>>>>>>> support@platform2:/var/lib/zookeeper/version-2$ sudo cat acceptedEpoch
> >>>>>>>> 8support@platform2:/var/lib/zookeeper/version-2$ sudo cat currentEpoch
> >>>>>>>> 7support@platform2:/var/lib/zookeeper/version-2$ sudo cat currentEpoch.tmp
> >>>>>>>> 8support@platform2
> >>>>>>>>
> >>>>>>>> On Tue, Aug 20, 2019 at 3:14 PM Debraj Manna <[email protected]>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Hi
> >>>>>>>>>
> >>>>>>>>> I am using a zookeeper ensemble of 3 nodes running 3.4.13.
> >>>>>>>>> Sometimes after reboot of machine zookeeper is not starting and I am
> >>>>>>>>> seeing the below errors in logs.
> >>>>>>>>>
> >>>>>>>>> I have seen https://issues.apache.org/jira/browse/ZOOKEEPER-1653 . Can
> >>>>>>>>> someone let me if this is fixed in 3.4.13 or not as I can see the issue
> >>>>>>>>> still open? Also can somone suggest what is the recommended way to
> >>>>>>>>> recover the set-up ?
> >>>>>>>>>
> >>>>>>>>> 2019-08-19 04:18:36,906 [myid:2] - ERROR [main:QuorumPeer@692] - Unable to load database on disk
> >>>>>>>>> java.io.IOException: The current epoch, 7, is older than the last zxid, 34359738370
> >>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:674)
> >>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:635)
> >>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:170)
> >>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:114)
> >>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:81)
> >>>>>>>>> 2019-08-19 04:18:36,908 [myid:2] - ERROR [main:QuorumPeerMain@92] - Unexpected exception, exiting abnormally
> >>>>>>>>> java.lang.RuntimeException: Unable to run quorum server
> >>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:693)
> >>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:635)
> >>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:170)
> >>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:114)
> >>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:81)
> >>>>>>>>> Caused by: java.io.IOException: The current epoch, 7, is older than the last zxid, 34359738370
> >>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:674)
> >>>>>>>>>     ... 4 more
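
For reference, the numbers quoted in this thread can be checked by hand. A zxid is a 64-bit value whose high 32 bits are the epoch and whose low 32 bits are a counter, so the zxid in the exception (34359738370) can be split and compared with the value in the currentEpoch file (7). The Python below is only an illustrative sketch and not ZooKeeper code; the function names split_zxid and epoch_is_stale are made up for this example, and the comparison is an assumption about what the startup check does, based on the wording of the error message.

```python
from typing import Tuple


def split_zxid(zxid: int) -> Tuple[int, int]:
    """Split a 64-bit zxid into (epoch, counter).

    The epoch is the high 32 bits and the counter is the low 32 bits.
    """
    return zxid >> 32, zxid & 0xFFFFFFFF


def epoch_is_stale(current_epoch: int, last_zxid: int) -> bool:
    """Hypothetical helper: True when the last zxid belongs to a newer
    epoch than the value stored in the currentEpoch file."""
    zxid_epoch, _ = split_zxid(last_zxid)
    return zxid_epoch > current_epoch


if __name__ == "__main__":
    last_zxid = 34359738370  # from the IOException quoted in the thread
    current_epoch = 7        # contents of currentEpoch quoted in the thread

    epoch, counter = split_zxid(last_zxid)
    print(f"zxid {last_zxid:#x}: epoch={epoch}, counter={counter}")
    print("currentEpoch is stale:", epoch_is_stale(current_epoch, last_zxid))
```

With the values from the thread, the zxid decodes to epoch 8, which matches acceptedEpoch and currentEpoch.tmp (both 8) and is one ahead of currentEpoch (7).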
