Is there any issue with ZooKeeper 3.4.13?

On Thu, Aug 29, 2019 at 10:13 AM Andor Molnar <an...@apache.org> wrote:
> Thanks for the info, I’m still looking.
> So, this is an Ubuntu packaged version of ZooKeeper.
>
> Andor
>
> > On 2019. Aug 27., at 14:13, Debraj Manna <subharaj.ma...@gmail.com> wrote:
> >
> > No, I don't see the updatingEpoch file in /var/lib/zookeeper/version-2
> >
> > I started zookeeper by adding set -x in /usr/bin/zookeeper-server. I can see
> > zookeeper is getting started with 3.4.13 as shown below. The complete logs
> > are placed in the below gist
> >
> > https://gist.github.com/debraj-manna/509ec3d497016c4a249ee2b8dace05d9
> >
> > nohup java -Dzookeeper.datadir.autocreate=false
> > -Dzookeeper.log.dir=/var/log/zookeeper
> > -Dzookeeper.root.logger=INFO,ROLLINGFILE -cp
> > '/usr/lib/zookeeper/bin/../build/classes:/usr/lib/zookeeper/bin/../build/lib/*.jar:/usr/lib/zookeeper/bin/../lib/slf4j-log4j12.jar:/usr/lib/zookeeper/bin/../lib/slf4j-log4j12-1.7.5.jar:/usr/lib/zookeeper/bin/../lib/slf4j-api-1.7.5.jar:/usr/lib/zookeeper/bin/../lib/netty-3.10.5.Final.jar:/usr/lib/zookeeper/bin/../lib/log4j-1.2.16.jar:/usr/lib/zookeeper/bin/../lib/jline-2.11.jar:/usr/lib/zookeeper/bin/../zookeeper-3.4.13.jar:/usr/lib/zookeeper/bin/../src/java/lib/*.jar:/etc/zookeeper/conf::/etc/zookeeper/conf:/usr/lib/zookeeper/*:/usr/lib/zookeeper/lib/*'
> > -Dzookeeper.log.threshold=INFO -Dcom.sun.management.jmxremote
> > -Dcom.sun.management.jmxremote.local.only=false
> > org.apache.zookeeper.server.quorum.QuorumPeerMain
> > /etc/zookeeper/conf/zoo.cfg
> > + sleep 1
> > + echo STARTED
> > STARTED
> >
> > The content of zookeeper.log after the start is placed in the below gist
> >
> > https://gist.github.com/debraj-manna/9800c5bef32837c62bdfb324c0589ad6
> >
> > Let me know if you need any more logs.
> >
> > On Mon, Aug 26, 2019 at 9:21 PM Andor Molnar <an...@apache.org> wrote:
> >
> >> I confirmed that the fix is included in 3.4.13. That’s why I asked if you
> >> can see the ‘updatingEpoch’ file in the data folder.
> >>
> >> I don’t think the issue is unrelated, but I want to make sure that
> >> you’re running the right version by verifying the beginning of the ZK logs.
> >>
> >> Andor
> >>
> >>> On 2019. Aug 26., at 13:43, Debraj Manna <subharaj.ma...@gmail.com> wrote:
> >>>
> >>> Below is the content of currentEpoch.tmp
> >>>
> >>> support@platform2:/var/lib/zookeeper/version-2$ sudo cat acceptedEpoch
> >>> 8support@platform2:/var/lib/zookeeper/version-2$ sudo cat currentEpoch
> >>> 7support@platform2:/var/lib/zookeeper/version-2$ sudo cat currentEpoch.tmp
> >>> 8support@platform2
> >>>
> >>> The zookeeper startup logs have rolled over, as the issue was there for some
> >>> time. Will the current log with the node in this state help? Btw why do you
> >>> think this issue may not be related to zookeeper?
> >>>
> >>> On Mon, Aug 26, 2019 at 4:56 PM Andor Molnar <an...@apache.org> wrote:
> >>>
> >>>> Hi Debraj,
> >>>>
> >>>> The fix should be in all 3.4 versions from 3.4.6 onward, including 3.4.13.
> >>>> Can you see an ‘updatingEpoch’ file in /var/lib/zookeeper/version-2?
> >>>> Also, what is ‘currentEpoch.tmp’? I’m not sure if it relates to ZooKeeper.
> >>>>
> >>>> Would you please share full startup logs of the failing node?
> >>>>
> >>>> Regards,
> >>>> Andor
> >>>>
> >>>>> On 2019. Aug 23., at 18:53, Debraj Manna <subharaj.ma...@gmail.com> wrote:
> >>>>>
> >>>>> Can someone answer my below query?
> >>>>>
> >>>>> I am getting confused after going through ZOOKEEPER-1653
> >>>>> <https://issues.apache.org/jira/browse/ZOOKEEPER-1653> and ZOOKEEPER-2354
> >>>>> <https://issues.apache.org/jira/browse/ZOOKEEPER-2354> . The issues say it
> >>>>> is fixed in 3.4.6 but exists in 3.5.x. But I am seeing the issue in 3.4.13
> >>>>> also. Can someone let me know if the issue is present in 3.4.13 also?
> >>>>>
> >>>>> On Wed 21 Aug, 2019, 12:35 PM Debraj Manna, <subharaj.ma...@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> With the other two zookeeper servers running, I stopped zookeeper on
> >>>>>> the broken node, then deleted all the contents inside
> >>>>>> /var/lib/zookeeper/version-2 and started zookeeper back on the node.
> >>>>>> It is running fine now and got all the data from the other servers.
> >>>>>>
> >>>>>> I am getting confused after going through ZOOKEEPER-1653
> >>>>>> <https://issues.apache.org/jira/browse/ZOOKEEPER-1653> and ZOOKEEPER-2354
> >>>>>> <https://issues.apache.org/jira/browse/ZOOKEEPER-2354> . The issues say
> >>>>>> it is fixed in 3.4.6 but exists in 3.5.x. But I am seeing the issue in
> >>>>>> 3.4.13 also. Can someone let me know if the issue is present in 3.4.13
> >>>>>> also?
> >>>>>>
> >>>>>> On Wed, Aug 21, 2019 at 8:54 AM Debraj Manna <subharaj.ma...@gmail.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Thanks for replying.
> >>>>>>>
> >>>>>>> What is the recommended way to remove a node, delete all data from it
> >>>>>>> and make it start fresh?
> >>>>>>>
> >>>>>>> On Wed 21 Aug, 2019, 12:58 AM Enrico Olivelli, <eolive...@gmail.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Hello,
> >>>>>>>> Sorry for the late reply.
> >>>>>>>> If you have 3 servers you can nuke the broken one and make it start from
> >>>>>>>> scratch; it will join the cluster and then recover data from the other
> >>>>>>>> servers
> >>>>>>>>
> >>>>>>>> Try it in a staging env, not in production
> >>>>>>>>
> >>>>>>>> Enrico
> >>>>>>>>
> >>>>>>>> On Tue, Aug 20, 2019, 20:30 Debraj Manna <subharaj.ma...@gmail.com>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> The same has been asked on Stack Overflow
> >>>>>>>>> <https://stackoverflow.com/questions/57574298/zookeeper-error-the-current-epoch-is-older-than-the-last-zxid>
> >>>>>>>>> also. But there is no response there either.
> >>>>>>>>>
> >>>>>>>>> Anyone any thoughts on this one?
> >>>>>>>>>
> >>>>>>>>> On Tue, Aug 20, 2019 at 4:43 PM Debraj Manna <subharaj.ma...@gmail.com>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Posted wrong Jira link. I meant
> >>>>>>>>>> https://issues.apache.org/jira/browse/ZOOKEEPER-2354. Can someone let
> >>>>>>>>>> me know what is the recommended way to recover the node?
> >>>>>>>>>>
> >>>>>>>>>> support@platform2:/var/lib/zookeeper/version-2$ sudo cat acceptedEpoch
> >>>>>>>>>> 8support@platform2:/var/lib/zookeeper/version-2$ sudo cat currentEpoch
> >>>>>>>>>> 7support@platform2:/var/lib/zookeeper/version-2$ sudo cat currentEpoch.tmp
> >>>>>>>>>> 8support@platform2
> >>>>>>>>>>
> >>>>>>>>>> On Tue, Aug 20, 2019 at 3:14 PM Debraj Manna <subharaj.ma...@gmail.com>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Hi
> >>>>>>>>>>>
> >>>>>>>>>>> I am using a zookeeper ensemble of 3 nodes running 3.4.13. Sometimes
> >>>>>>>>>>> after a reboot of the machine zookeeper does not start and I see the
> >>>>>>>>>>> below errors in the logs.
> >>>>>>>>>>>
> >>>>>>>>>>> I have seen https://issues.apache.org/jira/browse/ZOOKEEPER-1653 .
> >>>>>>>>>>> Can someone let me know if this is fixed in 3.4.13 or not, as I can see
> >>>>>>>>>>> the issue is still open? Also, can someone suggest what is the
> >>>>>>>>>>> recommended way to recover the set-up?
> >>>>>>>>>>>
> >>>>>>>>>>> 2019-08-19 04:18:36,906 [myid:2] - ERROR [main:QuorumPeer@692] - Unable to load database on disk
> >>>>>>>>>>> java.io.IOException: The current epoch, 7, is older than the last zxid, 34359738370
> >>>>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:674)
> >>>>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:635)
> >>>>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:170)
> >>>>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:114)
> >>>>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:81)
> >>>>>>>>>>> 2019-08-19 04:18:36,908 [myid:2] - ERROR [main:QuorumPeerMain@92] - Unexpected exception, exiting abnormally
> >>>>>>>>>>> java.lang.RuntimeException: Unable to run quorum server
> >>>>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:693)
> >>>>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:635)
> >>>>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:170)
> >>>>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:114)
> >>>>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:81)
> >>>>>>>>>>> Caused by: java.io.IOException: The current epoch, 7, is older than the last zxid, 34359738370
> >>>>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:674)
> >>>>>>>>>>>     ... 4 more
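
For anyone reading the exception in this thread: a ZooKeeper zxid is a 64-bit number whose high 32 bits hold the epoch and whose low 32 bits hold a per-epoch counter, so the two numbers in the message can be compared by hand. Below is a minimal sketch of that comparison, using the zxid from the log and the currentEpoch value from the `cat` output above. Python is used only for illustration, and the helper names are hypothetical; they are not part of any ZooKeeper API.

```python
# Hypothetical helpers for illustration only; not a ZooKeeper API.

def epoch_of(zxid: int) -> int:
    """Return the epoch stored in the high 32 bits of a zxid."""
    return zxid >> 32


def counter_of(zxid: int) -> int:
    """Return the per-epoch counter stored in the low 32 bits of a zxid."""
    return zxid & 0xFFFFFFFF


def epoch_is_stale(current_epoch: int, last_zxid: int) -> bool:
    """True when the stored epoch is older than the epoch of the last zxid."""
    return epoch_of(last_zxid) > current_epoch


last_zxid = 34359738370   # value taken from the IOException in the log
current_epoch = 7         # value taken from `sudo cat currentEpoch`

print(hex(last_zxid))                                # 0x800000002
print(epoch_of(last_zxid), counter_of(last_zxid))    # 8 2
print(epoch_is_stale(current_epoch, last_zxid))      # True
```

Read this way, the last logged transaction belongs to epoch 8, which matches the acceptedEpoch and currentEpoch.tmp values shown in the thread, while currentEpoch still holds 7.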