Thanks. Is it possible to force ZooKeeper to create a snapshot? I will check, but I think the snapshot count (snapCount) is set to 1 in the cfg.
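
For the check itself, here is a minimal Python sketch of what I plan to run. It only looks at file names and assumes the usual snapshot.<zxid> / log.<zxid> naming in hex; the path /zookeeper/version-2 is the one from my logs. The function names are my own, and it does not verify that a snapshot is actually valid.

```python
import re
from pathlib import Path

# Assumption: snapshots are named snapshot.<hex zxid> and
# transaction logs log.<hex zxid> (e.g. log.b34, log.b72).
_SNAPSHOT = re.compile(r"^snapshot\.[0-9a-fA-F]+$")
_TXN_LOG = re.compile(r"^log\.[0-9a-fA-F]+$")


def scan_version_dir(version_dir):
    """Return (snapshots, txn_logs) as sorted lists of file names."""
    snapshots, txn_logs = [], []
    for entry in Path(version_dir).iterdir():
        if not entry.is_file():
            continue
        if _SNAPSHOT.match(entry.name):
            snapshots.append(entry.name)
        elif _TXN_LOG.match(entry.name):
            txn_logs.append(entry.name)
    return sorted(snapshots), sorted(txn_logs)


def missing_snapshot(version_dir):
    """True when transaction logs exist but no snapshot file does."""
    snapshots, txn_logs = scan_version_dir(version_dir)
    return bool(txn_logs) and not snapshots


# Example (path taken from the log output in this thread):
#   missing_snapshot("/zookeeper/version-2")
```

If missing_snapshot() returns True for the data dir, that would match the "No snapshot found, but there are log entries" state.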

> On 30.07.2019 at 08:06, Enrico Olivelli <[email protected]> wrote:
>
> On Mon, Jul 29, 2019 at 23:59, Jörn Franke <[email protected]> wrote:
>
>> ok, then let me verify tomorrow if a snapshot file is indeed there. If it
>> is missing then I wonder why it was missing. There was no crash or whatever
>> and 3.4.14 works without issue, but of course it could have loaded them
>> from the log files. However, then I wonder why it does not create one.
>
> I remember now that some other user, I think Sijie, reported a similar
> problem some months ago, that it is not possible to upgrade from 3.4 to 3.5
> if no snapshot is present.
> IIRC the fix was to force the creation of at least one snapshot file and
> then upgrade.
>
> Enrico
>
>> On Mon, Jul 29, 2019 at 11:45 PM Michael Han <[email protected]> wrote:
>>
>>>>> I just wonder why it does not find a valid snapshot.
>>>
>>> If there are local snapshot files and the files are valid, then it's a bug
>>> that server fails to load them.
>>>
>>>>> Is it because the format changed in 3.5.5 compared to 3.4.14?
>>>
>>> Not that I am aware of. There are some format changes (added compression
>>> support) in master branch, but that's not shipped with 3.5.5.
>>>
>>> On Mon, Jul 29, 2019 at 2:31 PM Jörn Franke <[email protected]> wrote:
>>>
>>>> ok, then it affects basically all standalone nodes? This is fine, despite
>>>> that it means some extra work (for uncritical lab environments).
>>>> I am not sure it is ZOOKEEPER-2325 (but I don't know the full history
>>>> behind it). The logs are fine (it works in 3.4.14 without issues, even after
>>>> downgrading back). There is no issue with disk space and there are no 0
>>>> byte files. I just wonder why it does not find a valid snapshot. Is it
>>>> because the format changed in 3.5.5 compared to 3.4.14?
>>>>
>>>> On Mon, Jul 29, 2019 at 11:25 PM Michael Han <[email protected]> wrote:
>>>>
>>>>>>> java.io.IOException: No snapshot found, but there are log entries.
>>>>>>> Something is broken!
>>>>>
>>>>> This is expected behavior introduced in ZOOKEEPER-2325. We don't want to
>>>>> end up with potential inconsistent state across the ensemble when
>>>>> recovering from empty snapshot.
>>>>>
>>>>> To continue upgrade, just delete all txn log files and let the node sync
>>>>> the snapshot from the quorum.
>>>>>
>>>>> On Mon, Jul 29, 2019 at 1:38 PM Enrico Olivelli <[email protected]> wrote:
>>>>>
>>>>>> On Mon, Jul 29, 2019, 22:32 Jörn Franke <[email protected]> wrote:
>>>>>>
>>>>>>> It also seems that 3.5.5 does not attempt to read all of the logfiles (I
>>>>>>> still have to confirm), but the two it reads exist, it has access and they
>>>>>>> are much more than 0 byte
>>>>>>
>>>>>> We should have the stack trace of the EOFException.
>>>>>>
>>>>>> Anyone on this list has a better idea?
>>>>>>
>>>>>> Enrico
>>>>>>
>>>>>>> On Mon, Jul 29, 2019 at 10:13 PM Jörn Franke <[email protected]> wrote:
>>>>>>>
>>>>>>>> (of course i do not run them at the same time)
>>>>>>>>
>>>>>>>> On Mon, Jul 29, 2019 at 10:10 PM Jörn Franke <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> thank you for the quick reply. They read from the same disk paths and
>>>>>>>>> have the same access rights (in fact the RHEL service executes them as the
>>>>>>>>> same specific user).
>>>>>>>>>
>>>>>>>>> On Mon, Jul 29, 2019 at 10:09 PM Enrico Olivelli <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> On Mon, Jul 29, 2019, 21:50 Jörn Franke <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I tried to migrate a lab environment from ZooKeeper 3.4.14 (used for Solr)
>>>>>>>>>>> to 3.5.5 and encountered an issue. It is ZooKeeper in standalone mode
>>>>>>>>>>> (other environments have a proper ensemble). I increased jute.maxbuffer
>>>>>>>>>>> beyond the default (but not excessively) - this was working perfectly fine
>>>>>>>>>>> in 3.4.14.
>>>>>>>>>>>
>>>>>>>>>>> Basically I reuse for the migration the same config files, except that I
>>>>>>>>>>> whitelist some commands (later I am also interested in adding SSL).
>>>>>>>>>>>
>>>>>>>>>>> I have the following error message when starting Zookeeper with 3.5.5
>>>>>>>>>>> (basically, I just changed the symbolic link from zookeeper to point to
>>>>>>>>>>> 3.5.5 instead of the 3.4.14 directory):
>>>>>>>>>>> 2019-07-29 15:16:25,217 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@655] - Created new input stream /zookeeper/version-2/log.b34
>>>>>>>>>>> 2019-07-29 15:16:25,217 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@658] - Created new input archive /zookeeper/version-2/log.b34
>>>>>>>>>>> 2019-07-29 15:16:25,222 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@696] - EOF exception java.io.EOFException: Failed to read /zookeeper/version-2/log.b34
>>>>>>>>>>> 2019-07-29 15:16:25,223 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@655] - Created new input stream /zookeeper/version-2/log.b72
>>>>>>>>>>> 2019-07-29 15:16:25,223 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@658] - Created new input archive /zookeeper/version-2/log.b72
>>>>>>>>>>> 2019-07-29 15:16:25,224 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@696] - EOF exception java.io.EOFException: Failed to read /zookeeper/version-2/log.b72
>>>>>>>>>>> 2019-07-29 15:16:25,224 [myid:] - ERROR [main:ZooKeeperServerMain@83] - Unexpected exception, exiting abnormally
>>>>>>>>>>> java.io.IOException: No snapshot found, but there are log entries. Something is broken!
>>>>>>>>>>>     at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:211)
>>>>>>>>>>>     at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:240)
>>>>>>>>>>>     at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:290)
>>>>>>>>>>>     at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:450)
>>>>>>>>>>>     at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:764)
>>>>>>>>>>>     at org.apache.zookeeper.server.ServerCnxnFactory.startup(ServerCnxnFactory.java:98)
>>>>>>>>>>>     at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:144)
>>>>>>>>>>>     at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:106)
>>>>>>>>>>>     at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:64)
>>>>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:128)
>>>>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:82)
>>>>>>>>>>>
>>>>>>>>>>> Strangely enough, if I switch back to 3.4.14 the issue is resolved and
>>>>>>>>>>> Zookeeper works normally. However, I would like to leverage the new version
>>>>>>>>>>> 3.5.5.
>>>>>>>>>>>
>>>>>>>>>>> There are no 0 bytes files. Disk space is plenty available.
>>>>>>>>>>
>>>>>>>>>> Can you compare these logs with logs of 3.4.x? Are they reading from the
>>>>>>>>>> same disk paths?
>>>>>>>>>>
>>>>>>>>>>> Any idea beyond erasing the data dir (I would try to avoid it, I can
>>>>>>>>>>> reconstruct it, but still)? I will try also in the other environments and
>>>>>>>>>>> also with an environment with an ensemble, but i would like to know before
>>>>>>>>>>> what the issue could be.
>>>>>>>>>>>
>>>>>>>>>>> Not sure if it is relevant, but:
>>>>>>>>>>> Activated Kerberos Authentication and Kerberos SSL for clients and quorum.
>>>>>>>>>>
>>>>>>>>>> Quorum? In standalone mode there is no 'quorum' auth
>>>>>>>>>>
>>>>>>>>>> Enrico
