On Mon, Jul 29, 2019 at 23:59, Jörn Franke <[email protected]> wrote:

> ok, then let me verify tomorrow if a snapshot file is indeed there. If it
> is missing then I wonder why it was missing. There was no crash or whatever
> and 3.4.14 works without issue, but of course it could have loaded them
> from the log files. However, then I wonder why it does not create one.
>



I remember now that another user, I think Sijie, reported a similar
problem some months ago: it is not possible to upgrade from 3.4 to 3.5
if no snapshot is present.
IIRC the fix was to force the creation of at least one snapshot file and
then upgrade.
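
To check whether an installation is in this state before upgrading,
something like the following could be used. It is only a sketch: it assumes
the data directory is /zookeeper/version-2 as in the log output quoted
below, and that snapshots and transaction logs follow the usual
snapshot.<zxid> and log.<zxid> file naming. The function names are
hypothetical and not part of ZooKeeper.

```python
from pathlib import Path


def scan_data_dir(version_dir):
    """Return (snapshots, txn_logs) as sorted lists of file names.

    Assumes the usual naming: snapshot.<zxid> and log.<zxid>.
    """
    files = [p for p in Path(version_dir).iterdir() if p.is_file()]
    snapshots = sorted(p.name for p in files if p.name.startswith("snapshot."))
    txn_logs = sorted(p.name for p in files if p.name.startswith("log."))
    return snapshots, txn_logs


def needs_snapshot_before_upgrade(version_dir):
    """True if there are transaction logs but no snapshot file at all,
    which is the situation that makes 3.5.5 refuse to start."""
    snapshots, txn_logs = scan_data_dir(version_dir)
    return not snapshots and bool(txn_logs)
```

Calling needs_snapshot_before_upgrade("/zookeeper/version-2") while still on
3.4.14 would tell you whether a snapshot has to be created first. It only
looks at file names and does not validate the file contents.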

Enrico


>
> On Mon, Jul 29, 2019 at 11:45 PM Michael Han <[email protected]> wrote:
>
> > >> I just wonder why it does not find a valid snapshot.
> >
> > If there are local snapshot files and the files are valid, then it's a
> > bug that the server fails to load them.
> >
> > >> Is it because the format changed in 3.5.5 compared to 3.4.14?
> >
> > Not that I am aware of. There are some format changes (added compression
> > support) in the master branch, but they are not shipped with 3.5.5.
> >
> >
> >
> > On Mon, Jul 29, 2019 at 2:31 PM Jörn Franke <[email protected]> wrote:
> >
> > > ok, then it affects basically all standalone nodes? This is fine,
> > > although it means some extra work (for uncritical lab environments).
> > > I am not sure it is ZOOKEEPER-2325 (but I don't know the full history
> > > behind it). The logs are fine (it works in 3.4.14 without issues, even
> > > after downgrading back). There is no issue with disk space and there are
> > > no 0 byte files. I just wonder why it does not find a valid snapshot. Is it
> > > because the format changed in 3.5.5 compared to 3.4.14?
> > >
> > > On Mon, Jul 29, 2019 at 11:25 PM Michael Han <[email protected]> wrote:
> > >
> > > > > >> java.io.IOException: No snapshot found, but there are log entries. Something is broken!
> > > > >
> > > > > This is expected behavior introduced in ZOOKEEPER-2325. We don't want to
> > > > > end up with a potentially inconsistent state across the ensemble when
> > > > > recovering from an empty snapshot.
> > > > >
> > > > > To continue the upgrade, just delete all txn log files and let the node
> > > > > sync the snapshot from the quorum.
> > > >
> > > >
> > > > > On Mon, Jul 29, 2019 at 1:38 PM Enrico Olivelli <[email protected]> wrote:
> > > > >
> > > > > > On Mon, Jul 29, 2019, 22:32 Jörn Franke <[email protected]> wrote:
> > > > > >
> > > > > > > It also seems that 3.5.5 does not attempt to read all of the log
> > > > > > > files (I still have to confirm this), but the two it reads exist, it
> > > > > > > has access to them, and they are much larger than 0 bytes.
> > > > > >
> > > > >
> > > > > > We should have the stack trace of the EOFException.
> > > > > >
> > > > > > Does anyone on this list have a better idea?
> > > > >
> > > > > Enrico
> > > > >
> > > > >
> > > > > > > On Mon, Jul 29, 2019 at 10:13 PM Jörn Franke <[email protected]> wrote:
> > > > > > >
> > > > > > > > (of course I do not run them at the same time)
> > > > > > > >
> > > > > > > > On Mon, Jul 29, 2019 at 10:10 PM Jörn Franke <[email protected]> wrote:
> > > > > > > >
> > > > > > > >> thank you for the quick reply. They read from the same disk paths and
> > > > > > > >> have the same access rights (in fact the RHEL service executes them as
> > > > > > > >> the same specific user).
> > > > > > > >>
> > > > > > > >> On Mon, Jul 29, 2019 at 10:09 PM Enrico Olivelli <[email protected]> wrote:
> > > > > > > >>
> > > > > > > >>> On Mon, Jul 29, 2019, 21:50 Jörn Franke <[email protected]> wrote:
> > > > > > >>>
> > > > > > >>> > Hi,
> > > > > > >>> >
> > > > > > > >>> > I tried to migrate a lab environment from ZooKeeper 3.4.14 (used for
> > > > > > > >>> > Solr) to 3.5.5 and encountered an issue. It is ZooKeeper in standalone
> > > > > > > >>> > mode (other environments have a proper ensemble). I increased
> > > > > > > >>> > jute.maxbuffer beyond the default (but not excessively) - this was
> > > > > > > >>> > working perfectly fine in 3.4.14.
> > > > > > > >>> >
> > > > > > > >>> > Basically I reuse the same config files for the migration, except that
> > > > > > > >>> > I whitelist some commands (later I am also interested in adding SSL).
> > > > > > > >>> >
> > > > > > > >>> > I get the following error message when starting ZooKeeper with 3.5.5
> > > > > > > >>> > (basically, I just changed the symbolic link from zookeeper to point
> > > > > > > >>> > to the 3.5.5 directory instead of the 3.4.14 directory):
> > > > > > >>> > 2019-07-29 15:16:25,217 [myid:] - DEBUG
> > > > > > >>> > [main:FileTxnLog$FileTxnIterator@655]
> > > > > > >>> > - Created new input stream /zookeeper/version-2/log.b34
> > > > > > >>> > 2019-07-29 15:16:25,217 [myid:] - DEBUG
> > > > > > >>> > [main:FileTxnLog$FileTxnIterator@658]
> > > > > > >>> > - Created new input archive /zookeeper/version-2/log.b34
> > > > > > >>> > 2019-07-29 15:16:25,222 [myid:] - DEBUG
> > > > > > >>> > [main:FileTxnLog$FileTxnIterator@696]
> > > > > > >>> > - EOF exception java.io.EOFException: Failed to read
> > > > > > >>> > /zookeeper/version-2/log.b34
> > > > > > >>> > 2019-07-29 15:16:25,223 [myid:] - DEBUG
> > > > > > >>> > [main:FileTxnLog$FileTxnIterator@655]
> > > > > > >>> > - Created new input stream /zookeeper/version-2/log.b72
> > > > > > >>> > 2019-07-29 15:16:25,223 [myid:] - DEBUG
> > > > > > >>> > [main:FileTxnLog$FileTxnIterator@658]
> > > > > > >>> > - Created new input archive /zookeeper/version-2/log.b72
> > > > > > >>> > 2019-07-29 15:16:25,224 [myid:] - DEBUG
> > > > > > >>> > [main:FileTxnLog$FileTxnIterator@696]
> > > > > > >>> > - EOF exception java.io.EOFException: Failed to read
> > > > > > >>> > /zookeeper/version-2/log.b72
> > > > > > >>> > 2019-07-29 15:16:25,224 [myid:] - ERROR
> > > > > [main:ZooKeeperServerMain@83
> > > > > > ]
> > > > > > >>> -
> > > > > > >>> > Unexpected exception, exiting abnormally
> > > > > > >>> > java.io.IOException: No snapshot found, but there are log
> > > > entries.
> > > > > > >>> > Something is broken!
> > > > > > > >>> >         at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:211)
> > > > > > > >>> >         at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:240)
> > > > > > > >>> >         at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:290)
> > > > > > > >>> >         at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:450)
> > > > > > > >>> >         at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:764)
> > > > > > > >>> >         at org.apache.zookeeper.server.ServerCnxnFactory.startup(ServerCnxnFactory.java:98)
> > > > > > > >>> >         at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:144)
> > > > > > > >>> >         at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:106)
> > > > > > > >>> >         at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:64)
> > > > > > > >>> >         at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:128)
> > > > > > > >>> >         at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:82)
> > > > > > >>> >
> > > > > > > >>> > Strangely enough, if I switch back to 3.4.14 the issue is resolved and
> > > > > > > >>> > ZooKeeper works normally. However, I would like to leverage the new
> > > > > > > >>> > version 3.5.5.
> > > > > > > >>> >
> > > > > > > >>> > There are no 0 byte files. Plenty of disk space is available.
> > > > > > >>> >
> > > > > > >>>
> > > > > > >>>
> > > > > > > >>> Can you compare these logs with the logs of 3.4.x? Are they reading
> > > > > > > >>> from the same disk paths?
> > > > > > >>>
> > > > > > >>>
> > > > > > >>>
> > > > > > > >>> > Any idea beyond erasing the data dir (I would try to avoid it; I can
> > > > > > > >>> > reconstruct it, but still)? I will also try in the other environments
> > > > > > > >>> > and with an environment with an ensemble, but I would like to know
> > > > > > > >>> > beforehand what the issue could be.
> > > > > > > >>> >
> > > > > > > >>> > Not sure if it is relevant, but:
> > > > > > > >>> > Activated Kerberos Authentication and Kerberos SSL for clients and
> > > > > > > >>> > quorum.
> > > > > > >>> >
> > > > > > >>>
> > > > > > >>> Quorum? In standalone mode there is no 'quorum' auth
> > > > > > >>>
> > > > > > >>> Enrico
> > > > > > >>>
> > > > > > >>> >
> > > > > > >>>
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
>
