Hi Jorn,

Thanks for reaching out to us. This is a very important exercise for making sure
the upgrade path works as expected.

- Please do an `ls -al` in your data dir to make sure you have valid snapshot 
files.
- It would also be useful to expose the Admin port (8080/tcp by default) and
check the output of `lastSnapshotCommand`.
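
If it helps, here is a minimal Python sketch of the same check as the `ls -al` above. It assumes the usual on-disk naming (`snapshot.<hex zxid>` and `log.<hex zxid>` inside a `version-2` directory); the function names `summarize_data_dir` and `report` are only illustrative and are not part of ZooKeeper.

```python
from pathlib import Path


def summarize_data_dir(version_dir):
    """Collect snapshot and transaction log files found in version_dir.

    Returns a dict with keys "snapshot" and "log"; each value is a list of
    (file name, zxid parsed from the hex suffix, size in bytes) tuples.
    """
    found = {"snapshot": [], "log": []}
    for path in sorted(Path(version_dir).iterdir()):
        prefix, _, suffix = path.name.partition(".")
        if prefix not in found or not path.is_file():
            continue
        try:
            zxid = int(suffix, 16)  # assumed: the suffix is a hexadecimal zxid
        except ValueError:
            continue  # not a file name this sketch recognises
        found[prefix].append((path.name, zxid, path.stat().st_size))
    return found


def report(version_dir):
    """Print one line per file and flag the 'logs but no snapshot' case."""
    files = summarize_data_dir(version_dir)
    for kind in ("snapshot", "log"):
        for name, zxid, size in files[kind]:
            flag = "  <-- empty file" if size == 0 else ""
            print(f"{kind:8} {name:20} zxid=0x{zxid:x} size={size}{flag}")
    if files["log"] and not files["snapshot"]:
        print("No snapshot files found, but transaction logs exist.")
    return files
```

For the paths shown in your log output, the call would be `report("/zookeeper/version-2")`. This only inspects file names and sizes; it does not verify that a snapshot is actually readable.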

Regards,
Andor





> On 2019. Aug 14., at 7:13, Jörn Franke <jornfra...@gmail.com> wrote:
> 
> For me the issue occurred only in standalone mode. With the ensemble I simply 
> cleared the data directory and it received the zookeeper data from the 
> quorum. 
> 
>> On 13.08.2019 at 15:42, Koen De Groote <koen.degro...@limecraft.com> wrote:
>> 
>> I would also like to know if this is possible.
>> 
>> From going over the GitHub page, it seems there is a JMX method to force
>> the creation of a snapshot. Yet the Docker image is configured in such a way
>> that a port will never be assigned to the JMX process.
>> 
>> Is there any way to bypass this?
>> 
>>> On Tue, Jul 30, 2019 at 8:51 AM Jörn Franke <jornfra...@gmail.com> wrote:
>>> 
>>> Thanks. Is it possible to force ZooKeeper to create a snapshot? I will
>>> check; I think the snapshot count is set to 1 in the cfg.
>>> 
>>>> On 30.07.2019 at 08:06, Enrico Olivelli <eolive...@gmail.com> wrote:
>>>> 
>>>> On Mon, Jul 29, 2019 at 23:59, Jörn Franke <jornfra...@gmail.com> wrote:
>>>> 
>>>>> ok, then let me verify tomorrow if a snapshot file is indeed there. If it
>>>>> is missing then I wonder why it was missing. There was no crash or whatever
>>>>> and 3.4.14 works without issue, but of course it could have loaded them
>>>>> from the log files. However, then I wonder why it does not create one.
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> I remember now that some other user, I think Sijie, reported a similar
>>>> problem some months ago: it is not possible to upgrade from 3.4 to 3.5
>>>> if no snapshot is present.
>>>> IIRC the fix was to force the creation of at least one snapshot file and
>>>> then upgrade.
>>>> 
>>>> Enrico
>>>> 
>>>> 
>>>>> 
>>>>> On Mon, Jul 29, 2019 at 11:45 PM Michael Han <h...@apache.org> wrote:
>>>>> 
>>>>>>>> I just wonder why it does not find a valid snapshot.
>>>>>> 
>>>>>> If there are local snapshot files and the files are valid, then it's a bug
>>>>>> that server fails to load them.
>>>>>> 
>>>>>>>> Is it because the format changed in 3.5.5 compared to 3.4.14?
>>>>>> 
>>>>>> Not that I am aware of. There are some format changes (added compression
>>>>>> support) in master branch, but that's not shipped with 3.5.5.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Mon, Jul 29, 2019 at 2:31 PM Jörn Franke <jornfra...@gmail.com> wrote:
>>>>>> 
>>>>>>> ok, then it affects basically all standalone nodes? This is fine, even
>>>>>>> though it means some extra work (for uncritical lab environments).
>>>>>>> I am not sure it is ZOOKEEPER-2325 (but I don't know the full history
>>>>>>> behind it). The logs are fine (it works in 3.4.14 without issues, even after
>>>>>>> downgrading back). There is no issue with disk space and there are no 0
>>>>>>> byte files. I just wonder why it does not find a valid snapshot. Is it
>>>>>>> because the format changed in 3.5.5 compared to 3.4.14?
>>>>>>> 
>>>>>>> On Mon, Jul 29, 2019 at 11:25 PM Michael Han <h...@apache.org> wrote:
>>>>>>> 
>>>>>>>>>> java.io.IOException: No snapshot found, but there are log entries. Something is broken!
>>>>>>>> 
>>>>>>>> This is expected behavior introduced in ZOOKEEPER-2325. We don't want to
>>>>>>>> end up with potential inconsistent state across the ensemble when
>>>>>>>> recovering from empty snapshot.
>>>>>>>> 
>>>>>>>> To continue upgrade, just delete all txn log files and let the node sync
>>>>>>>> the snapshot from the quorum.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Mon, Jul 29, 2019 at 1:38 PM Enrico Olivelli <eolive...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>>> On Mon, Jul 29, 2019 at 22:32, Jörn Franke <jornfra...@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>>> It also seems that 3.5.5 does not attempt to read all of the logfiles (I
>>>>>>>>>> still have to confirm), but the two it reads exist, it has access, and they
>>>>>>>>>> are much more than 0 bytes.
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> We should have the stack trace of the EOFException.
>>>>>>>>> 
>>>>>>>>> Does anyone on this list have a better idea?
>>>>>>>>> 
>>>>>>>>> Enrico
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On Mon, Jul 29, 2019 at 10:13 PM Jörn Franke <jornfra...@gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>>> (of course I do not run them at the same time)
>>>>>>>>>>> 
>>>>>>>>>>> On Mon, Jul 29, 2019 at 10:10 PM Jörn Franke <jornfra...@gmail.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Thank you for the quick reply. They read from the same disk paths and
>>>>>>>>>>>> have the same access rights (in fact the RHEL service executes them as the
>>>>>>>>>>>> same specific user).
>>>>>>>>>>>> 
>>>>>>>>>>>> On Mon, Jul 29, 2019 at 10:09 PM Enrico Olivelli <eolive...@gmail.com> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Mon, Jul 29, 2019 at 21:50, Jörn Franke <jornfra...@gmail.com> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I tried to migrate a lab environment from ZooKeeper 3.4.14 (used for Solr)
>>>>>>>>>>>>>> to 3.5.5 and encountered an issue. It is ZooKeeper in standalone mode
>>>>>>>>>>>>>> (other environments have a proper ensemble). I increased jute.maxbuffer
>>>>>>>>>>>>>> beyond the default (but not excessively) - this was working perfectly fine
>>>>>>>>>>>>>> in 3.4.14.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Basically I reuse the same config files for the migration, except that I
>>>>>>>>>>>>>> whitelist some commands (later I am also interested in adding SSL).
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I get the following error message when starting ZooKeeper with 3.5.5
>>>>>>>>>>>>>> (basically, I just changed the symbolic link from zookeeper to point to
>>>>>>>>>>>>>> the 3.5.5 directory instead of the 3.4.14 directory):
>>>>>>>>>>>>>> 2019-07-29 15:16:25,217 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@655] - Created new input stream /zookeeper/version-2/log.b34
>>>>>>>>>>>>>> 2019-07-29 15:16:25,217 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@658] - Created new input archive /zookeeper/version-2/log.b34
>>>>>>>>>>>>>> 2019-07-29 15:16:25,222 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@696] - EOF exception java.io.EOFException: Failed to read /zookeeper/version-2/log.b34
>>>>>>>>>>>>>> 2019-07-29 15:16:25,223 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@655] - Created new input stream /zookeeper/version-2/log.b72
>>>>>>>>>>>>>> 2019-07-29 15:16:25,223 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@658] - Created new input archive /zookeeper/version-2/log.b72
>>>>>>>>>>>>>> 2019-07-29 15:16:25,224 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@696] - EOF exception java.io.EOFException: Failed to read /zookeeper/version-2/log.b72
>>>>>>>>>>>>>> 2019-07-29 15:16:25,224 [myid:] - ERROR [main:ZooKeeperServerMain@83] - Unexpected exception, exiting abnormally
>>>>>>>>>>>>>> java.io.IOException: No snapshot found, but there are log entries. Something is broken!
>>>>>>>>>>>>>>      at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:211)
>>>>>>>>>>>>>>      at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:240)
>>>>>>>>>>>>>>      at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:290)
>>>>>>>>>>>>>>      at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:450)
>>>>>>>>>>>>>>      at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:764)
>>>>>>>>>>>>>>      at org.apache.zookeeper.server.ServerCnxnFactory.startup(ServerCnxnFactory.java:98)
>>>>>>>>>>>>>>      at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:144)
>>>>>>>>>>>>>>      at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:106)
>>>>>>>>>>>>>>      at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:64)
>>>>>>>>>>>>>>      at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:128)
>>>>>>>>>>>>>>      at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:82)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Strangely enough, if I switch back to 3.4.14 the issue is resolved and
>>>>>>>>>>>>>> ZooKeeper works normally. However, I would like to leverage the new version
>>>>>>>>>>>>>> 3.5.5.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> There are no 0-byte files. Plenty of disk space is available.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Can you compare these logs with the logs of 3.4.x? Are they reading from the
>>>>>>>>>>>>> same disk paths?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Any idea beyond erasing the data dir (I would try to avoid it; I can
>>>>>>>>>>>>>> reconstruct it, but still)? I will also try in the other environments and
>>>>>>>>>>>>>> also in an environment with an ensemble, but I would like to know beforehand
>>>>>>>>>>>>>> what the issue could be.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Not sure if it is relevant, but:
>>>>>>>>>>>>>> Activated Kerberos Authentication and Kerberos SSL for clients and quorum.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Quorum? In standalone mode there is no 'quorum' auth
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Enrico
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>> 
