Thanks. Is it possible to force ZooKeeper to create a snapshot? I will check; I
think the snapshot count is set to 1 in the cfg.
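To check whether a snapshot file is actually there, a minimal sketch like the following could be used. It assumes the data directory seen in the quoted log output (/zookeeper/version-2) and the usual ZooKeeper file naming (log.<zxid> for transaction logs, snapshot.<zxid> for snapshots). The class name SnapshotCheck and its methods are hypothetical helpers, not part of ZooKeeper. The check looks at file names only, so it cannot tell whether an existing snapshot is valid or loadable.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of ZooKeeper): inspects the file names in a
// ZooKeeper data directory such as /zookeeper/version-2. It only looks at
// names, so it cannot tell whether an existing snapshot is valid.
public class SnapshotCheck {

    // True if at least one file name starts with the given prefix.
    public static boolean hasPrefix(List<String> names, String prefix) {
        for (String name : names) {
            if (name.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    // True when there are transaction logs (log.*) but no snapshots
    // (snapshot.*), i.e. the situation described by the error message
    // "No snapshot found, but there are log entries."
    public static boolean logsWithoutSnapshot(List<String> names) {
        return hasPrefix(names, "log.") && !hasPrefix(names, "snapshot.");
    }

    public static void main(String[] args) {
        // Default path assumed from the log output in this thread.
        File dir = new File(args.length > 0 ? args[0] : "/zookeeper/version-2");
        String[] listed = dir.list();
        if (listed == null) {
            System.err.println("Cannot list directory: " + dir);
            System.exit(1);
        }
        List<String> names = Arrays.asList(listed);
        if (logsWithoutSnapshot(names)) {
            System.out.println("Transaction logs but no snapshot file in " + dir);
        } else if (hasPrefix(names, "snapshot.")) {
            System.out.println("At least one snapshot file present in " + dir);
        } else {
            System.out.println("Neither logs nor snapshots found in " + dir);
        }
    }
}
```

Compile with javac SnapshotCheck.java and run it as java SnapshotCheck /zookeeper/version-2 (as the ZooKeeper service user, so the directory is readable).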

> On 30.07.2019 at 08:06, Enrico Olivelli wrote:
> 
> On Mon, Jul 29, 2019 at 23:59, Jörn Franke wrote:
> 
>> ok, then let me verify tomorrow if a snapshot file is indeed there. If it
>> is missing then I wonder why it was missing. There was no crash or whatever
>> and 3.4.14 works without issue, but of course it could have loaded them
>> from the log files. However, then I wonder why it does not create one.
>> 
> 
> 
> 
> I remember now that another user, I think Sijie, reported a similar
> problem some months ago: it is not possible to upgrade from 3.4 to 3.5
> if no snapshot is present.
> IIRC the fix was to force the creation of at least one snapshot file and
> then upgrade.
> 
> Enrico
> 
> 
>> 
>> On Mon, Jul 29, 2019 at 11:45 PM Michael Han wrote:
>> 
>>>>> I just wonder why it does not find a valid snapshot.
>>> 
>>> If there are local snapshot files and the files are valid, then it's a
>>> bug that the server fails to load them.
>>> 
>>>>> Is it because the format changed in 3.5.5 compared to 3.4.14?
>>> 
>>> Not that I am aware of. There are some format changes (added compression
>>> support) in the master branch, but those are not shipped with 3.5.5.
>>> 
>>> 
>>> 
>>> On Mon, Jul 29, 2019 at 2:31 PM Jörn Franke wrote:
>>> 
>>>> OK, so it basically affects all standalone nodes? That is fine, even
>>>> though it means some extra work (for uncritical lab environments).
>>>> I am not sure it is ZOOKEEPER-2325 (but I don't know the full history
>>>> behind it). The logs are fine (it works in 3.4.14 without issues, even
>>>> after downgrading back). There is no issue with disk space and there are
>>>> no 0-byte files. I just wonder why it does not find a valid snapshot. Is it
>>>> because the format changed in 3.5.5 compared to 3.4.14?
>>>> 
>>>> On Mon, Jul 29, 2019 at 11:25 PM Michael Han wrote:
>>>> 
>>>>>>> java.io.IOException: No snapshot found, but there are log entries.
>>>>>>> Something is broken!
>>>>> 
>>>>> This is expected behavior introduced in ZOOKEEPER-2325. We don't want to
>>>>> end up with a potentially inconsistent state across the ensemble when
>>>>> recovering from an empty snapshot.
>>>>> 
>>>>> To continue the upgrade, just delete all txn log files and let the node
>>>>> sync the snapshot from the quorum.
>>>>> 
>>>>> 
>>>>> On Mon, Jul 29, 2019 at 1:38 PM Enrico Olivelli wrote:
>>>>> 
>>>>>> On Mon, Jul 29, 2019 at 22:32, Jörn Franke wrote:
>>>>>> 
>>>>>>> It also seems that 3.5.5 does not attempt to read all of the log files
>>>>>>> (I still have to confirm), but the two it reads exist, it has access
>>>>>>> to them, and they are much larger than 0 bytes.
>>>>>>> 
>>>>>> 
>>>>>> We should have the stack trace of the EOFException.
>>>>>> 
>>>>>> Does anyone on this list have a better idea?
>>>>>> 
>>>>>> Enrico
>>>>>> 
>>>>>> 
>>>>>>> On Mon, Jul 29, 2019 at 10:13 PM Jörn Franke wrote:
>>>>>>> 
>>>>>>>> (of course I do not run them at the same time)
>>>>>>>> 
>>>>>>>> On Mon, Jul 29, 2019 at 10:10 PM Jörn Franke wrote:
>>>>>>>> 
>>>>>>>>> thank you for the quick reply. They read from the same disk paths and
>>>>>>>>> have the same access rights (in fact the RHEL service executes them as
>>>>>>>>> the same specific user).
>>>>>>>>> 
>>>>>>>>> On Mon, Jul 29, 2019 at 10:09 PM Enrico Olivelli wrote:
>>>>>>>>> 
>>>>>>>>>> On Mon, Jul 29, 2019 at 21:50, Jörn Franke wrote:
>>>>>>>>>> 
>>>>>>>>>>> Hi,
>>>>>>>>>>> 
>>>>>>>>>>> I tried to migrate a lab environment from ZooKeeper 3.4.14 (used for
>>>>>>>>>>> Solr) to 3.5.5 and encountered an issue. It is ZooKeeper in standalone
>>>>>>>>>>> mode (other environments have a proper ensemble). I increased
>>>>>>>>>>> jute.maxbuffer beyond the default (but not excessively); this was
>>>>>>>>>>> working perfectly fine in 3.4.14.
>>>>>>>>>>> 
>>>>>>>>>>> For the migration I basically reuse the same config files, except that
>>>>>>>>>>> I whitelist some commands (later I am also interested in adding SSL).
>>>>>>>>>>> 
>>>>>>>>>>> I get the following error message when starting ZooKeeper with 3.5.5
>>>>>>>>>>> (basically, I just changed the symbolic link from zookeeper to point
>>>>>>>>>>> to 3.5.5 instead of the 3.4.14 directory):
>>>>>>>>>>> 2019-07-29 15:16:25,217 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@655] - Created new input stream /zookeeper/version-2/log.b34
>>>>>>>>>>> 2019-07-29 15:16:25,217 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@658] - Created new input archive /zookeeper/version-2/log.b34
>>>>>>>>>>> 2019-07-29 15:16:25,222 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@696] - EOF exception java.io.EOFException: Failed to read /zookeeper/version-2/log.b34
>>>>>>>>>>> 2019-07-29 15:16:25,223 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@655] - Created new input stream /zookeeper/version-2/log.b72
>>>>>>>>>>> 2019-07-29 15:16:25,223 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@658] - Created new input archive /zookeeper/version-2/log.b72
>>>>>>>>>>> 2019-07-29 15:16:25,224 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@696] - EOF exception java.io.EOFException: Failed to read /zookeeper/version-2/log.b72
>>>>>>>>>>> 2019-07-29 15:16:25,224 [myid:] - ERROR [main:ZooKeeperServerMain@83] - Unexpected exception, exiting abnormally
>>>>>>>>>>> java.io.IOException: No snapshot found, but there are log entries. Something is broken!
>>>>>>>>>>>        at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:211)
>>>>>>>>>>>        at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:240)
>>>>>>>>>>>        at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:290)
>>>>>>>>>>>        at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:450)
>>>>>>>>>>>        at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:764)
>>>>>>>>>>>        at org.apache.zookeeper.server.ServerCnxnFactory.startup(ServerCnxnFactory.java:98)
>>>>>>>>>>>        at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:144)
>>>>>>>>>>>        at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:106)
>>>>>>>>>>>        at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:64)
>>>>>>>>>>>        at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:128)
>>>>>>>>>>>        at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:82)
>>>>>>>>>>> 
>>>>>>>>>>> Strangely enough, if I switch back to 3.4.14, the issue is resolved and
>>>>>>>>>>> ZooKeeper works normally. However, I would like to use the new version
>>>>>>>>>>> 3.5.5.
>>>>>>>>>>> 
>>>>>>>>>>> There are no 0-byte files, and plenty of disk space is available.
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Can you compare these logs with the logs of 3.4.x? Are they reading
>>>>>>>>>> from the same disk paths?
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> Any idea beyond erasing the data dir? I would like to avoid that (I can
>>>>>>>>>>> reconstruct it, but still). I will also try in the other environments,
>>>>>>>>>>> including one with an ensemble, but I would like to know beforehand
>>>>>>>>>>> what the issue could be.
>>>>>>>>>>> 
>>>>>>>>>>> Not sure if it is relevant, but:
>>>>>>>>>>> I activated Kerberos authentication and Kerberos SSL for clients and
>>>>>>>>>>> quorum.
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Quorum? In standalone mode there is no 'quorum' auth
>>>>>>>>>> 
>>>>>>>>>> Enrico
>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
