I have completed by debug session and I have an answer.

The problematic flow is the following:
- LedgerHandler#asyncAddEntry
- in case of BKNotEnoughBookiesException I'm triggering an async callback
(with a CompletableFuture#handleAsync) which in turn calls
LedgerHandler#close
- the LedgerHandler#close method truncates the ledger and some entry is
lost (at the time of the close call I have local LastAddConfirmed ==
LastAddPushed == 0 on the writer)

Is this a possible explanation ?
I wonder which is the best way to react to an error (in this case
BKNotEnoughBookiesException) in asyncAddEntry, usually I'm calling
LedgerHandler#close in order to the necessary clean up, but in this case it
seems very dangerous
(I am running BK 4.4.0)

This is the issue on my project, where I have put logs
https://github.com/diennea/herddb/issues/100

Thanks
-- Enrico








2017-03-17 16:14 GMT+01:00 Enrico Olivelli <[email protected]>:

>
>
> 2017-03-17 7:48 GMT+01:00 Sijie Guo <[email protected]>:
>
>>
>>
>> On Wed, Mar 15, 2017 at 9:17 AM, Enrico Olivelli <[email protected]>
>> wrote:
>>
>>> Hi BookKeepers,
>>> I'm trying to understand some basic aspect of BookKeeper....
>>>
>>> I have a strange case, and I would like to have some opinion from any
>>> "expert".
>>>
>>> I am using BookKeeper as a transaction log for a database, this is an
>>> example scenario:
>>>
>>> 1) the user inserts a record in a table
>>> the "fact" is logged to the Write-Ahead-log (Bookkeeper ledger), and the
>>> systems blocks until the asyncAddEntry completes (the callback is called
>>> with rc=0)
>>>
>>> now I (the 'writer') assume that the entry is written to durable storage
>>> and has been synched to disk
>>> then the DB applies the change to the local memory as every application
>>> with a WAL does
>>>
>>> 2) The writer 2  other entries for which the asyncAddEntry completes
>>> (the callback is called with rc=0). I block until the callback as been
>>> called
>>>
>>> 2) for tests I'm using an ensemble of 1 bookie, I "stop" the only bookie
>>> the bookie stops "gracefully" (no strange errors)
>>>
>>> 3) The writer issues other asyncAddEntry, which fail due to
>>> "notEnoughBookiesException" and then "closes" the LedgerHandle, inside the
>>> "callback" cade
>>>
>>> 4) I restart the bookie (no strange errors on logs)
>>>
>>> 5) I restart my "writer", opening the ledger with "openLedger" (with
>>> recovery and fencing...) in order to perform recovery from the WAL
>>>
>>> 6) now the LastAddConfirmed is 0 ! so my first entry is lost !
>>>
>>> Am I missing something ?
>>> The LAC has not been advanced even with 3 successful writes to the ledger
>>>
>>
>> Are you using readLastAddConfirmed at step 5)? 0 is expected when you
>> using #readLastAddConfirmed, since the call is retrieving LAC from bookies.
>>
>>
> I have added a call to readLastAddConfirmed but it does not help, the LAC
> is still 0
>
>
> I have double checked my code and the "close" happens in another thread,
> maybe I am hitting a race condition in my code, in fact the problem is not
> reproducible at 100%
>
>
> I will continue debugging
>
> Thanks
>
>
>
>
>> At step 3), it already closed the ledger and update the metadata's last
>> entry id with 2. so for tailing read, you might need to call
>> #readLastAddConfirmed and then get the last add confirmed using
>> #getLastAddConfirmed.
>>
>>
>>
>>>
>>> From the "writer" point-of-view the fact that the callback as been
>>> called is not a guarantee that the entry as been written to durable storage
>>> ?
>>>
>>> I understand that of a "tailing" reader (a ledger opened with
>>> openLedgerNoRecovery) it is very important not to be able to "read" data
>>> before the LAC, but for the "writer" it is important that it can count on
>>> the fact the entry will be recovered with success
>>>
>>>
>>> Thanks
>>> -- Enrico
>>>
>>>
>>
>

Reply via email to