I have completed by debug session and I have an answer. The problematic flow is the following: - LedgerHandler#asyncAddEntry - in case of BKNotEnoughBookiesException I'm triggering an async callback (with a CompletableFuture#handleAsync) which in turn calls LedgerHandler#close - the LedgerHandler#close method truncates the ledger and some entry is lost (at the time of the close call I have local LastAddConfirmed == LastAddPushed == 0 on the writer)
Is this a possible explanation ? I wonder which is the best way to react to an error (in this case BKNotEnoughBookiesException) in asyncAddEntry, usually I'm calling LedgerHandler#close in order to the necessary clean up, but in this case it seems very dangerous (I am running BK 4.4.0) This is the issue on my project, where I have put logs https://github.com/diennea/herddb/issues/100 Thanks -- Enrico 2017-03-17 16:14 GMT+01:00 Enrico Olivelli <[email protected]>: > > > 2017-03-17 7:48 GMT+01:00 Sijie Guo <[email protected]>: > >> >> >> On Wed, Mar 15, 2017 at 9:17 AM, Enrico Olivelli <[email protected]> >> wrote: >> >>> Hi BookKeepers, >>> I'm trying to understand some basic aspect of BookKeeper.... >>> >>> I have a strange case, and I would like to have some opinion from any >>> "expert". >>> >>> I am using BookKeeper as a transaction log for a database, this is an >>> example scenario: >>> >>> 1) the user inserts a record in a table >>> the "fact" is logged to the Write-Ahead-log (Bookkeeper ledger), and the >>> systems blocks until the asyncAddEntry completes (the callback is called >>> with rc=0) >>> >>> now I (the 'writer') assume that the entry is written to durable storage >>> and has been synched to disk >>> then the DB applies the change to the local memory as every application >>> with a WAL does >>> >>> 2) The writer 2 other entries for which the asyncAddEntry completes >>> (the callback is called with rc=0). I block until the callback as been >>> called >>> >>> 2) for tests I'm using an ensemble of 1 bookie, I "stop" the only bookie >>> the bookie stops "gracefully" (no strange errors) >>> >>> 3) The writer issues other asyncAddEntry, which fail due to >>> "notEnoughBookiesException" and then "closes" the LedgerHandle, inside the >>> "callback" cade >>> >>> 4) I restart the bookie (no strange errors on logs) >>> >>> 5) I restart my "writer", opening the ledger with "openLedger" (with >>> recovery and fencing...) in order to perform recovery from the WAL >>> >>> 6) now the LastAddConfirmed is 0 ! so my first entry is lost ! >>> >>> Am I missing something ? >>> The LAC has not been advanced even with 3 successful writes to the ledger >>> >> >> Are you using readLastAddConfirmed at step 5)? 0 is expected when you >> using #readLastAddConfirmed, since the call is retrieving LAC from bookies. >> >> > I have added a call to readLastAddConfirmed but it does not help, the LAC > is still 0 > > > I have double checked my code and the "close" happens in another thread, > maybe I am hitting a race condition in my code, in fact the problem is not > reproducible at 100% > > > I will continue debugging > > Thanks > > > > >> At step 3), it already closed the ledger and update the metadata's last >> entry id with 2. so for tailing read, you might need to call >> #readLastAddConfirmed and then get the last add confirmed using >> #getLastAddConfirmed. >> >> >> >>> >>> From the "writer" point-of-view the fact that the callback as been >>> called is not a guarantee that the entry as been written to durable storage >>> ? >>> >>> I understand that of a "tailing" reader (a ledger opened with >>> openLedgerNoRecovery) it is very important not to be able to "read" data >>> before the LAC, but for the "writer" it is important that it can count on >>> the fact the entry will be recovered with success >>> >>> >>> Thanks >>> -- Enrico >>> >>> >> >
