Did you mean we should run this on a running environment to recover from the failure?
"bin/bookkeeper shell ledger -ledgeridformat long -m [ledger-id]"

________________________________
From: Sijie Guo <guosi...@gmail.com>
Sent: Tuesday, December 3, 2019 1:10 PM
To: Sharda, Ravi <ravi.sha...@dell.com>
Cc: Enrico Olivelli <eolive...@gmail.com>; user <user@bookkeeper.apache.org>; Flavio Junqueira <f...@apache.org>
Subject: Re: Bookeeper exception on pods restart

[EXTERNAL EMAIL]

I think 4.7.2 is using UUID as the ledger id formatter by default (it was a mistake, and it was reverted in subsequent releases). So you might have to run "bin/bookkeeper shell ledger -ledgeridformat long -m [ledger-id]". Can you rerun the command this way?

---
> I saw this occurring for several ledgers in this environment.

The IOException might be related to disk issues, although I don't have enough information to tell.

Thanks,
Sijie

On Mon, Dec 2, 2019 at 7:24 AM Sharda, Ravi <ravi.sha...@dell.com> wrote:

Enrico,

I saw this occurring for several ledgers in this environment.
-----------
ERROR - [BookieReadThreadPool-OrderedExecutor-3-0:ReadEntryProcessorV3@235] - IOException while reading entry: 5 from ledger 1243
[BookieReadThreadPool-OrderedExecutor-7-0:ReadEntryProcessorV3@235] - IOException while reading entry: 15 from ledger 1239
ERROR - [BookieReadThreadPool-OrderedExecutor-0-0:ReadEntryProcessorV3@235] - IOException while reading entry: 102 from ledger 64
ERROR - [BookieReadThreadPool-OrderedExecutor-0-0:ReadEntryProcessorV3@235] - IOException while reading entry: 7 from ledger 728

________________________________
From: Enrico Olivelli <eolive...@gmail.com>
Sent: Monday, December 2, 2019 7:01 PM
To: user <user@bookkeeper.apache.org>
Cc: Sijie Guo <guosi...@gmail.com>; Flavio Junqueira <f...@apache.org>
Subject: Re: Bookeeper exception on pods restart

[EXTERNAL EMAIL]

sh-4.2# ./bookkeeper shell ledger -m 56
ERROR: invalid ledger id 56
ledger: Dump ledger index entries into readable format.
usage: ledger [-m] <ledger_id>
 -m,--meta   Print meta information

Does it work for other ledgers?

Enrico

On Mon, Dec 2, 2019 at 10:06, Sharda, Ravi <ravi.sha...@dell.com> wrote:

Hello Sijie,

Any luck with this? Please let us know what could be going wrong.

Thanks & best regards,
Ravi

________________________________
From: Sharda, Ravi <ravi.sha...@dell.com>
Sent: Friday, November 29, 2019 3:28 PM
To: Sijie Guo <guosi...@gmail.com>
Cc: user <user@bookkeeper.apache.org>; Flavio Junqueira <f...@apache.org>
Subject: Re: Bookeeper exception on pods restart

Thanks. Here's the output of the command:

sh-4.2# ./bookkeeper shell ledger -m 56
ERROR: invalid ledger id 56
ledger: Dump ledger index entries into readable format.
usage: ledger [-m] <ledger_id>
 -m,--meta   Print meta information

________________________________
From: Sijie Guo <guosi...@gmail.com>
Sent: Friday, November 29, 2019 3:15 PM
To: Sharda, Ravi <ravi.sha...@dell.com>
Cc: user <user@bookkeeper.apache.org>; Flavio Junqueira <f...@apache.org>
Subject: Re: Bookeeper exception on pods restart

[EXTERNAL EMAIL]

Sorry, my bad. The command for reading the ledger index should be "bookkeeper shell ledger".

From the `ls` output, I didn't find the entry log 1.log under the ledgers directory, so I guess the log file doesn't exist. If you can provide the output of `bookkeeper shell ledger`, we can take a look at the index file to understand more.

On Fri, Nov 29, 2019 at 1:20 AM Sharda, Ravi <ravi.sha...@dell.com> wrote:

For the following error,

OrderedExecutor-0-0:ReadEntryProcessorV3@235] - IOException while reading entry: 25 from ledger 56
java.io.FileNotFoundException: No file for log 1 for 56 with location 4744138143
    at org.apache.bookkeeper.bookie.EntryLogger.findFile(EntryLogger.java:1165)
    at org.apache.bookkeeper.bookie.EntryLogger.getChannelForLogId(EntryLogger.java:1100)
    at org.apache.bookkeeper.bookie.EntryLogger.internalReadEntry(EntryLogger.java:1002)
    at org.apache.bookkeeper.bookie.EntryLogger.readEntry(EntryLogger.java:1051)
    at org.apache.bookkeeper.bookie.InterleavedLedgerStorage.getEntry(InterleavedLedgerStorage.java:305)
    at org.apache.bookkeeper.bookie.SortedLedgerStorage.getEntry(SortedLedgerStorage.java:153)
    at org.apache.bookkeeper.bookie.L

-----------
sh-4.2# ./bookkeeper shell readledger --ledgerid 56
ERROR: invalid value for option ledgerid : 56
Must specify a ledger id
-----------

I didn't know how to check that the log file exists. Attaching the output of "ls -R -L" instead.
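
On checking whether the log file exists: the following is a minimal sketch, not an official BookKeeper tool. It assumes that the `location` value in the error message packs the entry log id in the upper 32 bits and the byte offset in the lower 32 bits, and that entry log files are named with the hexadecimal log id plus `.log`. Both are assumptions about the 4.x entry logger and should be verified against the running version. The helper names and the `/bk/ledgers` path are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List, Tuple


def decode_location(location: int) -> Tuple[int, int]:
    """Split a packed location into (entry_log_id, byte_offset).

    Assumption: upper 32 bits = entry log id, lower 32 bits = offset.
    """
    return location >> 32, location & 0xFFFFFFFF


def entry_log_name(log_id: int) -> str:
    """Assumed entry log file name: lowercase hex log id plus '.log'."""
    return f"{log_id:x}.log"


def find_entry_log(ledger_dirs: Iterable[str], log_id: int) -> List[Path]:
    """Search each existing ledger directory recursively for the entry log."""
    name = entry_log_name(log_id)
    matches: List[Path] = []
    for directory in ledger_dirs:
        root = Path(directory)
        if root.is_dir():
            matches.extend(sorted(root.rglob(name)))
    return matches


if __name__ == "__main__":
    # Location taken from the FileNotFoundException quoted in this thread.
    log_id, offset = decode_location(4744138143)
    print(log_id, offset, entry_log_name(log_id))  # -> 1 449170847 1.log
    # Hypothetical directory; substitute the bookie's configured ledger dirs.
    print(find_entry_log(["/bk/ledgers"], log_id) or "entry log file not found")
```

Under these assumptions, the location from the error decodes to entry log id 1, which is consistent with the "No file for log 1" message and with looking for a file named `1.log` in the `ls -R -L` output.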
________________________________
From: Sijie Guo <guosi...@gmail.com>
Sent: Friday, November 29, 2019 2:31 PM
To: Sharda, Ravi <ravi.sha...@dell.com>
Cc: user <user@bookkeeper.apache.org>; Flavio Junqueira <f...@apache.org>
Subject: Re: Bookeeper exception on pods restart

[EXTERNAL EMAIL]

If it is a permanent error,

- check whether the log file (indicated in the error message) exists.
- use `bookkeeper shell readledger` to dump the index of the given ledger.
- see whether the index points to the right entry log file.

- Sijie

On Fri, Nov 29, 2019 at 12:46 AM Sharda, Ravi <ravi.sha...@dell.com> wrote:

The latest instance we have seen is a permanent error. The bookies haven't recovered in the environment (last 2 days). In some previous instances, developers had reported that the bookies had recovered, but it is also possible that the error was slightly different from what we are seeing now.

Thanks & best regards,
Ravi

________________________________
From: Sijie Guo <guosi...@gmail.com>
Sent: Friday, November 29, 2019 2:10 PM
To: user <user@bookkeeper.apache.org>
Cc: Flavio Junqueira <f...@apache.org>; Sharda, Ravi <ravi.sha...@dell.com>
Subject: Re: Bookeeper exception on pods restart

[EXTERNAL EMAIL]

Sorry for jumping into the discussion. But the error message indicates that the entry log file 1 is not found. It seems to me that the entry log file was removed but the entry index still points to the old location.

Is this a transient error or a permanent error?

- Sijie

On Fri, Nov 29, 2019 at 12:11 AM <prajakta.belgu...@dell.com> wrote:

+ Ravi, who will be looking into this ….
From: Enrico Olivelli - Diennea <enrico.olive...@diennea.com>
Sent: Thursday, November 28, 2019 7:00 PM
To: user@bookkeeper.apache.org
Cc: f...@apache.org
Subject: Re: Bookeeper exception on pods restart

[EXTERNAL EMAIL]

From the error it looks like one client is trying to read an entry from the bookie but the entry is not there.

I see two possible reasons:
1) The write never reached the bookie
2) The bookie is missing some file

For 1)
Do you have logs on the writer? Something that could tell us that a write did not succeed?
How old is the entry supposed to be?
Do you have logs on the reader that is trying to read the entry?

For 2)
Do you have other errors in the logs about failed writes or anything similar?

If you were on 4.9 we could use the ‘localconsistency checker’ to check for inconsistencies on the bookie; it scans the bookie looking for every entry that should reside on the bookie itself.

If you were writing your ledgers with writequorum >= 2, maybe you can recover your data.

In order to debug the problem we should compare the logs of:
* The bookie
* The writer
* The reader

Enrico

On 28/11/19, 13:59, "prajakta.belgu...@dell.com" <prajakta.belgu...@dell.com> wrote:

I understand that auto recovery would replicate data for under-replicated ledgers. But it is scheduled to run only once in a while and may not have run before a reader tries to read this data from a certain bookie.

Generally, what does the exception below indicate about the state of BK? Does it indicate that the entry is missing on the specific bookie and so we don't find it? Or that something in the ledger metadata or ledgers could have been corrupted?
Found the same issue with another product where you seem to have provided a custom fix: https://github.com/diennea/herddb/issues/194

All in all, we want to understand whether this can be the result of a BK misconfiguration or is just a temporary unavailability problem that will resolve itself when auto-replication runs.

-Thanks,
Prajakta

From: Enrico Olivelli - Diennea <enrico.olive...@diennea.com>
Sent: Thursday, November 28, 2019 6:11 PM
To: user@bookkeeper.apache.org
Cc: f...@apache.org
Subject: Re: Bookeeper exception on pods restart

[EXTERNAL EMAIL]

I don't think there is a good value. You can use WriteQuorumSize = AckQuorumSize; this way you will see an error on the writing client if a write to any of the bookies fails.

Usually you enable the Autorecovery feature to fill in the gaps of under-replicated ledgers:
http://bookkeeper.apache.org/docs/4.10.0/admin/autorecovery/

Hope that helps
Enrico

On 28/11/19, 13:30, "prajakta.belgu...@dell.com" <prajakta.belgu...@dell.com> wrote:

What EnsembleSize, WriteQuorumSize and AckQuorumSize would you recommend, so we never see this? What other ledger creation parameters do you need information about?

-Thanks,
Prajakta

From: Enrico Olivelli - Diennea <enrico.olive...@diennea.com>
Sent: Thursday, November 28, 2019 5:19 PM
To: user@bookkeeper.apache.org
Cc: f...@apache.org
Subject: Re: Bookeeper exception on pods restart

[EXTERNAL EMAIL]

Hi Prajakta,

What ledger creation parameters are you using? Ensemble size, write quorum size, ack quorum size?
If ackQuorumSize < WriteQuorumSize, it is possible that a write to the bookie failed: even though the entry is supposed to be on the bookie, it never reached it, yet the overall write succeeded because an ack quorum of bookies acknowledged it.

Enrico

On 28/11/19, 12:44, "prajakta.belgu...@dell.com" <prajakta.belgu...@dell.com> wrote:

Hello Team,

We have a question about an issue we are running into with BookKeeper. We use BookKeeper version 4.7.3. This issue occurs occasionally when BookKeeper servers are restarted. We see the following error in the logs for some time, which blocks Pravega's operations for the same duration. I don't know the internals of BookKeeper, but based on the exception alone, it seems that BookKeeper might temporarily be unable to locate the files. What could be causing this?

2019-11-28 03:52:26,491 - ERROR - [BookieReadThreadPool-OrderedExecutor-0-0:ReadEntryProcessorV3@235] - IOException while reading entry: 25 from ledger 56
java.io.FileNotFoundException: No file for log 1 for 56 with location 4744138143
    at org.apache.bookkeeper.bookie.EntryLogger.findFile(EntryLogger.java:1165)
    at org.apache.bookkeeper.bookie.EntryLogger.getChannelForLogId(EntryLogger.java:1100)
    at org.apache.bookkeeper.bookie.EntryLogger.internalReadEntry(EntryLogger.java:1002)
    at org.apache.bookkeeper.bookie.EntryLogger.readEntry(EntryLogger.java:1051)
    at org.apache.bookkeeper.bookie.InterleavedLedgerStorage.getEntry(InterleavedLedgerStorage.java:305)
    at org.apache.bookkeeper.bookie.SortedLedgerStorage.getEntry(SortedLedgerStorage.java:153)
    at org.apache.bookkeeper.bookie.LedgerDescriptorImpl.readEntry(LedgerDescriptorImpl.java:153)
    at org.apache.bookkeeper.bookie.Bookie.readEntry(Bookie.java:1305)
    at org.apache.bookkeeper.proto.ReadEntryProcessorV3.readEntry(ReadEntryProcessorV3.java:175)
    at org.apache.bookkeeper.proto.ReadEntryProcessorV3.readEntry(ReadEntryProcessorV3.java:155)
    at org.apache.bookkeeper.proto.ReadEntryProcessorV3.getReadResponse(ReadEntryProcessorV3.java:218)
    at org.apache.bookkeeper.proto.ReadEntryProcessorV3.executeOp(ReadEntryProcessorV3.java:264)
    at org.apache.bookkeeper.proto.ReadEntryProcessorV3.safeRun(ReadEntryProcessorV3.java:260)
    at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

-Thanks,
Prajakta

________________________________
CONFIDENTIALITY & PRIVACY NOTICE
This e-mail (including any attachments) is strictly confidential and may also contain privileged information. If you are not the intended recipient you are not authorised to read, print, save, process or disclose this message. If you have received this message by mistake, please inform the sender immediately and destroy this e-mail, its attachments and any copies. Any use, distribution, reproduction or disclosure by any person other than the intended recipient is strictly prohibited and the person responsible may incur in penalties. The use of this e-mail is only for professional purposes; there is no guarantee that the correspondence towards this e-mail will be read only by the recipient, because, under certain circumstances, there may be a need to access this email by third subjects belonging to the Company.