Sorry to digress a little from the existing conversation here.

But this issue is almost always noticed on bookie restart, so I wanted to 
understand whether this problem could be the result of an unclean bookie shutdown. 
If so, what is the way to ensure a graceful termination of bookies, so 
we don't lose or corrupt any data?

-Thanks,
Prajakta

From: Sijie Guo <guosi...@gmail.com>
Sent: Tuesday, December 3, 2019 1:53 PM
To: Sharda, Ravi
Cc: Belgundi, Prajakta; Enrico Olivelli; user; Flavio Junqueira
Subject: Re: Bookeeper exception on pods restart


[EXTERNAL EMAIL]
I mean that the error "ERROR: invalid ledger id 56" is raised because the wrong 
ledger id formatter is being used. I was suggesting that you rerun the command to 
collect more information so that we can debug.

- Sijie

On Tue, Dec 3, 2019 at 12:17 AM Sharda, Ravi <ravi.sha...@dell.com> wrote:
Did you mean we should run this on a running environment to recover from the 
failure?

"bin/bookkeeper shell ledger -ledgeridformat long -m [ledger-id]"
________________________________
From: Sijie Guo <guosi...@gmail.com>
Sent: Tuesday, December 3, 2019 1:10 PM
To: Sharda, Ravi <ravi.sha...@dell.com>
Cc: Enrico Olivelli <eolive...@gmail.com>; user <user@bookkeeper.apache.org>; 
Flavio Junqueira <f...@apache.org>
Subject: Re: Bookeeper exception on pods restart


I think 4.7.2 uses UUID as the ledger id formatter by default (that was a 
mistake and was reverted in subsequent releases).

So you might have to run "bin/bookkeeper shell ledger -ledgeridformat long -m 
[ledger-id]".

Can you rerun the command that way?
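
To illustrate why a UUID-based formatter would reject a plain number such as 56, here is a minimal Python sketch. It assumes the UUID formatter parses its argument like a standard UUID parser and keeps the low 64 bits; `parse_ledger_id`, `try_parse`, and the `fmt` argument are hypothetical names for illustration, not BookKeeper APIs.

```python
import uuid


def parse_ledger_id(text, fmt):
    """Parse a ledger id string (hypothetical helper, not a BookKeeper API)."""
    if fmt == "long":
        return int(text)
    if fmt == "uuid":
        # Assumption: the UUID formatter keeps the low 64 bits of the UUID.
        return uuid.UUID(text).int & 0xFFFFFFFFFFFFFFFF
    raise ValueError(f"unknown format: {fmt}")


def try_parse(text, fmt):
    """Return the parsed id, or None if the text is invalid for that format."""
    try:
        return parse_ledger_id(text, fmt)
    except ValueError:
        return None


print(try_parse("56", "long"))  # parses as a plain integer
print(try_parse("56", "uuid"))  # not a well-formed UUID string
```

Under these assumptions, "56" is only accepted when the long format is selected, which would match the "invalid ledger id 56" message seen with the default formatter.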

---

> I saw this occurring for several ledgers in this environment.
The IOException might be related to disk issues, although I don't have enough 
information to tell.

Thanks,
Sijie

On Mon, Dec 2, 2019 at 7:24 AM Sharda, Ravi <ravi.sha...@dell.com> wrote:

Enrico,

I saw this occurring for several ledgers in this environment.

-----------

ERROR - [BookieReadThreadPool-OrderedExecutor-3-0:ReadEntryProcessorV3@235] - 
IOException while reading entry: 5 from ledger 1243

[BookieReadThreadPool-OrderedExecutor-7-0:ReadEntryProcessorV3@235] - 
IOException while reading entry: 15 from ledger 1239

ERROR - [BookieReadThreadPool-OrderedExecutor-0-0:ReadEntryProcessorV3@235] - 
IOException while reading entry: 102 from ledger 64

ERROR - [BookieReadThreadPool-OrderedExecutor-0-0:ReadEntryProcessorV3@235] - 
IOException while reading entry: 7 from ledger 728
________________________________
From: Enrico Olivelli <eolive...@gmail.com>
Sent: Monday, December 2, 2019 7:01 PM
To: user <user@bookkeeper.apache.org>
Cc: Sijie Guo <guosi...@gmail.com>; Flavio Junqueira <f...@apache.org>
Subject: Re: Bookeeper exception on pods restart




sh-4.2# ./bookkeeper shell ledger -m 56
ERROR: invalid ledger id 56
ledger: Dump ledger index entries into readable format.
usage: ledger       [-m] <ledger_id>
 -m,--meta   Print meta information

Does it work for other ledgers?

Enrico



On Mon, Dec 2, 2019 at 10:06 AM Sharda, Ravi <ravi.sha...@dell.com> wrote:
Hello Sijie,

Any luck with this? Please let us know what could be going wrong.

Thanks & best regards,
Ravi
________________________________
From: Sharda, Ravi <ravi.sha...@dell.com>
Sent: Friday, November 29, 2019 3:28 PM
To: Sijie Guo <guosi...@gmail.com>
Cc: user <user@bookkeeper.apache.org>; Flavio Junqueira <f...@apache.org>
Subject: Re: Bookeeper exception on pods restart

Thanks. Here's the output of the command:

sh-4.2# ./bookkeeper shell ledger -m 56
ERROR: invalid ledger id 56
ledger: Dump ledger index entries into readable format.
usage: ledger       [-m] <ledger_id>
 -m,--meta   Print meta information
________________________________
From: Sijie Guo <guosi...@gmail.com>
Sent: Friday, November 29, 2019 3:15 PM
To: Sharda, Ravi <ravi.sha...@dell.com>
Cc: user <user@bookkeeper.apache.org>; Flavio Junqueira <f...@apache.org>
Subject: Re: Bookeeper exception on pods restart


Sorry, my bad. The command for reading the ledger index should be "bookkeeper shell 
ledger".

From the `ls` output, I didn't find the entry log 1.log under the ledgers directory. So I 
guess the log file doesn't exist. If you can provide the output of `bookkeeper 
shell ledger`, we can take a look at the index file to understand more.
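
For reference, the `location` value in the error message can be split into a log id and a byte offset. The sketch below assumes the usual packing (upper 32 bits are the entry log id, lower 32 bits are the offset inside that log) and that entry log files are named by the hexadecimal log id plus `.log`; the function names are made up for illustration.

```python
def decode_location(location):
    """Split a 64-bit entry location into (log_id, offset)."""
    log_id = location >> 32          # assumed: upper 32 bits hold the log id
    offset = location & 0xFFFFFFFF   # assumed: lower 32 bits hold the offset
    return log_id, offset


def entry_log_file_name(log_id):
    """Assumed naming scheme: hexadecimal log id followed by '.log'."""
    return f"{log_id:x}.log"


log_id, offset = decode_location(4744138143)
print(log_id, offset, entry_log_file_name(log_id))
```

With the location from the error above, this points at log id 1, which is consistent with looking for 1.log.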

On Fri, Nov 29, 2019 at 1:20 AM Sharda, Ravi <ravi.sha...@dell.com> wrote:
For the following error,


OrderedExecutor-0-0:ReadEntryProcessorV3@235] - IOException while reading entry: 25 from ledger 56
java.io.FileNotFoundException: No file for log 1 for 56 with location 4744138143
        at org.apache.bookkeeper.bookie.EntryLogger.findFile(EntryLogger.java:1165)
        at org.apache.bookkeeper.bookie.EntryLogger.getChannelForLogId(EntryLogger.java:1100)
        at org.apache.bookkeeper.bookie.EntryLogger.internalReadEntry(EntryLogger.java:1002)
        at org.apache.bookkeeper.bookie.EntryLogger.readEntry(EntryLogger.java:1051)
        at org.apache.bookkeeper.bookie.InterleavedLedgerStorage.getEntry(InterleavedLedgerStorage.java:305)
        at org.apache.bookkeeper.bookie.SortedLedgerStorage.getEntry(SortedLedgerStorage.java:153)
        at org.apache.bookkeeper.bookie.L
-----------
sh-4.2# ./bookkeeper shell readledger --ledgerid 56
ERROR: invalid value for option ledgerid : 56
Must specify a ledger id

-----------
I didn't know how to check that the log file exists. Attaching the output of 
"ls -R -L", instead.

________________________________
From: Sijie Guo <guosi...@gmail.com>
Sent: Friday, November 29, 2019 2:31 PM
To: Sharda, Ravi <ravi.sha...@dell.com>
Cc: user <user@bookkeeper.apache.org>; Flavio Junqueira <f...@apache.org>
Subject: Re: Bookeeper exception on pods restart


If it is a permanent error,

- check if the log file (indicated in the error message) exists or not.
- use `bookkeeper shell readledger` to dump the index of the given ledger.
- see if the index points to the right entry log file or not
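
The first check above can be sketched in Python as follows. This is only an illustration, assuming entry log files are named by the hexadecimal log id plus `.log` and may sit in a subdirectory of each ledger directory; `find_entry_log` is a hypothetical helper, and the ledger directories to pass in are whatever your bookie configuration uses.

```python
import os


def find_entry_log(ledger_dirs, log_id):
    """Return the paths of '<hex log id>.log' found under the given directories."""
    name = f"{log_id:x}.log"  # assumed naming scheme for entry log files
    matches = []
    for root_dir in ledger_dirs:
        # Walk the whole tree, since the file may be in a subdirectory.
        for dirpath, _dirnames, filenames in os.walk(root_dir):
            if name in filenames:
                matches.append(os.path.join(dirpath, name))
    return matches
```

An empty result for log id 1 would suggest that the file named in the error is really gone from that bookie's disks.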

- Sijie

On Fri, Nov 29, 2019 at 12:46 AM Sharda, Ravi <ravi.sha...@dell.com> wrote:
The latest instance we have seen is a permanent error. The bookies haven't 
recovered in the environment (last 2 days). In some previous instances, 
developers had reported that the bookies had recovered, but it is also possible 
that the error was slightly different from what we are seeing now.

Thanks & best regards,
Ravi
________________________________
From: Sijie Guo <guosi...@gmail.com>
Sent: Friday, November 29, 2019 2:10 PM
To: user <user@bookkeeper.apache.org>
Cc: Flavio Junqueira <f...@apache.org>; Sharda, Ravi <ravi.sha...@dell.com>
Subject: Re: Bookeeper exception on pods restart


Sorry for jumping into the discussion, but the error message indicates that 
entry log file 1 was not found.
It seems to me that the entry log file was removed but the entry index still points 
to the old location. Is this a transient error or a permanent error?

- Sijie



On Fri, Nov 29, 2019 at 12:11 AM <prajakta.belgu...@dell.com> wrote:

+ Ravi, who will be looking into this.



From: Enrico Olivelli - Diennea <enrico.olive...@diennea.com>
Sent: Thursday, November 28, 2019 7:00 PM
To: user@bookkeeper.apache.org
Cc: f...@apache.org
Subject: Re: Bookeeper exception on pods restart




From the error it looks like one client is trying to read an entry from the 
Bookie but the entry is not there.

I see two reasons:

1) The write never reached the bookie

2) The bookie is missing some file



For 1)

Do you have logs on the writer? Something that could tell us that a write did 
not succeed?

How old is the entry supposed to be?

Do you have logs on the reader that is trying to read the entry?





For 2)

Do you have other errors in the logs about failed writes or anything similar?

If you were on 4.9 we could use the 'localconsistency checker' to check for 
inconsistencies on the bookie; it scans the bookie looking for every entry that 
should reside on the bookie itself.

If you were writing your ledgers with writequorum >= 2, maybe you can recover 
your data.





In order to debug the problem we should compare the logs of:

  *   The bookie
  *   The writer
  *   The reader





Enrico





On 28/11/19, 13:59, "prajakta.belgu...@dell.com" <prajakta.belgu...@dell.com> wrote:



I understand that auto recovery would replicate data for under-replicated 
ledgers.

But it is scheduled to run only once in a while and may not have run before a 
reader tries to read this data from a certain bookie.



Generally, what does the exception below indicate about the state of BK?

Does it indicate that the entry is missing on the specific bookie, so we 
can't find it?

Or that something in the ledger metadata or the ledgers could have been corrupted?



Found the same issue with another product where you seem to have provided a 
custom fix:

https://github.com/diennea/herddb/issues/194



All in all, we want to understand whether this can be the result of BK 
misconfiguration or is just a temporary unavailability problem that will resolve 
itself when auto-replication runs.



-Thanks,

Prajakta



From: Enrico Olivelli - Diennea <enrico.olive...@diennea.com>
Sent: Thursday, November 28, 2019 6:11 PM
To: user@bookkeeper.apache.org
Cc: f...@apache.org
Subject: Re: Bookeeper exception on pods restart




I don’t think there is a good value.

You can use WriteQuorumSize = AckQuorumSize; this way you will see an error on 
the writing client if a write to any of the bookies fails.



Usually you enable the Autorecovery feature to fill in the gaps of 
under-replicated ledgers:

http://bookkeeper.apache.org/docs/4.10.0/admin/autorecovery/





Hope that helps

Enrico



On 28/11/19, 13:30, "prajakta.belgu...@dell.com" <prajakta.belgu...@dell.com> wrote:



What EnsembleSize, WriteQuorumSize and AckQuorumSize would you recommend, so we 
never see this?

What other ledger creation parameters do you need information about?



-Thanks,

Prajakta

From: Enrico Olivelli - Diennea <enrico.olive...@diennea.com>
Sent: Thursday, November 28, 2019 5:19 PM
To: user@bookkeeper.apache.org
Cc: f...@apache.org
Subject: Re: Bookeeper exception on pods restart




Hi Prajakta,

What ledger creation parameters are you using? Ensemble size, write quorum 
size, ack quorum size?

If ackQuorumSize < writeQuorumSize, it is possible that a write to the bookie 
failed: even though the entry is supposed to be on the bookie, it never reached 
it, but the overall write still succeeded because an ack quorum of bookies 
acknowledged the write.
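
A toy model of this in Python, as a sketch only: it does not use the BookKeeper client, the bookie names are invented, and `write_outcome` is a hypothetical helper that simply counts acknowledgements against the ack quorum size.

```python
def write_outcome(bookie_acks, ack_quorum_size):
    """bookie_acks maps bookie name -> True if that bookie acknowledged the write.

    Returns (succeeded, missing): whether the client sees the write as
    successful, and which bookies in the write quorum never stored the entry.
    """
    acked = [b for b, ok in bookie_acks.items() if ok]
    missing = [b for b, ok in bookie_acks.items() if not ok]
    return len(acked) >= ack_quorum_size, missing


# Write quorum of 3 where the third bookie never received the entry.
acks = {"bookie-1": True, "bookie-2": True, "bookie-3": False}

print(write_outcome(acks, ack_quorum_size=2))  # client sees success, bookie-3 has a gap
print(write_outcome(acks, ack_quorum_size=3))  # client sees a failure instead
```

With the ack quorum smaller than the write quorum, the gap on bookie-3 is silent from the writer's point of view, and a later read served by that bookie alone would not find the entry.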



Enrico



On 28/11/19, 12:44, "prajakta.belgu...@dell.com" <prajakta.belgu...@dell.com> wrote:



Hello Team,



We have a question about an issue we are running into with BookKeeper.

We use BookKeeper version 4.7.3.



This issue occurs occasionally when BookKeeper servers are restarted.

We see the following error in the logs for some time, which blocks Pravega's 
operations for the same duration. Not knowing the internals of BookKeeper, but 
just based on the exception alone, it seems like BookKeeper might be temporarily 
unable to locate the files. What could be causing this?



2019-11-28 03:52:26,491 - ERROR - [BookieReadThreadPool-OrderedExecutor-0-0:ReadEntryProcessorV3@235] - IOException while reading entry: 25 from ledger 56
java.io.FileNotFoundException: No file for log 1 for 56 with location 4744138143
        at org.apache.bookkeeper.bookie.EntryLogger.findFile(EntryLogger.java:1165)
        at org.apache.bookkeeper.bookie.EntryLogger.getChannelForLogId(EntryLogger.java:1100)
        at org.apache.bookkeeper.bookie.EntryLogger.internalReadEntry(EntryLogger.java:1002)
        at org.apache.bookkeeper.bookie.EntryLogger.readEntry(EntryLogger.java:1051)
        at org.apache.bookkeeper.bookie.InterleavedLedgerStorage.getEntry(InterleavedLedgerStorage.java:305)
        at org.apache.bookkeeper.bookie.SortedLedgerStorage.getEntry(SortedLedgerStorage.java:153)
        at org.apache.bookkeeper.bookie.LedgerDescriptorImpl.readEntry(LedgerDescriptorImpl.java:153)
        at org.apache.bookkeeper.bookie.Bookie.readEntry(Bookie.java:1305)
        at org.apache.bookkeeper.proto.ReadEntryProcessorV3.readEntry(ReadEntryProcessorV3.java:175)
        at org.apache.bookkeeper.proto.ReadEntryProcessorV3.readEntry(ReadEntryProcessorV3.java:155)
        at org.apache.bookkeeper.proto.ReadEntryProcessorV3.getReadResponse(ReadEntryProcessorV3.java:218)
        at org.apache.bookkeeper.proto.ReadEntryProcessorV3.executeOp(ReadEntryProcessorV3.java:264)
        at org.apache.bookkeeper.proto.ReadEntryProcessorV3.safeRun(ReadEntryProcessorV3.java:260)
        at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)



-Thanks,

Prajakta





________________________________

CONFIDENTIALITY & PRIVACY NOTICE
This e-mail (including any attachments) is strictly confidential and may also 
contain privileged information. If you are not the intended recipient you are 
not authorised to read, print, save, process or disclose this message. If you 
have received this message by mistake, please inform the sender immediately and 
destroy this e-mail, its attachments and any copies. Any use, distribution, 
reproduction or disclosure by any person other than the intended recipient is 
strictly prohibited and the person responsible may incur in penalties.
The use of this e-mail is only for professional purposes; there is no guarantee 
that the correspondence towards this e-mail will be read only by the recipient, 
because, under certain circumstances, there may be a need to access this email 
by third subjects belonging to the Company.



