[
https://issues.apache.org/jira/browse/OAK-3547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14991349#comment-14991349
]
Ian Boston edited comment on OAK-3547 at 11/5/15 9:01 AM:
----------------------------------------------------------
[~mreutegg] If an earlier version of the index is used by the writer, there
will be holes in the index and items will be missing. There are several
options. a) Flag the issue to alert admins that the index is not healthy, but
continue to index using an index that will open. b) Fail the index write and
stop indexing completely. c) Fail the index write and start re-indexing
automatically. Of those, I think option a will deliver the best continuity.
Option b risks wide-scale application-level issues; option c risks both
application-level issues and potential unavailability caused by the load of
rebuilding an index from scratch. There is no easy answer.
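One way to make the choice between options a, b and c explicit is a configurable policy that the writer consults when it can only open an older generation. This is a hypothetical sketch: StaleIndexPolicy, StaleIndexHandler and the generation numbers are illustrative names, not Oak API. It only decides and logs the outcome; the actual alerting or re-indexing would live elsewhere.

```java
import java.util.logging.Logger;

/** Hypothetical policy: what the writer does when only an older index generation opens. */
enum StaleIndexPolicy {
    /** a) Flag the index as unhealthy for admins, but keep indexing into the generation that opened. */
    FLAG_AND_CONTINUE,
    /** b) Fail the write and stop indexing completely. */
    FAIL_AND_STOP,
    /** c) Fail the write and start a full re-index. */
    FAIL_AND_REINDEX
}

/** Hypothetical handler; not part of Oak. */
final class StaleIndexHandler {
    private static final Logger LOG = Logger.getLogger(StaleIndexHandler.class.getName());

    /** What the (hypothetical) caller should do next. */
    enum Outcome { CONTINUE_WRITING, STOP_INDEXING, START_REINDEX }

    static Outcome onStaleGeneration(StaleIndexPolicy policy, String indexPath,
                                     long expectedGeneration, long openedGeneration) {
        String msg = "Index " + indexPath + " opened generation " + openedGeneration
                + " instead of " + expectedGeneration + "; documents may be missing";
        switch (policy) {
            case FLAG_AND_CONTINUE:
                LOG.warning(msg); // alert admins, keep the index available
                return Outcome.CONTINUE_WRITING;
            case FAIL_AND_STOP:
                LOG.severe(msg + "; indexing stopped");
                return Outcome.STOP_INDEXING;
            case FAIL_AND_REINDEX:
                LOG.severe(msg + "; re-indexing from scratch");
                return Outcome.START_REINDEX;
            default:
                throw new IllegalArgumentException("Unknown policy " + policy);
        }
    }
}
```

Keeping the policy as data rather than hard-coding one behaviour lets a deployment trade continuity (a) against correctness (b, c).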
Now that there are checksums in place, I have been seeing more frequent race
conditions between the writer and the readers, which occasionally open older
versions. I think this is because the OakDirectory checks all the files when
it is opened, by computing a checksum of everything referenced. I think
Lucene delays checking a file, or the internals of a file, until it is
needed, hence any errors are more visible than before.
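The difference between eager and lazy checking can be sketched as follows: every file referenced by a listing is verified up front, so a reader racing the writer fails at open time instead of later inside Lucene. This is a minimal sketch under assumptions: EagerChecksumVerifier is a hypothetical name, files are modelled as in-memory byte arrays, and CRC32 stands in for whatever checksum the OakDirectory actually stores.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.CRC32;

/** Hypothetical eager verification performed when a directory is opened. */
final class EagerChecksumVerifier {

    /** CRC32 is an assumption; the real checksum algorithm may differ. */
    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    /**
     * Checks every file referenced by the listing before the directory is handed to Lucene.
     * Returns file name -> problem for files that are missing or whose checksum differs.
     */
    static Map<String, String> verify(Map<String, Long> expectedChecksums, Map<String, byte[]> stored) {
        Map<String, String> problems = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : expectedChecksums.entrySet()) {
            byte[] data = stored.get(e.getKey());
            if (data == null) {
                problems.put(e.getKey(), "missing");
            } else if (crc32(data) != e.getValue()) {
                problems.put(e.getKey(), "checksum mismatch");
            }
        }
        return problems;
    }
}
```

A lazy reader would only discover the missing file when a query first touches it; the eager check reports it immediately, which matches the increased visibility described above.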
----
Lucene already has a concept of committing the index by syncing the segments_N
and segments.gen files. I am writing the listing node whenever either of these
is synced, or when the index is closed, which has reduced the number of
generations. The result appears to be very stable. I have also introduced the
concept of mutability, as some of the file types are mutable. .del is mutable,
so its length and checksum are not checked. If a .del from a later generation
is used, it will only delete the Lucene docs that were deleted in that later
generation. No damage.
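The listing and mutability rules above might look roughly like this. ListingRecorder and FileMeta are hypothetical names, and onSync/onClose stand in for wherever the directory's sync and close hooks call into it. A listing is snapshotted only when a commit file is synced or the index is closed; mutable files only need to exist, with no length or checksum check.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical recorder of per-commit file listings; not Oak API. */
final class ListingRecorder {

    /** Per-file metadata stored in a listing. */
    static final class FileMeta {
        final long length;
        final long checksum;
        FileMeta(long length, long checksum) { this.length = length; this.checksum = checksum; }
    }

    private final List<Map<String, FileMeta>> listings = new ArrayList<>();

    /** .del files (and, currently, segments.gen) are rewritten in place, so they are not pinned. */
    static boolean isMutable(String name) {
        return name.endsWith(".del") || name.equals("segments.gen");
    }

    /** A Lucene commit point is marked by syncing segments_N or segments.gen. */
    static boolean isCommitFile(String name) {
        return name.startsWith("segments_") || name.equals("segments.gen");
    }

    /** Called from the directory's sync: record a listing only if a commit file is among the names. */
    void onSync(Iterable<String> names, Map<String, FileMeta> currentFiles) {
        for (String n : names) {
            if (isCommitFile(n)) {
                snapshot(currentFiles);
                return;
            }
        }
    }

    /** Called from the directory's close. */
    void onClose(Map<String, FileMeta> currentFiles) {
        snapshot(currentFiles);
    }

    private void snapshot(Map<String, FileMeta> currentFiles) {
        listings.add(Collections.unmodifiableMap(new TreeMap<>(currentFiles)));
    }

    /** True if the stored file still matches its listed metadata; mutable files only need to exist. */
    static boolean matches(String name, FileMeta listed, FileMeta actual) {
        if (actual == null) return false;
        if (isMutable(name)) return true;
        return listed.length == actual.length && listed.checksum == actual.checksum;
    }

    int generationCount() { return listings.size(); }
}
```

Syncs of ordinary segment files do not create a listing, which is what keeps the number of generations down.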
segments.gen is also mutable. This is more of a problem. It is supposed to be a
fallback file, with segments_N used in preference; however, if segments.gen is
used it will be from the wrong generation and will define the wrong set of
segment files for the index. I need to check whether segments.gen is ever read.
If it is, then I think the OakDirectory needs to map segments.gen to a
generational version of the same (i.e. segments.gen_<epoch>) so that only .del
files are mutable. That should make the OakDirectory recoverable.
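The proposed mapping could be a pair of name translations applied at the storage boundary, so Lucene keeps seeing segments.gen while each commit stores its own immutable copy. This is a hypothetical sketch: GenerationalNames is an illustrative name, and the epoch is assumed to be a per-commit number such as the identifier of the listing being written.

```java
/** Hypothetical name mapping so segments.gen is stored once per commit; not Oak API. */
final class GenerationalNames {
    static final String SEGMENTS_GEN = "segments.gen";

    /** Lucene-facing name -> stored name. Only segments.gen is rewritten; epoch identifies the commit. */
    static String toStoredName(String luceneName, long epoch) {
        return SEGMENTS_GEN.equals(luceneName) ? SEGMENTS_GEN + "_" + epoch : luceneName;
    }

    /** Stored name -> Lucene-facing name. */
    static String toLuceneName(String storedName) {
        String prefix = SEGMENTS_GEN + "_";
        if (storedName.startsWith(prefix) && isDigits(storedName.substring(prefix.length()))) {
            return SEGMENTS_GEN;
        }
        return storedName;
    }

    private static boolean isDigits(String s) {
        if (s.isEmpty()) return false;
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isDigit(s.charAt(i))) return false;
        }
        return true;
    }

    /** After the mapping, only .del files remain mutable. */
    static boolean isMutable(String storedName) {
        return storedName.endsWith(".del");
    }
}
```

A reader opening a given listing would then resolve segments.gen to that listing's segments.gen_<epoch>, so it can never pick up a copy from a different generation.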
> Improve ability of the OakDirectory to recover from unexpected file errors
> --------------------------------------------------------------------------
>
> Key: OAK-3547
> URL: https://issues.apache.org/jira/browse/OAK-3547
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: lucene
> Affects Versions: 1.4
> Reporter: Ian Boston
>
> Currently if the OakDirectory finds that a file is missing or in some way
> damaged, an exception is thrown which impacts all queries using that index,
> at times making the index unavailable. This improvement aims to make the
> OakDirectory recover to a previously ok state by storing which files were
> involved in previous states, and giving the code some way of checking if they
> are valid.
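Recovering to a previously good state, as described in the issue, amounts to walking the stored listings from newest to oldest and opening the first one whose files all pass validation. This is a minimal sketch under assumptions: GenerationRecovery and Listing are hypothetical names, each listing is modelled as file name -> expected checksum, and the per-file check is supplied by the caller.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.BiPredicate;

/** Hypothetical recovery: open the newest generation whose listed files are all valid. */
final class GenerationRecovery {

    /** One stored listing: generation number plus file name -> expected checksum. */
    static final class Listing {
        final long generation;
        final Map<String, Long> files;
        Listing(long generation, Map<String, Long> files) {
            this.generation = generation;
            this.files = files;
        }
    }

    /**
     * @param newestFirst listings ordered from newest to oldest
     * @param isValid     per-file check, e.g. file exists and its checksum matches
     * @return the newest listing whose files all pass, or empty if none do
     */
    static Optional<Listing> newestValid(List<Listing> newestFirst, BiPredicate<String, Long> isValid) {
        for (Listing l : newestFirst) {
            boolean ok = l.files.entrySet().stream()
                    .allMatch(e -> isValid.test(e.getKey(), e.getValue()));
            if (ok) return Optional.of(l);
        }
        return Optional.empty();
    }
}
```

An empty result is the case where no earlier state is usable, and the caller falls back to one of the options discussed in the comment (flag, fail, or re-index).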