[
https://issues.apache.org/jira/browse/OAK-3547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15551315#comment-15551315
]
Chetan Mehrotra edited comment on OAK-3547 at 10/6/16 8:28 AM:
---------------------------------------------------------------
Thanks [~ianeboston] for the feature patch. Some feedback below on current
approach
List of NodeState using generational approach [1]
{noformat}
Base state - No file added
===========================
/{saveDirectoryListing = true}
:data
:dir
l_1475740652371{state = }
l_1475740652360{state = }
:data
===========================
3 files added
===========================
/{saveDirectoryListing = true}
:data
:dir
l_1475740652371{state = }
l_1475740652394{state =
foo2,foo2,612,5b41ac56a8454e48af2dd7aa740f71a96a740366;foo1,foo1,233,ff0e1a49171201d638848c3f3e9b003596a81d6b;foo0,foo0,73,7feabefe2276000fa7722d6536183a508816d2c7;}
l_1475740652360{state = }
:data
foo2{blobSize = 1047552, jcr:lastModified = 1475740652394, jcr:data = [1
binaries], uniqueKey = 84dba956e1634dc14812db9a7ae6a81c}
foo1{blobSize = 1047552, jcr:lastModified = 1475740652393, jcr:data = [1
binaries], uniqueKey = a9505647e35a7f0e0365adbc2531c91f}
foo0{blobSize = 1047552, jcr:lastModified = 1475740652392, jcr:data = [1
binaries], uniqueKey = 937a0057a90df312f452aceb96f40c8c}
===========================
3 'bar' files added
===========================
/{saveDirectoryListing = true}
:data
:dir
l_1475740652371{state = }
l_1475740652407{state =
foo2,foo2,612,5b41ac56a8454e48af2dd7aa740f71a96a740366;bar2,bar2,656,2ea3ee49550e65f45d2e8d706e1a3bcef2d4a8b3;foo1,foo1,233,ff0e1a49171201d638848c3f3e9b003596a81d6b;bar1,bar1,953,a296f858ccb93414d6e99ca7a1997fb711faa65d;foo0,foo0,73,7feabefe2276000fa7722d6536183a508816d2c7;bar0,bar0,433,83ad0b2e68f114aa901c9eee0715c4507cd5dfe1;}
l_1475740652394{state =
foo2,foo2,612,5b41ac56a8454e48af2dd7aa740f71a96a740366;foo1,foo1,233,ff0e1a49171201d638848c3f3e9b003596a81d6b;foo0,foo0,73,7feabefe2276000fa7722d6536183a508816d2c7;}
l_1475740652360{state = }
:data
foo2{blobSize = 1047552, jcr:lastModified = 1475740652394, jcr:data = [1
binaries], uniqueKey = 84dba956e1634dc14812db9a7ae6a81c}
foo1{blobSize = 1047552, jcr:lastModified = 1475740652393, jcr:data = [1
binaries], uniqueKey = a9505647e35a7f0e0365adbc2531c91f}
foo0{blobSize = 1047552, jcr:lastModified = 1475740652392, jcr:data = [1
binaries], uniqueKey = 937a0057a90df312f452aceb96f40c8c}
bar1{blobSize = 1047552, jcr:lastModified = 1475740652406, jcr:data = [1
binaries], uniqueKey = 5ff21364ce67199cddd14833f0614d73}
bar2{blobSize = 1047552, jcr:lastModified = 1475740652406, jcr:data = [1
binaries], uniqueKey = c10e5f46e90e01d927f0aee82980ac6d}
bar0{blobSize = 1047552, jcr:lastModified = 1475740652405, jcr:data = [1
binaries], uniqueKey = 0f32d06997a4fec9a90060b863cda454}
===========================
{noformat}
* Empty {{:data}} created under {{:data}}. Is this required or possibly due bug
below. In line #2 same builder should be used
{code}
private SimpleDirectoryListing(@Nonnull IndexDefinition definition,
@Nonnull NodeBuilder builder) {
this.definition = definition;
this.directoryBuilder = getOrCreateChild(builder,
INDEX_DATA_CHILD_NAME);
this.fileNames.addAll(getListing());
}
{code}
* load and save is called every time even if no change is done. This adds empty
l_ddd nodes. This should be avoided
* LISTING_STATE_PROPERTY -
** It holds an encoded listing info stored as single string property. Hopefully
this does not grow very large if directory content is large. Not sure of
typical sizes
** You can possibly use a MultivalueProperty here.
* Not sure on below snippet in {{load}} method.
{code}
if (loaded >= 0 ) {
return (loaded != childNodes.size() - 1);
}
{code}
* {{doGC}} and {{sync}} methods do not have any test coverage
h4. Feature Flag
It would be more comforting if this feature is driven by a feature flag. So
generation logic is used if enabled otherwise it defaults to
{{GenerationalDirectoryListing}}. We can expose a setting in
{{LuceneIndexProviderService}} to lock new feature so as to enable controlled
testing.
I think having flag to enable when it is not being used would be easy. What
would be tricky is to have it disabled once enabled ... that aspect can
possibly be ignored
h4. Effect of corruption on writes
Currently if the corruption occurs system would automatically fallback to older
version. This is fine for reads but for writes this would mean data loss unless
indexed. As Async indexer would only index newer stuff. We have 2 options here
# Let async indexer continue but provide some indication that index is corrupt
and reindex is required in some time - This needs to be highlighted in
prominenet way (periodic logs, JMX etc)
# Let fallback used for reads (readOnly == true) but let it fail for writes
Possibly this needs to be exposed as config option defaulting to #2 to avoid
surprises
h4. What type of corruption is being targetted
Current approach is trying to defend/detect against those corruption where
OakDirectory is used for read and write always. So once written if we are
opening a directory for IndexReader then it would proactively check that files
present in blob store are corrupted or not.
However in current usage setups make use of CopyOnRead and CopyOnWrite
directory along with OakDirectory. So we would need to have some support in
{{IndexCopier}} so as to check that files present on file system are valid wrt
expected checksum
h4. Move code to {{directory}} package
OakDirectory is getting bigger with this change. As new classes
{{GenerationalDirectoryListing}} and other are static it would be better if we
add them to directory package. This would simplify unit testing them also
[1]
{code}
@Test
public void saveListing() throws Exception{
NodeBuilder builder = EMPTY_NODE.builder();
builder.setProperty(LuceneIndexConstants.SAVE_DIR_LISTING, true);
Directory dir = createDir(builder, false);
dir.close();
dumpState(builder, "Base state - No file added");
Set<String> fileNames = newHashSet();
dir = createDir(builder, false);
createFile(dir, fileNames, "foo");
dir.close();
dumpState(builder, "3 files added");
dir = createDir(builder, false);
createFile(dir, fileNames, "bar");
dir.close();
dumpState(builder, "3 'bar' files added");
assertEquals(fileNames, newHashSet(dir.listAll()));
}
private void createFile(Directory dir, Set<String> fileNames, String
prefix) throws IOException {
for (int i = 0; i < 3; i++) {
String fileName = prefix + i;
createFile(dir, fileName);
fileNames.add(fileName);
}
}
private static void dumpState(NodeBuilder builder, String msg){
System.out.println();
System.out.println(msg);
System.out.println("===========================");
System.out.println(NodeStateUtils.toString(builder.getNodeState()));
System.out.println("===========================");
}
{code}
was (Author: chetanm):
List of NodeState using generational approach [1]
{noformat}
Base state - No file added
===========================
/{saveDirectoryListing = true}
:data
:dir
l_1475740652371{state = }
l_1475740652360{state = }
:data
===========================
3 files added
===========================
/{saveDirectoryListing = true}
:data
:dir
l_1475740652371{state = }
l_1475740652394{state =
foo2,foo2,612,5b41ac56a8454e48af2dd7aa740f71a96a740366;foo1,foo1,233,ff0e1a49171201d638848c3f3e9b003596a81d6b;foo0,foo0,73,7feabefe2276000fa7722d6536183a508816d2c7;}
l_1475740652360{state = }
:data
foo2{blobSize = 1047552, jcr:lastModified = 1475740652394, jcr:data = [1
binaries], uniqueKey = 84dba956e1634dc14812db9a7ae6a81c}
foo1{blobSize = 1047552, jcr:lastModified = 1475740652393, jcr:data = [1
binaries], uniqueKey = a9505647e35a7f0e0365adbc2531c91f}
foo0{blobSize = 1047552, jcr:lastModified = 1475740652392, jcr:data = [1
binaries], uniqueKey = 937a0057a90df312f452aceb96f40c8c}
===========================
3 'bar' files added
===========================
/{saveDirectoryListing = true}
:data
:dir
l_1475740652371{state = }
l_1475740652407{state =
foo2,foo2,612,5b41ac56a8454e48af2dd7aa740f71a96a740366;bar2,bar2,656,2ea3ee49550e65f45d2e8d706e1a3bcef2d4a8b3;foo1,foo1,233,ff0e1a49171201d638848c3f3e9b003596a81d6b;bar1,bar1,953,a296f858ccb93414d6e99ca7a1997fb711faa65d;foo0,foo0,73,7feabefe2276000fa7722d6536183a508816d2c7;bar0,bar0,433,83ad0b2e68f114aa901c9eee0715c4507cd5dfe1;}
l_1475740652394{state =
foo2,foo2,612,5b41ac56a8454e48af2dd7aa740f71a96a740366;foo1,foo1,233,ff0e1a49171201d638848c3f3e9b003596a81d6b;foo0,foo0,73,7feabefe2276000fa7722d6536183a508816d2c7;}
l_1475740652360{state = }
:data
foo2{blobSize = 1047552, jcr:lastModified = 1475740652394, jcr:data = [1
binaries], uniqueKey = 84dba956e1634dc14812db9a7ae6a81c}
foo1{blobSize = 1047552, jcr:lastModified = 1475740652393, jcr:data = [1
binaries], uniqueKey = a9505647e35a7f0e0365adbc2531c91f}
foo0{blobSize = 1047552, jcr:lastModified = 1475740652392, jcr:data = [1
binaries], uniqueKey = 937a0057a90df312f452aceb96f40c8c}
bar1{blobSize = 1047552, jcr:lastModified = 1475740652406, jcr:data = [1
binaries], uniqueKey = 5ff21364ce67199cddd14833f0614d73}
bar2{blobSize = 1047552, jcr:lastModified = 1475740652406, jcr:data = [1
binaries], uniqueKey = c10e5f46e90e01d927f0aee82980ac6d}
bar0{blobSize = 1047552, jcr:lastModified = 1475740652405, jcr:data = [1
binaries], uniqueKey = 0f32d06997a4fec9a90060b863cda454}
===========================
{noformat}
* Empty {{:data}} created under {{:data}}. Is this required or possibly due bug
below. In line #2 same builder should be used
{code}
private SimpleDirectoryListing(@Nonnull IndexDefinition definition,
@Nonnull NodeBuilder builder) {
this.definition = definition;
this.directoryBuilder = getOrCreateChild(builder,
INDEX_DATA_CHILD_NAME);
this.fileNames.addAll(getListing());
}
{code}
* load and save is called every time even if no change is done. This adds empty
l_ddd nodes. This should be avoided
* LISTING_STATE_PROPERTY -
** It holds an encoded listing info stored as single string property. Hopefully
this does not grow very large if directory content is large. Not sure of
typical sizes
** You can possibly use a MultivalueProperty here.
* Not sure on below snippet in {{load}} method.
{code}
if (loaded >= 0 ) {
return (loaded != childNodes.size() - 1);
}
{code}
* {{doGC}} and {{sync}} methods do not have any test coverage
*Feature Flag*
It would be more comforting if this feature is driven by a feature flag. So
generation logic is used if enabled otherwise it defaults to
{{GenerationalDirectoryListing}}. We can expose a setting in
{{LuceneIndexProviderService}} to lock new feature so as to enable controlled
testing.
I think having flag to enable when it is not being used would be easy. What
would be tricky is to have it disabled once enabled ... that aspect can
possibly be ignored
h5. Effect of corruption on writes
Currently if the corruption occurs system would automatically fallback to older
version. This is fine for reads but for writes this would mean data loss unless
indexed. As Async indexer would only index newer stuff. We have 2 options here
# Let async indexer continue but provide some indication that index is corrupt
and reindex is required in some time - This needs to be highlighted in
prominenet way (periodic logs, JMX etc)
# Let fallback used for reads (readOnly == true) but let it fail for writes
Possibly this needs to be exposed as config option defaulting to #2 to avoid
surprises
h5. What type of corruption is being targetted
Current approach is trying to defend/detect against those corruption where
OakDirectory is used for read and write always. So once written if we are
opening a directory for IndexReader then it would proactively check that files
present in blob store are corrupted or not.
However in current usage setups make use of CopyOnRead and CopyOnWrite
directory along with OakDirectory. So we would need to have some support in
{{IndexCopier}} so as to check that files present on file system are valid wrt
expected checksum
h5. Move code to {{directory}} package
OakDirectory is getting bigger with this change. As new classes
{{GenerationalDirectoryListing}} and other are static it would be better if we
add them to directory package. This would simplify unit testing them also
[1]
{code}
@Test
public void saveListing() throws Exception{
NodeBuilder builder = EMPTY_NODE.builder();
builder.setProperty(LuceneIndexConstants.SAVE_DIR_LISTING, true);
Directory dir = createDir(builder, false);
dir.close();
dumpState(builder, "Base state - No file added");
Set<String> fileNames = newHashSet();
dir = createDir(builder, false);
createFile(dir, fileNames, "foo");
dir.close();
dumpState(builder, "3 files added");
dir = createDir(builder, false);
createFile(dir, fileNames, "bar");
dir.close();
dumpState(builder, "3 'bar' files added");
assertEquals(fileNames, newHashSet(dir.listAll()));
}
private void createFile(Directory dir, Set<String> fileNames, String
prefix) throws IOException {
for (int i = 0; i < 3; i++) {
String fileName = prefix + i;
createFile(dir, fileName);
fileNames.add(fileName);
}
}
private static void dumpState(NodeBuilder builder, String msg){
System.out.println();
System.out.println(msg);
System.out.println("===========================");
System.out.println(NodeStateUtils.toString(builder.getNodeState()));
System.out.println("===========================");
}
{code}
> Improve ability of the OakDirectory to recover from unexpected file errors
> --------------------------------------------------------------------------
>
> Key: OAK-3547
> URL: https://issues.apache.org/jira/browse/OAK-3547
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: lucene
> Affects Versions: 1.4
> Reporter: Ian Boston
> Fix For: 1.6
>
>
> Currently if the OakDirectory finds that a file is missing or in some way
> damaged, and exception is thrown which impacts all queries using that index,
> at times making the index unavailable. This improvement aims to make the
> OakDirectory recover to a previously ok state by storing which files were
> involved in previous states, and giving the code some way of checking if they
> are valid.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)