[ https://issues.apache.org/jira/browse/OAK-3547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15551315#comment-15551315 ]
Chetan Mehrotra commented on OAK-3547: -------------------------------------- List of NodeState using generational approach [1] {noformat} Base state - No file added =========================== /{saveDirectoryListing = true} :data :dir l_1475740652371{state = } l_1475740652360{state = } :data =========================== 3 files added =========================== /{saveDirectoryListing = true} :data :dir l_1475740652371{state = } l_1475740652394{state = foo2,foo2,612,5b41ac56a8454e48af2dd7aa740f71a96a740366;foo1,foo1,233,ff0e1a49171201d638848c3f3e9b003596a81d6b;foo0,foo0,73,7feabefe2276000fa7722d6536183a508816d2c7;} l_1475740652360{state = } :data foo2{blobSize = 1047552, jcr:lastModified = 1475740652394, jcr:data = [1 binaries], uniqueKey = 84dba956e1634dc14812db9a7ae6a81c} foo1{blobSize = 1047552, jcr:lastModified = 1475740652393, jcr:data = [1 binaries], uniqueKey = a9505647e35a7f0e0365adbc2531c91f} foo0{blobSize = 1047552, jcr:lastModified = 1475740652392, jcr:data = [1 binaries], uniqueKey = 937a0057a90df312f452aceb96f40c8c} =========================== 3 'bar' files added =========================== /{saveDirectoryListing = true} :data :dir l_1475740652371{state = } l_1475740652407{state = foo2,foo2,612,5b41ac56a8454e48af2dd7aa740f71a96a740366;bar2,bar2,656,2ea3ee49550e65f45d2e8d706e1a3bcef2d4a8b3;foo1,foo1,233,ff0e1a49171201d638848c3f3e9b003596a81d6b;bar1,bar1,953,a296f858ccb93414d6e99ca7a1997fb711faa65d;foo0,foo0,73,7feabefe2276000fa7722d6536183a508816d2c7;bar0,bar0,433,83ad0b2e68f114aa901c9eee0715c4507cd5dfe1;} l_1475740652394{state = foo2,foo2,612,5b41ac56a8454e48af2dd7aa740f71a96a740366;foo1,foo1,233,ff0e1a49171201d638848c3f3e9b003596a81d6b;foo0,foo0,73,7feabefe2276000fa7722d6536183a508816d2c7;} l_1475740652360{state = } :data foo2{blobSize = 1047552, jcr:lastModified = 1475740652394, jcr:data = [1 binaries], uniqueKey = 84dba956e1634dc14812db9a7ae6a81c} foo1{blobSize = 1047552, jcr:lastModified = 1475740652393, jcr:data = [1 binaries], uniqueKey = a9505647e35a7f0e0365adbc2531c91f} foo0{blobSize = 1047552, jcr:lastModified = 1475740652392, jcr:data = [1 binaries], uniqueKey = 937a0057a90df312f452aceb96f40c8c} bar1{blobSize = 1047552, jcr:lastModified = 1475740652406, jcr:data = [1 binaries], uniqueKey = 5ff21364ce67199cddd14833f0614d73} bar2{blobSize = 1047552, jcr:lastModified = 1475740652406, jcr:data = [1 binaries], uniqueKey = c10e5f46e90e01d927f0aee82980ac6d} bar0{blobSize = 1047552, jcr:lastModified = 1475740652405, jcr:data = [1 binaries], uniqueKey = 0f32d06997a4fec9a90060b863cda454} =========================== {noformat} * Empty {{:data}} created under {{:data}}. Is this required or possibly due bug below. In line #2 same builder should be used {code} private SimpleDirectoryListing(@Nonnull IndexDefinition definition, @Nonnull NodeBuilder builder) { this.definition = definition; this.directoryBuilder = getOrCreateChild(builder, INDEX_DATA_CHILD_NAME); this.fileNames.addAll(getListing()); } {code} * load and save is called every time even if no change is done. This adds empty l_ddd nodes. This should be avoided * LISTING_STATE_PROPERTY - ** It holds an encoded listing info stored as single string property. Hopefully this does not grow very large if directory content is large. Not sure of typical sizes ** You can possibly use a MultivalueProperty here. * Not sure on below snippet in {{load}} method. {code} if (loaded >= 0 ) { return (loaded != childNodes.size() - 1); } {code} * {{doGC}} and {{sync}} methods do not have any test coverage *Feature Flag* It would be more comforting if this feature is driven by a feature flag. So generation logic is used if enabled otherwise it defaults to {{GenerationalDirectoryListing}}. We can expose a setting in {{LuceneIndexProviderService}} to lock new feature so as to enable controlled testing. I think having flag to enable when it is not being used would be easy. What would be tricky is to have it disabled once enabled ... that aspect can possibly be ignored h5. Effect of corruption on writes Currently if the corruption occurs system would automatically fallback to older version. This is fine for reads but for writes this would mean data loss unless indexed. As Async indexer would only index newer stuff. We have 2 options here # Let async indexer continue but provide some indication that index is corrupt and reindex is required in some time - This needs to be highlighted in prominenet way (periodic logs, JMX etc) # Let fallback used for reads (readOnly == true) but let it fail for writes Possibly this needs to be exposed as config option defaulting to #2 to avoid surprises h5. What type of corruption is being targetted Current approach is trying to defend/detect against those corruption where OakDirectory is used for read and write always. So once written if we are opening a directory for IndexReader then it would proactively check that files present in blob store are corrupted or not. However in current usage setups make use of CopyOnRead and CopyOnWrite directory along with OakDirectory. So we would need to have some support in {{IndexCopier}} so as to check that files present on file system are valid wrt expected checksum h5. Move code to {{directory}} package OakDirectory is getting bigger with this change. As new classes {{GenerationalDirectoryListing}} and other are static it would be better if we add them to directory package. This would simplify unit testing them also [1] {code} @Test public void saveListing() throws Exception{ NodeBuilder builder = EMPTY_NODE.builder(); builder.setProperty(LuceneIndexConstants.SAVE_DIR_LISTING, true); Directory dir = createDir(builder, false); dir.close(); dumpState(builder, "Base state - No file added"); Set<String> fileNames = newHashSet(); dir = createDir(builder, false); createFile(dir, fileNames, "foo"); dir.close(); dumpState(builder, "3 files added"); dir = createDir(builder, false); createFile(dir, fileNames, "bar"); dir.close(); dumpState(builder, "3 'bar' files added"); assertEquals(fileNames, newHashSet(dir.listAll())); } private void createFile(Directory dir, Set<String> fileNames, String prefix) throws IOException { for (int i = 0; i < 3; i++) { String fileName = prefix + i; createFile(dir, fileName); fileNames.add(fileName); } } private static void dumpState(NodeBuilder builder, String msg){ System.out.println(); System.out.println(msg); System.out.println("==========================="); System.out.println(NodeStateUtils.toString(builder.getNodeState())); System.out.println("==========================="); } {code} > Improve ability of the OakDirectory to recover from unexpected file errors > -------------------------------------------------------------------------- > > Key: OAK-3547 > URL: https://issues.apache.org/jira/browse/OAK-3547 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: lucene > Affects Versions: 1.4 > Reporter: Ian Boston > Fix For: 1.6 > > > Currently if the OakDirectory finds that a file is missing or in some way > damaged, and exception is thrown which impacts all queries using that index, > at times making the index unavailable. This improvement aims to make the > OakDirectory recover to a previously ok state by storing which files were > involved in previous states, and giving the code some way of checking if they > are valid. -- This message was sent by Atlassian JIRA (v6.3.4#6332)