[ 
https://issues.apache.org/jira/browse/OAK-3547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15551315#comment-15551315
 ] 

Chetan Mehrotra commented on OAK-3547:
--------------------------------------

List of NodeState using generational approach [1]

{noformat}
Base state - No file added
===========================
  /{saveDirectoryListing = true}
    :data
      :dir
        l_1475740652371{state = }
        l_1475740652360{state = }
      :data
===========================

3 files added
===========================
  /{saveDirectoryListing = true}
    :data
      :dir
        l_1475740652371{state = }
        l_1475740652394{state = 
foo2,foo2,612,5b41ac56a8454e48af2dd7aa740f71a96a740366;foo1,foo1,233,ff0e1a49171201d638848c3f3e9b003596a81d6b;foo0,foo0,73,7feabefe2276000fa7722d6536183a508816d2c7;}
        l_1475740652360{state = }
      :data
      foo2{blobSize = 1047552, jcr:lastModified = 1475740652394, jcr:data = [1 
binaries], uniqueKey = 84dba956e1634dc14812db9a7ae6a81c}
      foo1{blobSize = 1047552, jcr:lastModified = 1475740652393, jcr:data = [1 
binaries], uniqueKey = a9505647e35a7f0e0365adbc2531c91f}
      foo0{blobSize = 1047552, jcr:lastModified = 1475740652392, jcr:data = [1 
binaries], uniqueKey = 937a0057a90df312f452aceb96f40c8c}
===========================


3 'bar' files added
===========================
  /{saveDirectoryListing = true}
    :data
      :dir
        l_1475740652371{state = }
        l_1475740652407{state = 
foo2,foo2,612,5b41ac56a8454e48af2dd7aa740f71a96a740366;bar2,bar2,656,2ea3ee49550e65f45d2e8d706e1a3bcef2d4a8b3;foo1,foo1,233,ff0e1a49171201d638848c3f3e9b003596a81d6b;bar1,bar1,953,a296f858ccb93414d6e99ca7a1997fb711faa65d;foo0,foo0,73,7feabefe2276000fa7722d6536183a508816d2c7;bar0,bar0,433,83ad0b2e68f114aa901c9eee0715c4507cd5dfe1;}
        l_1475740652394{state = 
foo2,foo2,612,5b41ac56a8454e48af2dd7aa740f71a96a740366;foo1,foo1,233,ff0e1a49171201d638848c3f3e9b003596a81d6b;foo0,foo0,73,7feabefe2276000fa7722d6536183a508816d2c7;}
        l_1475740652360{state = }
      :data
      foo2{blobSize = 1047552, jcr:lastModified = 1475740652394, jcr:data = [1 
binaries], uniqueKey = 84dba956e1634dc14812db9a7ae6a81c}
      foo1{blobSize = 1047552, jcr:lastModified = 1475740652393, jcr:data = [1 
binaries], uniqueKey = a9505647e35a7f0e0365adbc2531c91f}
      foo0{blobSize = 1047552, jcr:lastModified = 1475740652392, jcr:data = [1 
binaries], uniqueKey = 937a0057a90df312f452aceb96f40c8c}
      bar1{blobSize = 1047552, jcr:lastModified = 1475740652406, jcr:data = [1 
binaries], uniqueKey = 5ff21364ce67199cddd14833f0614d73}
      bar2{blobSize = 1047552, jcr:lastModified = 1475740652406, jcr:data = [1 
binaries], uniqueKey = c10e5f46e90e01d927f0aee82980ac6d}
      bar0{blobSize = 1047552, jcr:lastModified = 1475740652405, jcr:data = [1 
binaries], uniqueKey = 0f32d06997a4fec9a90060b863cda454}
===========================

{noformat}

* Empty {{:data}} created under {{:data}}. Is this required or possibly due bug 
below. In line #2 same builder should be used
{code}
    private SimpleDirectoryListing(@Nonnull IndexDefinition definition, 
@Nonnull NodeBuilder builder) {
        this.definition = definition;
        this.directoryBuilder = getOrCreateChild(builder, 
INDEX_DATA_CHILD_NAME);
        this.fileNames.addAll(getListing());
    }
{code}
* load and save is called every time even if no change is done. This adds empty 
l_ddd nodes. This should be avoided
* LISTING_STATE_PROPERTY - 
** It holds an encoded listing info stored as single string property. Hopefully 
this does not grow very large if directory content is large. Not sure of 
typical sizes
** You can possibly use a MultivalueProperty here. 
* Not sure on below snippet in {{load}} method.
{code}
if (loaded >= 0 ) {
    return (loaded != childNodes.size() - 1);
}
{code}
* {{doGC}} and {{sync}} methods do not have any test coverage


*Feature Flag*
It would be more comforting if this feature is driven by a feature flag. So 
generation logic is used if enabled otherwise it defaults to 
{{GenerationalDirectoryListing}}. We can expose a setting in 
{{LuceneIndexProviderService}} to lock new feature so as to enable controlled 
testing. 

I think having flag to enable when it is not being used would be easy. What 
would be tricky is to have it disabled once enabled ... that aspect can 
possibly be ignored

h5. Effect of corruption on writes

Currently if the corruption occurs system would automatically fallback to older 
version. This is fine for reads but for writes this would mean data loss unless 
indexed. As Async indexer would only index newer stuff. We have 2 options here

# Let async indexer continue but provide some indication that index is corrupt 
and reindex is required in some time - This needs to be highlighted in 
prominenet way (periodic logs, JMX etc)
# Let fallback used for reads (readOnly == true) but let it fail for writes

Possibly this needs to be exposed as config option defaulting to #2 to avoid 
surprises

h5. What type of corruption is being targetted

Current approach is trying to defend/detect against those corruption where 
OakDirectory is used for read and write always. So once written if we are 
opening a directory for IndexReader then it would proactively check that files 
present in blob store are corrupted or not.

However in current usage setups make use of CopyOnRead and CopyOnWrite 
directory along with OakDirectory. So we would need to have some support in 
{{IndexCopier}} so as to check that files present on file system are valid wrt 
expected checksum

h5. Move code to {{directory}} package

OakDirectory is getting bigger with this change. As new classes 
{{GenerationalDirectoryListing}} and other are static it would be better if we 
add them to directory package. This would simplify unit testing them also

[1]
{code}
    @Test
    public void saveListing() throws Exception{
        NodeBuilder builder = EMPTY_NODE.builder();
        builder.setProperty(LuceneIndexConstants.SAVE_DIR_LISTING, true);

        Directory dir = createDir(builder, false);
        dir.close();
        dumpState(builder, "Base state - No file added");
        
        Set<String> fileNames = newHashSet();
        
        dir = createDir(builder, false);
        createFile(dir, fileNames, "foo");
        dir.close();
        dumpState(builder, "3 files added");
        
        dir = createDir(builder, false);
        createFile(dir, fileNames, "bar");
        dir.close();
        dumpState(builder, "3 'bar' files added");

        assertEquals(fileNames, newHashSet(dir.listAll()));
    }

    private void createFile(Directory dir, Set<String> fileNames, String 
prefix) throws IOException {
        for (int i = 0; i < 3; i++) {
            String fileName = prefix + i;
            createFile(dir, fileName);
            fileNames.add(fileName);
        }
    }

    private static void dumpState(NodeBuilder builder, String msg){
        System.out.println();
        System.out.println(msg);
        System.out.println("===========================");
        System.out.println(NodeStateUtils.toString(builder.getNodeState()));
        System.out.println("===========================");
    }
{code}

> Improve ability of the OakDirectory to recover from unexpected file errors
> --------------------------------------------------------------------------
>
>                 Key: OAK-3547
>                 URL: https://issues.apache.org/jira/browse/OAK-3547
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: lucene
>    Affects Versions: 1.4
>            Reporter: Ian Boston
>             Fix For: 1.6
>
>
> Currently if the OakDirectory finds that a file is missing or in some way 
> damaged, and exception is thrown which impacts all queries using that index, 
> at times making the index unavailable. This improvement aims to make the 
> OakDirectory recover to a previously ok state by storing which files were 
> involved in previous states, and giving the code some way of checking if they 
> are valid.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to