[ https://issues.apache.org/jira/browse/OAK-2685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Marth updated OAK-2685:
-------------------------------
    Fix Version/s: 1.3.1

> Track root state revision when reading the tree
> -----------------------------------------------
>
>                 Key: OAK-2685
>                 URL: https://issues.apache.org/jira/browse/OAK-2685
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: core, mongomk
>            Reporter: Marcel Reutegger
>            Assignee: Marcel Reutegger
>             Fix For: 1.3.1
>
>
> Currently the DocumentNodeState has two revisions:
> - {{getRevision()}} returns the read revision of this node state. This 
> revision was used to read the node state from the underlying {{NodeDocument}}.
> - {{getLastRevision()}} returns the revision when this node state was last
> modified. This revision also reflects changes made further down the tree,
> even when this node state itself was not directly affected by a change.
> The lastRevision of a state is then used as the read revision of the child
> node states. This avoids re-reading the entire tree with a different
> revision every time a commit changes the head revision.
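To make the propagation concrete, here is a minimal Java sketch of the model described above. It is not Oak's {{DocumentNodeState}} API: the {{State}} class and the {{readChild}} method are hypothetical, and revisions are simplified to plain {{long}} values instead of Oak's {{Revision}} objects. It shows how a child's read revision can fall behind the root's read revision.

```java
/**
 * Hypothetical, simplified model of how read revisions propagate down the
 * tree. Revisions are plain longs; this is not Oak's DocumentNodeState API.
 */
public class ReadRevisionSketch {

    /** A node state carrying the two revisions described in the issue. */
    static final class State {
        final String path;
        final long readRevision; // revision used to read this state
        final long lastRevision; // last change at or below this node

        State(String path, long readRevision, long lastRevision) {
            this.path = path;
            this.readRevision = readRevision;
            this.lastRevision = lastRevision;
        }
    }

    /** The parent's lastRevision, not its readRevision, becomes the child's read revision. */
    static State readChild(State parent, String name, long childLastRevision) {
        String childPath = parent.path.equals("/") ? "/" + name : parent.path + "/" + name;
        return new State(childPath, parent.lastRevision, childLastRevision);
    }

    public static void main(String[] args) {
        // Root read at head revision r10; the most recent change is also r10.
        State root = new State("/", 10, 10);
        // /a was last modified at r3, /a/b at r2.
        State a = readChild(root, "a", 3);
        State b = readChild(a, "b", 2);
        // /a is read at r10, but /a/b is read at r3: back in time.
        System.out.println(a.path + "@" + a.readRevision + " " + b.path + "@" + b.readRevision);
    }
}
```

Running {{main}} prints {{/a@10 /a/b@3}}: the grandchild is read at r3 even though the root was read at r10.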
> This approach has at least two problems related to comparing node states:
> - It does not work well with the current DiffCache implementation and lowers
> the hit rate of this cache. The DiffCache is proactively populated after a
> commit. The key for a diff is a combination of the previous and current
> commit revisions and the path. The value then tells which child nodes were
> added/removed/changed. As the comparison of node states proceeds and
> traverses the tree, the revision of a state may go back in time because the
> lastRevision is used as the read revision of the child nodes. This causes
> misses in the diff cache, because the revisions no longer match the previous
> and current commit revisions used to create the cache entries. OAK-2562 tried
> to address this by keeping the read revision of child nodes at the read
> revision of the parent in calls of compareAgainstBaseState() when there is a
> diff cache hit. However, it turns out that node state comparison does not
> always start at the root state. The {{EventQueue}} implementation in oak-jcr
> starts at the paths indicated by the filter of the listener. This means
> OAK-2562 is not effective in this case, and the diff needs to be calculated
> again based on revisions that differ from those of the original commit.
> - When a diff is calculated for a parent with many child nodes, the
> {{DocumentNodeStore}} performs a query on the underlying {{DocumentStore}}
> to get the child nodes modified after a given timestamp. This timestamp is
> derived from the lower of the two lastRevisions of the parent node states
> being compared. The query becomes problematic for the {{DocumentStore}} if
> the timestamp is too far in the past, which happens when the parent node
> (and its sub-tree) has not been modified for some time. For example, the
> {{MongoDocumentStore}} has indexes on the _id and _modified fields. But if
> there are many child nodes, the _id index is of little help, and if the
> timestamp is too far in the past, the _modified index is not selective
> either. This problem was already reported in OAK-1970 and linked issues.
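The diff cache miss from the first problem can be illustrated with a minimal Java sketch. It assumes, for simplicity, that lookups are keyed by the read revisions of the two states being compared; the {{DiffCacheSketch}} class and its {{put}}/{{get}} methods are hypothetical and not Oak's {{DiffCache}} API, and revisions are plain {{long}} values.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/**
 * Hypothetical diff cache keyed by (from revision, to revision, path).
 * Revisions are plain longs; this is not Oak's DiffCache API.
 */
public class DiffCacheSketch {

    static final class Key {
        final long from;
        final long to;
        final String path;

        Key(long from, long to, String path) {
            this.from = from;
            this.to = to;
            this.path = path;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) {
                return false;
            }
            Key k = (Key) o;
            return from == k.from && to == k.to && path.equals(k.path);
        }

        @Override
        public int hashCode() {
            return Objects.hash(from, to, path);
        }
    }

    private final Map<Key, String> entries = new HashMap<>();

    /** Populated proactively after a commit from 'before' to 'after'. */
    void put(long before, long after, String path, String changedChildren) {
        entries.put(new Key(before, after, path), changedChildren);
    }

    /** Looked up with the read revisions of the compared states; null on a miss. */
    String get(long fromRead, long toRead, String path) {
        return entries.get(new Key(fromRead, toRead, path));
    }

    public static void main(String[] args) {
        DiffCacheSketch cache = new DiffCacheSketch();
        // Commit r11 changed /a/b/c; the head revision before it was r10.
        cache.put(10, 11, "/a/b", "changed: c");
        // Key built from the commit revisions: hit.
        System.out.println(cache.get(10, 11, "/a/b"));
        // Before the commit, /a was last modified at r3, so the old /a/b
        // state was read at r3 rather than r10: miss.
        System.out.println(cache.get(3, 11, "/a/b"));
    }
}
```

The first lookup prints {{changed: c}} and the second prints {{null}}, even though both describe the same change to the same path.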
> Both of the above problems could be addressed by keeping track of the read
> revision of the root node state in each of the node states as the tree is
> traversed. The revision of the root state would then be used, e.g., to derive
> the timestamp for the _modified constraint in the query. Because the revision
> of the root state is recent, the _modified constraint is very selective and
> the index on it would be the preferred choice.
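A minimal Java sketch of the proposal follows, contrasting the current lower bound for the _modified constraint with one derived from the tracked root revision. Everything here is hypothetical: the {{RootRevisionSketch}} class and its methods are not Oak APIs, revisions are simplified to millisecond timestamps, and using the older of the two root revisions as the bound is an assumption.

```java
/**
 * Hypothetical sketch of the proposal: each node state carries the read
 * revision of the root state it was reached from. Revisions are simplified
 * to millisecond timestamps; this is not Oak's DocumentNodeState API.
 */
public class RootRevisionSketch {

    static final class State {
        final String path;
        final long lastRevision; // last change at or below this node
        final long rootRevision; // read revision of the root state

        State(String path, long lastRevision, long rootRevision) {
            this.path = path;
            this.lastRevision = lastRevision;
            this.rootRevision = rootRevision;
        }
    }

    /** Children inherit the root revision unchanged while the tree is traversed. */
    static State child(State parent, String name, long childLastRevision) {
        String childPath = parent.path.equals("/") ? "/" + name : parent.path + "/" + name;
        return new State(childPath, childLastRevision, parent.rootRevision);
    }

    /** Current behaviour: lower bound from the lower of the two lastRevisions. */
    static long currentModifiedSince(State before, State after) {
        return Math.min(before.lastRevision, after.lastRevision);
    }

    /** Proposed (assumption): lower bound from the older of the two root revisions. */
    static long proposedModifiedSince(State before, State after) {
        return Math.min(before.rootRevision, after.rootRevision);
    }

    public static void main(String[] args) {
        long hour = 3_600_000L;
        long now = 1_000L * hour;
        // Root states read just before and right after a commit at 'now'.
        State rootBefore = new State("/", now - 1_000, now - 1_000);
        State rootAfter = new State("/", now, now);
        // /content was untouched for 24 hours until the commit changed it.
        State before = child(rootBefore, "content", now - 24 * hour);
        State after = child(rootAfter, "content", now);
        long current = currentModifiedSince(before, after);
        long proposed = proposedModifiedSince(before, after);
        System.out.println("current bound is " + (now - current) / hour + "h old");
        System.out.println("proposed bound is " + (now - proposed) + "ms old");
    }
}
```

In this model the current bound lies 24 hours in the past while the proposed one is one second old. Any child changed between the two states must have been modified after the older root revision, so under these simplifications the tighter bound should not drop relevant child nodes.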



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
