Marcel Reutegger created OAK-2685:
-------------------------------------

             Summary: Track root state revision when reading the tree
                 Key: OAK-2685
                 URL: https://issues.apache.org/jira/browse/OAK-2685
             Project: Jackrabbit Oak
          Issue Type: Improvement
          Components: core, mongomk
            Reporter: Marcel Reutegger
            Assignee: Marcel Reutegger


Currently the DocumentNodeState has two revisions:

- {{getRevision()}} returns the read revision of this node state. This revision 
was used to read the node state from the underlying {{NodeDocument}}.
- {{getLastRevision()}} returns the revision at which this node state was last 
modified. This revision also reflects changes made further down the tree, even 
when the node state itself was not directly affected by the change.

The lastRevision of a state is then used as the read revision of the child node 
states. This avoids reading the entire tree again with a different revision 
after the head revision changed because of a commit.
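
The propagation described above can be sketched roughly as follows. This is a simplified, hypothetical model and not the actual Oak code: {{NodeStateSketch}}, its fields, and the {{lastRevs}} map (standing in for the store) are invented for illustration, and revisions are reduced to plain {{long}} values.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model; not the actual Oak DocumentNodeState.
// Revisions are plain longs here (an assumption made for brevity).
final class NodeStateSketch {
    final String path;
    final long revision;      // read revision used to load this state
    final long lastRevision;  // last modification, including changes below

    NodeStateSketch(String path, long revision, long lastRevision) {
        this.path = path;
        this.revision = revision;
        this.lastRevision = lastRevision;
    }

    // lastRevs stands in for the store: path -> last modification revision.
    NodeStateSketch getChild(String name, Map<String, Long> lastRevs) {
        String childPath = "/".equals(path) ? "/" + name : path + "/" + name;
        // The child is read at the parent's lastRevision, not at the head.
        long childReadRevision = lastRevision;
        long childLastRevision =
                lastRevs.getOrDefault(childPath, childReadRevision);
        return new NodeStateSketch(childPath, childReadRevision, childLastRevision);
    }

    // Walks root -> /a -> /a/b and returns the read revision at each level.
    static long[] demo() {
        Map<String, Long> lastRevs = new HashMap<>();
        lastRevs.put("/a", 5L);
        lastRevs.put("/a/b", 3L);
        NodeStateSketch root = new NodeStateSketch("/", 10, 7);
        NodeStateSketch a = root.getChild("a", lastRevs);
        NodeStateSketch b = a.getChild("b", lastRevs);
        return new long[] {root.revision, a.revision, b.revision};
    }
}
```

In this sketch the read revision decreases at each level of the walk, which is the "going back in time" effect discussed in the problems below.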

This approach has at least two problems related to comparing node states:

- It does not work well with the current DiffCache implementation and reduces 
the hit rate of this cache. The DiffCache is proactively populated after a 
commit. The key for a diff is a combination of previous and current commit 
revision and the path. The value then tells what child nodes were 
added/removed/changed. As the comparison of node states proceeds and traverses 
the tree, the revision of a state may go back in time because the lastRevision 
is used as the read revision of the child nodes. This will cause misses in the 
diff cache, because the revisions do not match the previous and current commit 
revisions as used to create the cache entries. OAK-2562 tried to address this 
by keeping the read revision for child nodes at the read revision of the parent 
in calls of compareAgainstBaseState() when there is a diff cache hit. However, 
it turns out node state comparison does not always start at the root state. The 
{{EventQueue}} implementation in oak-jcr will start at the paths as indicated 
by the filter of the listener. This means OAK-2562 is not effective in this 
case, and the diff needs to be calculated again based on a set of revisions 
that differs from the revisions of the original commit.

- When a diff is calculated for a parent with many child nodes, the 
{{DocumentNodeStore}} will perform a query on the underlying {{DocumentStore}} 
to get child nodes modified after a given timestamp. This timestamp is derived 
from the lower of the two lastRevisions of the parent node states being 
compared. The query becomes problematic for the {{DocumentStore}} if the 
timestamp is too far in the past, which happens when the parent node (and its 
sub-tree) has not been modified for some time. For example, the 
{{MongoDocumentStore}} has an index on the _id and the _modified field. With 
many child nodes, the _id index is not very helpful, and if the timestamp is 
too far in the past, the _modified index is not selective either. This problem 
was already reported in OAK-1970 and linked issues.

Both of the above problems could be addressed by keeping track of the read 
revision of the root node state in each of the node states as the tree is 
traversed. The revision of the root state would then be used, for example, to 
derive the timestamp for the _modified constraint in the query. Because the revision of 
the root state is rather recent, the _modified constraint is very selective and 
the index on it would be the preferred choice.
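
A rough sketch of the proposed tracking, under stated assumptions: {{TrackedStateSketch}} and both bound methods are hypothetical names, revisions are treated as plain millisecond timestamps, and taking the lower of the two root revisions as the bound is one possible reading of the proposal rather than a settled design.

```java
// Hypothetical sketch of carrying the root read revision down the tree.
// Revisions are treated as millisecond timestamps (an assumption).
final class TrackedStateSketch {
    final String path;
    final long rootRevision;  // read revision of the root, passed down unchanged
    final long lastRevision;  // last modification of this state or its sub-tree

    TrackedStateSketch(String path, long rootRevision, long lastRevision) {
        this.path = path;
        this.rootRevision = rootRevision;
        this.lastRevision = lastRevision;
    }

    // Children inherit the root revision regardless of their own lastRevision.
    TrackedStateSketch child(String name, long childLastRevision) {
        String childPath = "/".equals(path) ? "/" + name : path + "/" + name;
        return new TrackedStateSketch(childPath, rootRevision, childLastRevision);
    }

    // Current behaviour as described: lower of the two lastRevisions.
    static long lastRevBasedBound(TrackedStateSketch a, TrackedStateSketch b) {
        return Math.min(a.lastRevision, b.lastRevision);
    }

    // Proposed alternative (assumed form): lower of the two root revisions.
    static long rootBasedBound(TrackedStateSketch a, TrackedStateSketch b) {
        return Math.min(a.rootRevision, b.rootRevision);
    }
}
```

For a sub-tree last modified at 100 and compared between root revisions 1000 and 1010, the lastRevision-based bound is 100 while the root-based bound is 1000, so the _modified constraint covers a much narrower and more recent range.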



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
