Marcel Reutegger created OAK-2685:
-------------------------------------
Summary: Track root state revision when reading the tree
Key: OAK-2685
URL: https://issues.apache.org/jira/browse/OAK-2685
Project: Jackrabbit Oak
Issue Type: Improvement
Components: core, mongomk
Reporter: Marcel Reutegger
Assignee: Marcel Reutegger
Currently the DocumentNodeState has two revisions:
- {{getRevision()}} returns the read revision of this node state. This revision
was used to read the node state from the underlying {{NodeDocument}}.
- {{getLastRevision()}} returns the revision when this node state was last
modified. This revision also reflects changes made further down the tree, even
when the node state itself was not directly affected by a change.
The lastRevision of a state is then used as the read revision of the child node
states. This avoids reading the entire tree again with a different revision
after the head revision changed because of a commit.
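The following is a minimal sketch of how the read revision propagates, not the actual Oak code. It assumes a revision can be modelled as a plain long; the class RevisionSketch, its State type and the child() method are hypothetical names used only for illustration.

```java
// Hypothetical, simplified model; these are not the real Oak classes.
final class RevisionSketch {

    /** A node state with a read revision and a last-modified revision. */
    static final class State {
        final String path;
        final long revision;      // read revision used to load this state
        final long lastRevision;  // last change to this node or its sub-tree

        State(String path, long revision, long lastRevision) {
            this.path = path;
            this.revision = revision;
            this.lastRevision = lastRevision;
        }

        /**
         * Reads a child. The child's read revision is the parent's
         * lastRevision, not the parent's read revision.
         */
        State child(String name, long childLastRevision) {
            String childPath = path.equals("/") ? "/" + name : path + "/" + name;
            return new State(childPath, lastRevision, childLastRevision);
        }
    }
}
```

In this model, a sub-tree whose lastRevision did not change is read with the same revision before and after an unrelated commit, which is why it does not have to be read again.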
This approach has at least two problems related to comparing node states:
- It does not work well with the current DiffCache implementation and affects
the hit rate of this cache. The DiffCache is proactively populated after a
commit. The key for a diff is a combination of previous and current commit
revision and the path. The value then tells what child nodes were
added/removed/changed. As the comparison of node states proceeds and traverses
the tree, the revision of a state may go back in time because the lastRevision
is used as the read revision of the child nodes. This will cause misses in the
diff cache, because the revisions do not match the previous and current commit
revisions as used to create the cache entries. OAK-2562 tried to address this
by keeping the read revision for child nodes at the read revision of the parent
in calls of compareAgainstBaseState() when there is a diff cache hit. However,
it turns out that node state comparison does not always start at the root state.
The {{EventQueue}} implementation in oak-jcr starts at the paths indicated by
the filter of the listener. This means OAK-2562 is not effective in this case,
and the diff needs to be calculated again based on a set of revisions that
differs from those of the original commit.
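The cache miss described in the first problem can be illustrated with a small sketch. It assumes revisions are plain longs and that the cache is a simple map; DiffCacheSketch, its key format and the stored value are hypothetical and do not mirror the real DiffCache.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a diff cache keyed by (from, to, path).
final class DiffCacheSketch {
    private final Map<String, String> entries = new HashMap<>();

    private static String key(long from, long to, String path) {
        return from + ".." + to + "@" + path;
    }

    /** Called after a commit with the previous and current commit revision. */
    void put(long from, long to, String path, String changes) {
        entries.put(key(from, to, path), changes);
    }

    /** Returns the cached diff, or null when the exact key is not present. */
    String get(long from, long to, String path) {
        return entries.get(key(from, to, path));
    }
}
```

If a commit moves the head from 10 to 12, an entry is stored under (10, 12, path). A comparison whose base side was read at an older lastRevision, for example 4, looks up (4, 12, path) and misses, although the diff itself would be the same.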
- When a diff is calculated for a parent with many child nodes, the
{{DocumentNodeStore}} will perform a query on the underlying {{DocumentStore}}
to get child nodes modified after a given timestamp. This timestamp is derived
from the lower revision of the two lastRevisions of the parent node states to
compare. The query gets problematic for the {{DocumentStore}} if the timestamp
is too far in the past. This happens when the parent node (and its sub-tree)
has not been modified for some time. For example, the {{MongoDocumentStore}}
has an index on the _id and the _modified field. If there are many child
nodes, the _id index is of little help, and if the timestamp is too far in
the past, the _modified index is not selective either. This problem was already
reported in OAK-1970 and linked issues.
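As a rough illustration of the second problem, the sketch below filters child documents by a modified timestamp derived from the lower of two lastRevisions. It is an in-memory stand-in, not a DocumentStore query; ModifiedQuerySketch, Doc, lowerBound() and modifiedSince() are hypothetical names, and revisions are assumed to be plain timestamps.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of the "modified after timestamp" query.
final class ModifiedQuerySketch {

    /** A child document with an id and a last-modified timestamp. */
    static final class Doc {
        final String id;
        final long modified;

        Doc(String id, long modified) {
            this.id = id;
            this.modified = modified;
        }
    }

    /** The timestamp is derived from the lower of the two lastRevisions. */
    static long lowerBound(long lastRevisionA, long lastRevisionB) {
        return Math.min(lastRevisionA, lastRevisionB);
    }

    /** Returns the ids of all children modified at or after the timestamp. */
    static List<String> modifiedSince(List<Doc> children, long timestamp) {
        List<String> ids = new ArrayList<>();
        for (Doc doc : children) {
            if (doc.modified >= timestamp) {
                ids.add(doc.id);
            }
        }
        return ids;
    }
}
```

When one of the two lastRevisions is old, the derived lower bound excludes almost nothing, so the constraint on the modified field matches nearly every child.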
Both of the above problems could be addressed by keeping track of the read
revision of the root node state in each of the node states as the tree is
traversed. The revision of the root state would then be used e.g. to derive the
timestamp for the _modified constraint in the query. Because the revision of
the root state is rather recent, the _modified constraint is very selective and
the index on it would be the preferred choice.
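One way the proposal might look is sketched below. This is an assumption about a possible shape, not the implemented change: RootRevisionSketch, the rootRevision field and modifiedLowerBound() are hypothetical, revisions are plain longs, and taking the lower of the two root revisions as the bound is an illustrative choice.

```java
// Hypothetical sketch: every state carries the read revision of its root.
final class RootRevisionSketch {

    static final class State {
        final String path;
        final long rootRevision;  // read revision of the root, passed down unchanged
        final long revision;      // read revision of this state
        final long lastRevision;  // last change to this node or its sub-tree

        private State(String path, long rootRevision, long revision, long lastRevision) {
            this.path = path;
            this.rootRevision = rootRevision;
            this.revision = revision;
            this.lastRevision = lastRevision;
        }

        /** Creates a root state read at the given head revision. */
        static State root(long headRevision) {
            return new State("/", headRevision, headRevision, headRevision);
        }

        /** Reads a child and hands the root revision down to it. */
        State child(String name, long childLastRevision) {
            String childPath = path.equals("/") ? "/" + name : path + "/" + name;
            return new State(childPath, rootRevision, lastRevision, childLastRevision);
        }
    }

    /** Derives the bound from the root revisions instead of the lastRevisions. */
    static long modifiedLowerBound(State before, State after) {
        return Math.min(before.rootRevision, after.rootRevision);
    }
}
```

Because the root revisions in this model correspond to the previous and current head, the same pair could also serve as a lookup key that matches the revisions recorded at commit time, regardless of where in the tree the comparison starts.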