[ https://issues.apache.org/jira/browse/OAK-2685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marcel Reutegger updated OAK-2685:
----------------------------------
Attachment: OAK-2685.patch
Updated my github branch to current trunk and attached diff/patch.
> Track root state revision when reading the tree
> -----------------------------------------------
>
> Key: OAK-2685
> URL: https://issues.apache.org/jira/browse/OAK-2685
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: core, mongomk
> Reporter: Marcel Reutegger
> Assignee: Marcel Reutegger
> Labels: performance
> Fix For: 1.3.1
>
> Attachments: OAK-2685.patch
>
>
> Currently the DocumentNodeState has two revisions:
> - {{getRevision()}} returns the read revision of this node state. This
> revision was used to read the node state from the underlying {{NodeDocument}}.
> - {{getLastRevision()}} returns the revision when this node state was last
> modified. This revision also reflects changes made further down the tree,
> even when this node state itself was not directly affected by a change.
> The lastRevision of a state is then used as the read revision of its child
> node states. This avoids reading the entire tree again at a different
> revision whenever a commit changes the head revision.
> This approach has at least two problems related to comparing node states:
> - It does not work well with the current DiffCache implementation and reduces
> the hit rate of this cache. The DiffCache is proactively populated after a
> commit. The key for a diff is a combination of the previous and current commit
> revisions and the path. The value then tells which child nodes were
> added/removed/changed. As the comparison of node states proceeds and
> traverses the tree, the revision of a state may go back in time because the
> lastRevision is used as the read revision of the child nodes. This causes
> misses in the diff cache, because the revisions no longer match the previous
> and current commit revisions used to create the cache entries. OAK-2562 tried
> to address this by keeping the read revision of child nodes at the read
> revision of the parent in calls to compareAgainstBaseState() when there is a
> diff cache hit. However, it turns out node state comparison does not always
> start at the root state. The {{EventQueue}} implementation in oak-jcr
> starts at the paths indicated by the listener's filter. This means
> OAK-2562 is not effective in this case, and the diff must be calculated
> again for a pair of revisions that differs from the revisions of the
> original commit.
> - When a diff is calculated for a parent with many child nodes, the
> {{DocumentNodeStore}} performs a query on the underlying
> {{DocumentStore}} to get the child nodes modified after a given timestamp.
> This timestamp is derived from the lower of the two lastRevisions of the
> parent node states being compared. The query becomes problematic for the
> {{DocumentStore}} if the timestamp is too far in the past, which happens
> when the parent node (and its sub-tree) has not been modified for some time.
> For example, the {{MongoDocumentStore}} has indexes on the _id and _modified
> fields. But if there are many child nodes, the _id index is not very
> helpful, and if the timestamp is too far in the past, the _modified index is
> not selective either. This problem was already reported in OAK-1970 and
> linked issues.
> Both of the above problems could be addressed by keeping track of the read
> revision of the root node state in each node state as the tree is
> traversed. The revision of the root state would then be used, e.g., to derive
> the timestamp for the _modified constraint in the query. Because the revision
> of the root state is rather recent, the _modified constraint is very
> selective and the index on _modified would be the preferred choice.
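The description says a state's lastRevision becomes the read revision of its child node states, so an unchanged sub-tree keeps being read at the same revision while the head moves on. The Java sketch below illustrates only that idea, under loudly simplified assumptions: `NodeStateModel`, its nested `Key` and `State` records, and the `lastModified` input map are hypothetical stand-ins (not Oak's `DocumentNodeState` API), revisions are plain `long` values, and every modification is assumed to happen before every read.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, simplified model of reading node states; not Oak's API.
 * Revisions are plain long values and all modifications precede all reads.
 */
final class NodeStateModel {

    /** Cache key: a node state is identified by its path and read revision. */
    record Key(String path, long readRevision) {}

    /** Simplified node state carrying the two revisions from the description. */
    record State(String path, long readRevision, long lastRevision) {
        /** The lastRevision of this state is used as the read revision of its children. */
        long childReadRevision() {
            return lastRevision;
        }
    }

    private final Map<Key, State> cache = new HashMap<>();
    // path -> revision of the last change at or below that path (hypothetical input)
    private final Map<String, Long> lastModified;
    // number of simulated reads from the underlying store (cache misses)
    int reads = 0;

    NodeStateModel(Map<String, Long> lastModified) {
        this.lastModified = lastModified;
    }

    /** Returns the state of path at readRevision, hitting the "store" only on a cache miss. */
    State read(String path, long readRevision) {
        return cache.computeIfAbsent(new Key(path, readRevision), key -> {
            reads++;
            return new State(path, readRevision, lastModified.get(path));
        });
    }
}
```

For example, if `/a` was last modified at revision 40, reading `/a` at head 100 and again at head 101 (after a commit elsewhere) resolves the child `/a/x` at revision 40 both times, so the second child lookup is served from the cache and costs no extra read.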
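The first problem in the description hinges on how DiffCache entries are keyed: previous commit revision, current commit revision and path. The minimal model below is hypothetical (`DiffCacheSketch` is not Oak's DiffCache class, revisions are plain `long` values, and the cached value is reduced to a string); it only shows why an entry written for the commit revisions is not found once a child is compared at a parent's older lastRevision.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical sketch of a diff cache keyed by (previous commit revision,
 * current commit revision, path). The value is a textual summary of the
 * added/removed/changed child nodes.
 */
final class DiffCacheSketch {

    record Key(long fromRevision, long toRevision, String path) {}

    private final Map<Key, String> entries = new HashMap<>();

    /** Called proactively after a commit for each path touched by it. */
    void populate(long previousCommit, long currentCommit, String path, String changes) {
        entries.put(new Key(previousCommit, currentCommit, path), changes);
    }

    /** Looks up a diff using the read revisions of the two compared node states. */
    Optional<String> lookup(long fromRevision, long toRevision, String path) {
        return Optional.ofNullable(entries.get(new Key(fromRevision, toRevision, path)));
    }
}
```

Assuming commit 101 changed `/a/b` and `/a` had last been modified at revision 50 before that commit, a comparison that descends through `/a` reads the "before" state of `/a/b` at revision 50. The lookup key then becomes `(50, 101, "/a/b")` instead of `(100, 101, "/a/b")`, and the proactively populated entry is missed.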
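For the second problem and the proposed fix, the sketch below contrasts two ways of deriving the lower bound for the _modified constraint. It is an illustration under stated assumptions, not the attached patch: `ModifiedBoundSketch` and its methods are hypothetical, revisions are reduced to millisecond timestamps, and it assumes the lower of the two root read revisions is used as the bound. `modifiedIndexMatches` simply counts the documents an index on _modified would have to visit for `_modified >= bound`.

```java
import java.util.List;

/**
 * Hypothetical sketch comparing lower bounds for the "_modified" constraint.
 * Revisions are reduced to millisecond timestamps; not Oak's API.
 */
final class ModifiedBoundSketch {

    /** Simplified node state that additionally tracks the root state's read revision. */
    record State(String path, long lastRevision, long rootRevision) {}

    /** Current approach as described: the lower of the two parents' lastRevisions. */
    static long boundFromLastRevisions(State before, State after) {
        return Math.min(before.lastRevision(), after.lastRevision());
    }

    /** Proposed approach (assumption: the lower of the two root read revisions). */
    static long boundFromRootRevisions(State before, State after) {
        return Math.min(before.rootRevision(), after.rootRevision());
    }

    /** Documents an index on _modified would visit for "_modified >= bound". */
    static long modifiedIndexMatches(List<Long> modifiedValues, long bound) {
        return modifiedValues.stream().filter(m -> m >= bound).count();
    }
}
```

In this simplified model the root-based bound is still safe for a diff: a child that differs between the two compared states must have been modified after the "before" root revision, so no change is missed, while far fewer documents satisfy the constraint when the parent has not been modified for a long time.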
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)