Marcel Reutegger created OAK-1555:
-------------------------------------
Summary: Inefficient node state diff with old revisions
Key: OAK-1555
URL: https://issues.apache.org/jira/browse/OAK-1555
Project: Jackrabbit Oak
Issue Type: Bug
Components: core, mongomk
Affects Versions: 0.18
Reporter: Marcel Reutegger
Assignee: Marcel Reutegger
Priority: Blocker
Fix For: 0.19
As part of OAK-1429 a number of improvements were implemented, but one issue
remains when a node state diff is performed against older revisions.
The DocumentNodeStore keeps a modified timestamp on each document and updates
it whenever the document is modified explicitly, or implicitly when a descendant
document is updated. With this timestamp the store can tell when a subtree was
last modified. The diff implementation becomes inefficient when both revisions
being compared are older than the modified timestamp of a document tree. In
this case the implementation tends to read many more nodes than were actually
modified, because the timestamp only records the most recent modification and
therefore cannot tell exactly when a subtree was modified.
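To illustrate the problem, here is a minimal sketch in a simplified, hypothetical model (not the DocumentNodeStore code): every node carries only the time its subtree was last modified, and plain times stand in for revisions. A diff starting at `fromTime` can skip a subtree whose timestamp is not later than `fromTime`, but it has to descend into every subtree modified afterwards, even when that modification happened after the second revision. The class, method and field names are made up for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: each node only knows when its subtree was last modified.
class ModifiedTimestampPruning {

    static final class Node {
        final String name;
        final long lastModified; // time of the most recent change in this subtree
        final Map<String, Node> children = new LinkedHashMap<>();

        Node(String name, long lastModified) {
            this.name = name;
            this.lastModified = lastModified;
        }

        Node add(Node child) {
            children.put(child.name, child);
            return this;
        }
    }

    // Paths a diff between fromTime and some later time has to read.
    static List<String> nodesRead(Node root, long fromTime) {
        List<String> visited = new ArrayList<>();
        visit(root, "/", fromTime, visited);
        return visited;
    }

    private static void visit(Node node, String path, long fromTime, List<String> visited) {
        visited.add(path);
        for (Node child : node.children.values()) {
            // Unchanged since fromTime: safe to skip. Otherwise the change may lie
            // before or after the second revision; the timestamp cannot tell.
            if (child.lastModified > fromTime) {
                String childPath = path.endsWith("/") ? path + child.name : path + "/" + child.name;
                visit(child, childPath, fromTime, visited);
            }
        }
    }

    static Node sampleTree() {
        Node a = new Node("a", 100).add(new Node("a1", 100));
        Node b = new Node("b", 20).add(new Node("b1", 20));
        Node c = new Node("c", 30);
        return new Node("", 100).add(a).add(b).add(c);
    }

    public static void main(String[] args) {
        // Diff of two old revisions (t=40 and t=50): nothing changed in between,
        // yet /a and /a/a1 are read because they were modified at t=100.
        System.out.println(nodesRead(sampleTree(), 40)); // prints [/, /a, /a/a1]
    }
}
```

With `fromTime = 40` the sketch still reads `/a` and `/a/a1`, although they only changed at t=100, i.e. after both revisions being compared.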
Improvements from OAK-1394 and OAK-1429 helped quite a bit because commits
proactively fill the diff cache in the DocumentNodeStore. However, in addition
to the observation listeners that perform diffs, there is also the async index
update, which periodically performs a diff. Those diffs usually go further back
in time; they are the inefficient ones and also have a negative impact on the
diff cache.
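A rough sketch of why such long-range diffs hurt the cache, assuming a hypothetical LRU diff cache keyed by (from revision, to revision, path) that each commit fills with its own changes. Revisions are plain `long` values and the capacity is tiny to make the effect visible; none of the names reflect the actual DocumentNodeStore API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical LRU diff cache, keyed by (from revision, to revision, path).
class DiffCacheSketch {
    private final Map<String, Set<String>> cache;
    int misses = 0;

    DiffCacheSketch(int capacity) {
        final int max = capacity;
        // access-ordered LinkedHashMap: evicts the least recently used entry
        this.cache = new LinkedHashMap<String, Set<String>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Set<String>> eldest) {
                return size() > max;
            }
        };
    }

    private static String key(long from, long to, String path) {
        return from + ".." + to + ":" + path;
    }

    // Called by a commit: records the child names it changed below path.
    void onCommit(long parentRev, long commitRev, String path, Set<String> changedChildren) {
        cache.put(key(parentRev, commitRev, path), changedChildren);
    }

    // Returns the cached diff, or runs the expensive diff and caches its result.
    Set<String> diff(long from, long to, String path, Supplier<Set<String>> expensiveDiff) {
        String k = key(from, to, path);
        Set<String> cached = cache.get(k);
        if (cached != null) {
            return cached;
        }
        misses++;
        Set<String> result = expensiveDiff.get();
        cache.put(k, result); // may evict entries other listeners still need
        return result;
    }

    public static void main(String[] args) {
        DiffCacheSketch diffCache = new DiffCacheSketch(2);
        diffCache.onCommit(1, 2, "/", Collections.singleton("a"));
        diffCache.onCommit(2, 3, "/", Collections.singleton("b"));
        // observation listener diffing one commit: cache hit
        diffCache.diff(1, 2, "/", () -> Collections.singleton("a"));
        // async index update diffing further back: miss, evicts 2..3
        diffCache.diff(0, 3, "/", () -> new HashSet<>(Arrays.asList("a", "b")));
        // next listener diff: now a miss as well
        diffCache.diff(2, 3, "/", () -> Collections.singleton("b"));
        System.out.println(diffCache.misses); // prints 2
    }
}
```

In this model the long-range diff both misses the cache itself and pushes out an entry that a commit had filled for the listeners.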
A solution to this problem was already discussed in a recent Oak conference
call: the DocumentNodeStore would keep a journal of commits and use it to answer
node state diff calls. With this journal the store should also be able to diff
efficiently across multiple commits. A number of options were discussed, e.g.
whether to implement the journal with a local file or a capped MongoDB
collection. Ideas for alternative solutions are welcome...
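A minimal sketch of the journal idea, assuming revisions can be ordered as plain `long` values and each commit appends the set of paths it changed. A bounded `TreeMap` stands in for a capped collection: once the oldest entries are evicted, the journal can no longer answer diffs starting before them, and the caller would have to fall back to the existing diff. All names here are hypothetical.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Optional;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical bounded commit journal, loosely modeled on a capped collection.
class CommitJournal {
    private final int maxEntries;
    // revision -> paths changed by the commit at that revision
    private final NavigableMap<Long, Set<String>> entries = new TreeMap<>();
    // invariant: the journal holds every change with a revision > coveredFrom
    private long coveredFrom;

    CommitJournal(int maxEntries, long startRevision) {
        this.maxEntries = maxEntries;
        this.coveredFrom = startRevision;
    }

    static Set<String> paths(String... p) {
        return new TreeSet<>(Arrays.asList(p));
    }

    void append(long revision, Set<String> changedPaths) {
        entries.put(revision, changedPaths);
        if (entries.size() > maxEntries) {
            // evict the oldest entry; its changes are no longer covered
            Map.Entry<Long, Set<String>> oldest = entries.pollFirstEntry();
            coveredFrom = oldest.getKey();
        }
    }

    // Paths changed in (from, to], or empty if the journal no longer reaches back to from.
    Optional<Set<String>> changedPaths(long from, long to) {
        if (from > to) {
            throw new IllegalArgumentException("from must not be after to");
        }
        if (from < coveredFrom) {
            return Optional.empty(); // caller falls back to a full diff
        }
        Set<String> result = new TreeSet<>();
        for (Set<String> p : entries.subMap(from, false, to, true).values()) {
            result.addAll(p);
        }
        return Optional.of(result);
    }

    public static void main(String[] args) {
        CommitJournal journal = new CommitJournal(3, 0);
        journal.append(1, paths("/a/x"));
        journal.append(2, paths("/b"));
        journal.append(3, paths("/a/y"));
        System.out.println(journal.changedPaths(0, 3)); // prints Optional[[/a/x, /a/y, /b]]
        journal.append(4, paths("/c"));                 // evicts revision 1
        System.out.println(journal.changedPaths(0, 4)); // prints Optional.empty
        System.out.println(journal.changedPaths(1, 4)); // prints Optional[[/a/y, /b, /c]]
    }
}
```

In this sketch a diff across several commits costs one range scan over the journal entries, instead of a walk over every subtree with a newer modified timestamp.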
--
This message was sent by Atlassian JIRA
(v6.2#6252)