[ https://issues.apache.org/jira/browse/OAK-853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673499#comment-13673499 ]
Jukka Zitting commented on OAK-853:
-----------------------------------
There's a [TODO|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/segment/Template.java#L441]
in the SegmentMK codebase for this case. The HAMT data structure used by the
SegmentMK allows for efficient diffing of nodes with many children, though that
optimization hasn't been implemented yet.
Thus I'd rather see a solution like this pushed down to {{KernelNodeState}} or
even further down the stack instead of having it in {{ModifiedNodeState}}.
More generally, as you say, I think the core problem here is
bq. that "base" is a ModifiedNodeState so no optimization can be used
Why is the base state a {{ModifiedNodeState}}? AFAICT the normal pattern would
be to compare a ModifiedNodeState against a previously persisted {{Kernel-}} or
{{SegmentNodeState}}, i.e.
{{ModifiedNodeState.compareAgainstBaseState(\{Kernel,Segment\}NodeState, ...)}}.
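To illustrate why that pattern is cheap, here is a minimal sketch. It is not the actual Oak code: {{PersistedState}} and {{ModifiedState}} are hypothetical stand-ins that only track child names, and the diff is returned as a list of strings rather than reported through a diff callback. The point is that when the argument is the very state the modifications were recorded against, only the recorded changes need to be visited, not all children.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class ModifiedDiffSketch {

    /** Hypothetical stand-in for a persisted (Kernel- or Segment-) node state. */
    static final class PersistedState {
        final Set<String> children;

        PersistedState(Set<String> children) {
            this.children = children;
        }
    }

    /** Hypothetical stand-in for ModifiedNodeState: a base plus recorded changes. */
    static final class ModifiedState {
        final PersistedState base;
        final Set<String> added = new LinkedHashSet<>();
        final Set<String> removed = new LinkedHashSet<>();

        ModifiedState(PersistedState base) {
            this.base = base;
        }

        void addChild(String name) {
            // Re-adding a previously removed child just cancels the removal.
            if (!removed.remove(name) && !base.children.contains(name)) {
                added.add(name);
            }
        }

        void removeChild(String name) {
            // Removing a child added in this session just cancels the addition.
            if (!added.remove(name) && base.children.contains(name)) {
                removed.add(name);
            }
        }

        List<String> compareAgainstBaseState(PersistedState other) {
            List<String> events = new ArrayList<>();
            if (other == base) {
                // Fast path: cost is proportional to the number of changes.
                for (String name : added) {
                    events.add("+" + name);
                }
                for (String name : removed) {
                    events.add("-" + name);
                }
            } else {
                // Slow path: materialize the current child set and compare
                // it name by name against the other state.
                Set<String> current = new LinkedHashSet<>(base.children);
                current.removeAll(removed);
                current.addAll(added);
                for (String name : current) {
                    if (!other.children.contains(name)) {
                        events.add("+" + name);
                    }
                }
                for (String name : other.children) {
                    if (!current.contains(name)) {
                        events.add("-" + name);
                    }
                }
            }
            return events;
        }
    }
}
```

With 10000 children in the base and two recorded changes, the fast path touches two entries, while the slow path walks both child sets in full. The case described in the issue corresponds to ending up on something like the slow path because the argument is not the persisted state the changes were recorded against.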
> Many child nodes: Diffing causes many calls to MicroKernel.getNodes
> -------------------------------------------------------------------
>
> Key: OAK-853
> URL: https://issues.apache.org/jira/browse/OAK-853
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: core
> Reporter: Thomas Mueller
> Attachments: OAK-853.patch
>
>
> Creating a flat hierarchy of the following form causes many calls to
> MicroKernel.getNodes and is thus slow.
> {code}
> for (int i = 0; i < 10000; i++) {
>     root.addNode("test" + i, "nt:folder");
>     if (i % 1000 == 0) {
>         session.save();
>     }
> }
> {code}
> As far as I see, this isn't just the case for MicroKernel based storage, but
> also for the SegmentNodeStore. The reason seems to be that the optimization
> for many child nodes in KernelNodeState.compareAgainstBaseState and
> SegmentNodeState.compareAgainstBaseState that avoids iterating over all
> children doesn't work.
> The optimization uses:
> {code}
> if (base instanceof SegmentNodeState) ...
> if (base instanceof KernelNodeState) ...
> {code}
> Ideally, the instanceof should be avoided, but I'm not sure how to do that
> yet. Anyway, the problem is that "base" is a ModifiedNodeState so no
> optimization can be used.
> I was thinking, couldn't the ModifiedNodeState do a reverse diff in this
> case? That is, inside ModifiedNodeState.compareAgainstBaseState, check if the
> "base" parameter is a ModifiedNodeState, and the "base" field is not, then do
> a reverse diff, which would be efficient. (We should probably not use "base"
> for both the field name and the parameter; well that's a change for another
> time.)
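The reverse diff proposed in the description could look roughly like the sketch below. It rests on several assumptions: {{ChildDiff}} is a hypothetical callback reduced to child-node events, {{ReverseDiff}} and {{Recorder}} are illustrative helpers rather than existing Oak classes, and the recorded changes are passed in as plain lists. The idea is to run the cheap diff (modified state against its own base) and swap added and deleted events on the way out.

```java
import java.util.ArrayList;
import java.util.List;

final class ReverseDiffSketch {

    /** Hypothetical, reduced diff callback (child nodes only). */
    interface ChildDiff {
        void childNodeAdded(String name);

        void childNodeDeleted(String name);
    }

    /** Swaps added and deleted events before passing them on. */
    static final class ReverseDiff implements ChildDiff {
        private final ChildDiff delegate;

        ReverseDiff(ChildDiff delegate) {
            this.delegate = delegate;
        }

        @Override
        public void childNodeAdded(String name) {
            delegate.childNodeDeleted(name);
        }

        @Override
        public void childNodeDeleted(String name) {
            delegate.childNodeAdded(name);
        }
    }

    /** Records events as strings so the result is easy to inspect. */
    static final class Recorder implements ChildDiff {
        final List<String> events = new ArrayList<>();

        @Override
        public void childNodeAdded(String name) {
            events.add("+" + name);
        }

        @Override
        public void childNodeDeleted(String name) {
            events.add("-" + name);
        }
    }

    /** Cheap direction: replay only the changes recorded on top of the base. */
    static void diffModifiedAgainstBase(
            List<String> added, List<String> removed, ChildDiff diff) {
        for (String name : added) {
            diff.childNodeAdded(name);
        }
        for (String name : removed) {
            diff.childNodeDeleted(name);
        }
    }

    /** Opposite direction, expressed through the cheap one with events swapped. */
    static void diffBaseAgainstModified(
            List<String> added, List<String> removed, ChildDiff diff) {
        diffModifiedAgainstBase(added, removed, new ReverseDiff(diff));
    }
}
```

A complete version would also have to swap the before and after values of changed properties and changed child nodes, which this sketch leaves out.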