[ 
https://issues.apache.org/jira/browse/OAK-11607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joerg Hoh updated OAK-11607:
----------------------------
    Description: 
In AEM we have a lot of functionality that retrieves child nodes but does not 
consume all children of the resulting NodeIterator.

For example we have a function like this:

{noformat}
    private boolean hasRelevantChildren(Resource resource) {
        for (Iterator<Resource> it = resource.listChildren(); it.hasNext(); ) {
            Resource r = it.next();

            // don't consider repository nodes (e.g. rep:policy) or content resources as children
            if (r.getName().startsWith("rep:") || r.getName().equals(JcrConstants.JCR_CONTENT)
                    || r.getName().equals(JcrConstants.JCR_FROZENNODE)) {
                continue;
            }
            return true;
        }

        return false;
    }
{noformat}

This normally reads only a few nodes from the iterator. However, I have found a 
good number of occurrences of stack traces like this:

{noformat}
        at org.apache.jackrabbit.oak.plugins.document.DocumentNodeState$2.iterator(DocumentNodeState.java:368)
        at java.lang.Iterable.forEach([email protected]/Iterable.java:74)
        at org.apache.jackrabbit.guava.common.collect.Iterables$5.forEach(Iterables.java:748)
        at org.apache.jackrabbit.guava.common.collect.Iterables$4.forEach(Iterables.java:586)
        at org.apache.jackrabbit.oak.commons.collections.CollectionUtils.toLinkedSet(CollectionUtils.java:139)
        at org.apache.jackrabbit.oak.plugins.tree.impl.AbstractTree.getChildNames(AbstractTree.java:129)
        at org.apache.jackrabbit.oak.plugins.tree.impl.AbstractTree.getChildren(AbstractTree.java:312)
        at org.apache.jackrabbit.oak.core.MutableTree.getChildren(MutableTree.java:178)
        at org.apache.jackrabbit.oak.jcr.delegate.NodeDelegate.getChildren(NodeDelegate.java:343)
        at org.apache.jackrabbit.oak.jcr.session.NodeImpl$8.perform(NodeImpl.java:582)
        at org.apache.jackrabbit.oak.jcr.session.NodeImpl$8.perform(NodeImpl.java:578)
        at org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.perform(SessionDelegate.java:229)
        at org.apache.jackrabbit.oak.jcr.session.ItemImpl.perform(ItemImpl.java:113)
        at org.apache.jackrabbit.oak.jcr.session.NodeImpl.getNodes(NodeImpl.java:578)
        at org.apache.sling.jcr.resource.internal.helper.jcr.JcrNodeResource.listJcrChildren(JcrNodeResource.java:227)
        at org.apache.sling.jcr.resource.internal.helper.jcr.JcrResourceProvider.listChildren(JcrResourceProvider.java:404)
        at org.apache.sling.resourceresolver.impl.providers.stateful.AuthenticatedResourceProvider.listChildren(AuthenticatedResourceProvider.java:169)
        at org.apache.sling.resourceresolver.impl.helper.ResourceResolverControl.listChildren(ResourceResolverControl.java:297)
        at org.apache.sling.resourceresolver.impl.ResourceResolverImpl.listChildren(ResourceResolverImpl.java:546)
        at org.apache.sling.api.resource.AbstractResource.listChildren(AbstractResource.java:91)
        at org.apache.sling.api.resource.ResourceWrapper.listChildren(ResourceWrapper.java:105)
        at ...hasRelevantChildren(....java:279)
{noformat}

This stack trace makes me think that Node.getNodes() is not entirely lazy: 
deep in oak.core, {{AbstractTree.getChildNames()}} is called, which reads 
_all_ child names into a Set, even though the underlying DocumentNodeState 
itself returns an iterator and would therefore be lazy.
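
To illustrate the effect, here is a minimal, self-contained sketch. It is not Oak code: the class {{EagerChildNamesDemo}}, the counting source {{lazyChildNames}} and the local {{toLinkedSet}} helper are hypothetical stand-ins. It only shows that copying a lazy Iterable into a linked set pulls every element before the caller can read the first one.

```java
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.NoSuchElementException;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

public class EagerChildNamesDemo {

    /** Hypothetical lazy source of child names; counts every name it hands out. */
    static Iterable<String> lazyChildNames(int total, AtomicInteger produced) {
        return () -> new Iterator<String>() {
            private int index = 0;

            @Override
            public boolean hasNext() {
                return index < total;
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                produced.incrementAndGet();
                return "child-" + index++;
            }
        };
    }

    /** Local stand-in (an assumption, not the Oak helper): copies all names into an insertion-ordered set. */
    static Set<String> toLinkedSet(Iterable<String> names) {
        Set<String> result = new LinkedHashSet<>();
        for (String name : names) {
            result.add(name);
        }
        return result;
    }

    public static void main(String[] args) {
        AtomicInteger produced = new AtomicInteger();
        Set<String> names = toLinkedSet(lazyChildNames(100_000, produced));

        // The caller only wants the first name, but the copy already consumed the whole source.
        String first = names.iterator().next();
        System.out.println(first + " read, " + produced.get() + " names produced");
        // prints "child-0 read, 100000 names produced"
    }
}
```

The cost is paid when the set is built, regardless of how many elements the caller later reads.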

This means that for node types with orderable child nodes, merely obtaining 
the NodeIterator (before iterating over it at all) is an expensive and slow 
operation if a lot of child nodes are present.

We should find a way to optimize this case so that not all child names are 
read when the Iterator is built, which would give it lazy semantics.
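
For comparison, a sketch of what lazy semantics would mean for a caller such as {{hasRelevantChildren}}. Again this is not a proposed Oak patch: {{LazyChildrenDemo}}, {{countingNames}} and {{lazyFilter}} are hypothetical names, and the check works on plain name strings instead of Resources. The literals "jcr:content" and "jcr:frozenNode" are the values of {{JcrConstants.JCR_CONTENT}} and {{JcrConstants.JCR_FROZENNODE}}.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

public class LazyChildrenDemo {

    /** Hypothetical lazy source: "rep:policy", "jcr:content", then "child-2", "child-3", ... */
    static Iterator<String> countingNames(int total, AtomicInteger produced) {
        return new Iterator<String>() {
            private int index = 0;

            @Override
            public boolean hasNext() {
                return index < total;
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                produced.incrementAndGet();
                int i = index++;
                if (i == 0) {
                    return "rep:policy";
                }
                return i == 1 ? "jcr:content" : "child-" + i;
            }
        };
    }

    /** Filters on demand: reads from the source only until the next match is found. */
    static <T> Iterator<T> lazyFilter(Iterator<T> source, Predicate<T> keep) {
        return new Iterator<T>() {
            private T pending;
            private boolean hasPending;

            @Override
            public boolean hasNext() {
                while (!hasPending && source.hasNext()) {
                    T candidate = source.next();
                    if (keep.test(candidate)) {
                        pending = candidate;
                        hasPending = true;
                    }
                }
                return hasPending;
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                hasPending = false;
                T result = pending;
                pending = null;
                return result;
            }
        };
    }

    /** Same early-exit idea as the method in the description, applied to names. */
    static boolean hasRelevantChildren(Iterator<String> names) {
        Iterator<String> relevant = lazyFilter(names,
                n -> !n.startsWith("rep:") && !n.equals("jcr:content") && !n.equals("jcr:frozenNode"));
        return relevant.hasNext();
    }

    public static void main(String[] args) {
        AtomicInteger produced = new AtomicInteger();
        boolean found = hasRelevantChildren(countingNames(100_000, produced));
        System.out.println(found + ", " + produced.get() + " names produced");
        // prints "true, 3 names produced"
    }
}
```

With a source that stays lazy end to end, the work done is proportional to the number of children the caller actually inspects, not to the total number of children.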









> Node.getNodes() not lazy for orderable nodetype
> -----------------------------------------------
>
>                 Key: OAK-11607
>                 URL: https://issues.apache.org/jira/browse/OAK-11607
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.76.0
>            Reporter: Joerg Hoh
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
