Joerg Hoh created OAK-11607:
-------------------------------
Summary: Node.getNodes() not lazy for orderable nodetype
Key: OAK-11607
URL: https://issues.apache.org/jira/browse/OAK-11607
Project: Jackrabbit Oak
Issue Type: Improvement
Components: core
Affects Versions: 1.76.0
Reporter: Joerg Hoh
In AEM we have a lot of functionality that retrieves child nodes but does not
consume all children of the resulting NodeIterator.
For example, we have a function like this:
{noformat}
private boolean hasRelevantChildren(Resource resource) {
    for (Iterator<Resource> it = resource.listChildren(); it.hasNext(); ) {
        Resource r = it.next();
        // don't consider repository nodes (e.g. rep:policy) or content resources as children
        if (r.getName().startsWith("rep:")
                || r.getName().equals(JcrConstants.JCR_CONTENT)
                || r.getName().equals(JcrConstants.JCR_FROZENNODE)) {
            continue;
        }
        return true;
    }
    return false;
}
{noformat}
This normally reads only a few nodes from the iterator. However, I have found a
good number of stack traces like this:
{noformat}
at org.apache.jackrabbit.oak.plugins.document.DocumentNodeState$2.iterator(DocumentNodeState.java:368)
at java.lang.Iterable.forEach([email protected]/Iterable.java:74)
at org.apache.jackrabbit.guava.common.collect.Iterables$5.forEach(Iterables.java:748)
at org.apache.jackrabbit.guava.common.collect.Iterables$4.forEach(Iterables.java:586)
at org.apache.jackrabbit.oak.commons.collections.CollectionUtils.toLinkedSet(CollectionUtils.java:139)
at org.apache.jackrabbit.oak.plugins.tree.impl.AbstractTree.getChildNames(AbstractTree.java:129)
at org.apache.jackrabbit.oak.plugins.tree.impl.AbstractTree.getChildren(AbstractTree.java:312)
at org.apache.jackrabbit.oak.core.MutableTree.getChildren(MutableTree.java:178)
at org.apache.jackrabbit.oak.jcr.delegate.NodeDelegate.getChildren(NodeDelegate.java:343)
at org.apache.jackrabbit.oak.jcr.session.NodeImpl$8.perform(NodeImpl.java:582)
at org.apache.jackrabbit.oak.jcr.session.NodeImpl$8.perform(NodeImpl.java:578)
at org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.perform(SessionDelegate.java:229)
at org.apache.jackrabbit.oak.jcr.session.ItemImpl.perform(ItemImpl.java:113)
at org.apache.jackrabbit.oak.jcr.session.NodeImpl.getNodes(NodeImpl.java:578)
at org.apache.sling.jcr.resource.internal.helper.jcr.JcrNodeResource.listJcrChildren(JcrNodeResource.java:227)
at org.apache.sling.jcr.resource.internal.helper.jcr.JcrResourceProvider.listChildren(JcrResourceProvider.java:404)
at org.apache.sling.resourceresolver.impl.providers.stateful.AuthenticatedResourceProvider.listChildren(AuthenticatedResourceProvider.java:169)
at org.apache.sling.resourceresolver.impl.helper.ResourceResolverControl.listChildren(ResourceResolverControl.java:297)
at org.apache.sling.resourceresolver.impl.ResourceResolverImpl.listChildren(ResourceResolverImpl.java:546)
at org.apache.sling.api.resource.AbstractResource.listChildren(AbstractResource.java:91)
at org.apache.sling.api.resource.ResourceWrapper.listChildren(ResourceWrapper.java:105)
at ...hasRelevantChildren(....java:279)
{noformat}
This stack trace suggests that Node.getNodes() is not entirely lazy: deep in
oak.core, {{AbstractTree.getChildNames()}} is called, which reads _all_ child
names into a Set, even though the underlying DocumentNodeState itself returns
an iterator and would therefore be lazy.
This means that for node types with orderable child nodes, merely obtaining the
NodeIterator (before iterating over it at all) is an expensive operation if
many child nodes are present.
We should find a way to optimize this case so that not all child names are
read when the Iterator is built, which would give lazy semantics.
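To illustrate the difference in cost, here is a minimal, self-contained sketch. It is not Oak code: {{CountingSource}}, {{eagerChildren}} and {{lazyChildren}} are hypothetical names made up for this example. {{eagerChildren}} only mimics the pattern the stack trace suggests (copying all names into a linked set before returning an iterator), while {{lazyChildren}} hands out the underlying iterator directly. It also ignores whatever Oak needs to do to honor child ordering, which a real fix would have to handle.

```java
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.NoSuchElementException;
import java.util.Set;

public class LazyChildrenSketch {

    /** Hypothetical stand-in for a lazy child-name source; counts the names it actually produced. */
    static final class CountingSource implements Iterable<String> {
        private final int size;
        int produced = 0;

        CountingSource(int size) {
            this.size = size;
        }

        @Override
        public Iterator<String> iterator() {
            return new Iterator<String>() {
                private int next = 0;

                @Override
                public boolean hasNext() {
                    return next < size;
                }

                @Override
                public String next() {
                    if (!hasNext()) {
                        throw new NoSuchElementException();
                    }
                    produced++;
                    return "child-" + next++;
                }
            };
        }
    }

    /** Eager variant (assumed pattern): reads every name into a set before returning an iterator. */
    static Iterator<String> eagerChildren(Iterable<String> source) {
        Set<String> names = new LinkedHashSet<>();
        for (String name : source) {
            names.add(name);
        }
        return names.iterator();
    }

    /** Lazy variant: returns the source iterator, so names are read only on demand. */
    static Iterator<String> lazyChildren(Iterable<String> source) {
        return source.iterator();
    }

    public static void main(String[] args) {
        CountingSource eagerSource = new CountingSource(100_000);
        eagerChildren(eagerSource).next(); // consume only the first child
        System.out.println("eager produced: " + eagerSource.produced); // prints "eager produced: 100000"

        CountingSource lazySource = new CountingSource(100_000);
        lazyChildren(lazySource).next(); // consume only the first child
        System.out.println("lazy produced: " + lazySource.produced); // prints "lazy produced: 1"
    }
}
```

A caller such as {{hasRelevantChildren}} above, which usually returns after the first few children, would pay for all 100,000 names in the eager variant but only for the ones it actually looks at in the lazy one.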
--
This message was sent by Atlassian Jira
(v8.20.10#820010)