[
https://issues.apache.org/jira/browse/OAK-11607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Joerg Hoh updated OAK-11607:
----------------------------
Description:
In AEM we have a lot of functionality that retrieves child nodes but does not
consume all children of the resulting NodeIterator.
For example, we have a function like this:
{noformat}
private boolean hasRelevantChildren(Resource resource) {
    for (Iterator<Resource> it = resource.listChildren(); it.hasNext(); ) {
        Resource r = it.next();
        // don't consider repository nodes (e.g. rep:policy) or content resources as children
        if (r.getName().startsWith("rep:")
                || r.getName().equals(JcrConstants.JCR_CONTENT)
                || r.getName().equals(JcrConstants.JCR_FROZENNODE)) {
            continue;
        }
        return true;
    }
    return false;
}
{noformat}
This normally reads only a few nodes from the iterator. However, I have found a
good number of stack traces like this:
{noformat}
at org.apache.jackrabbit.oak.plugins.document.DocumentNodeState$2.iterator(DocumentNodeState.java:368)
at java.lang.Iterable.forEach([email protected]/Iterable.java:74)
at org.apache.jackrabbit.guava.common.collect.Iterables$5.forEach(Iterables.java:748)
at org.apache.jackrabbit.guava.common.collect.Iterables$4.forEach(Iterables.java:586)
at org.apache.jackrabbit.oak.commons.collections.CollectionUtils.toLinkedSet(CollectionUtils.java:139)
at org.apache.jackrabbit.oak.plugins.tree.impl.AbstractTree.getChildNames(AbstractTree.java:129)
at org.apache.jackrabbit.oak.plugins.tree.impl.AbstractTree.getChildren(AbstractTree.java:312)
at org.apache.jackrabbit.oak.core.MutableTree.getChildren(MutableTree.java:178)
at org.apache.jackrabbit.oak.jcr.delegate.NodeDelegate.getChildren(NodeDelegate.java:343)
at org.apache.jackrabbit.oak.jcr.session.NodeImpl$8.perform(NodeImpl.java:582)
at org.apache.jackrabbit.oak.jcr.session.NodeImpl$8.perform(NodeImpl.java:578)
at org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.perform(SessionDelegate.java:229)
at org.apache.jackrabbit.oak.jcr.session.ItemImpl.perform(ItemImpl.java:113)
at org.apache.jackrabbit.oak.jcr.session.NodeImpl.getNodes(NodeImpl.java:578)
at org.apache.sling.jcr.resource.internal.helper.jcr.JcrNodeResource.listJcrChildren(JcrNodeResource.java:227)
at org.apache.sling.jcr.resource.internal.helper.jcr.JcrResourceProvider.listChildren(JcrResourceProvider.java:404)
at org.apache.sling.resourceresolver.impl.providers.stateful.AuthenticatedResourceProvider.listChildren(AuthenticatedResourceProvider.java:169)
at org.apache.sling.resourceresolver.impl.helper.ResourceResolverControl.listChildren(ResourceResolverControl.java:297)
at org.apache.sling.resourceresolver.impl.ResourceResolverImpl.listChildren(ResourceResolverImpl.java:546)
at org.apache.sling.api.resource.AbstractResource.listChildren(AbstractResource.java:91)
at org.apache.sling.api.resource.ResourceWrapper.listChildren(ResourceWrapper.java:105)
at ...hasRelevantChildren(....java:279)
{noformat}
This stack trace makes me think that Node.getNodes() is not entirely lazy: deep
in oak.core, {{AbstractTree.getChildNames()}} is called, which reads _all_ child
names into a Set, even though the underlying DocumentNodeState itself returns an
iterator and would therefore be lazy.
This means that for node types with orderable child nodes, merely obtaining the
NodeIterator (without even iterating over it) is an expensive and slow operation
if a lot of child nodes are present.
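The cost difference can be illustrated with a small, self-contained sketch. It
does not use any Oak classes: {{countingNames}}, {{eagerChildren}} and
{{lazyChildren}} are hypothetical stand-ins for a child-name source, a
getChildNames()-style copy into a LinkedHashSet, and a plain pass-through
iterator. It only counts how many names are pulled from the source when the
caller asks for the first child.

```java
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.NoSuchElementException;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

public class EagerVsLazy {

    // Hypothetical child-name source: yields "child0".."child<n-1>" and
    // counts every name it hands out.
    static Iterable<String> countingNames(int n, AtomicInteger reads) {
        return () -> new Iterator<String>() {
            private int next = 0;

            @Override
            public boolean hasNext() {
                return next < n;
            }

            @Override
            public String next() {
                if (next >= n) {
                    throw new NoSuchElementException();
                }
                reads.incrementAndGet();
                return "child" + next++;
            }
        };
    }

    // Eager variant: copies all names into a LinkedHashSet before an
    // iterator is returned, so the whole source is read up front.
    static Iterator<String> eagerChildren(Iterable<String> source) {
        Set<String> names = new LinkedHashSet<>();
        source.forEach(names::add);
        return names.iterator();
    }

    // Lazy variant: hands out the source iterator directly.
    static Iterator<String> lazyChildren(Iterable<String> source) {
        return source.iterator();
    }

    public static void main(String[] args) {
        AtomicInteger eagerReads = new AtomicInteger();
        eagerChildren(countingNames(100_000, eagerReads)).next();

        AtomicInteger lazyReads = new AtomicInteger();
        lazyChildren(countingNames(100_000, lazyReads)).next();

        System.out.println("eager reads: " + eagerReads.get()); // eager reads: 100000
        System.out.println("lazy reads: " + lazyReads.get());   // lazy reads: 1
    }
}
```

In this toy setup the eager variant reads every name before the caller sees the
first child, which is the pattern the stack trace above suggests.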
We should find a way to optimize this case so that not all child names are read
when the Iterator is built, which would give it lazy semantics.
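One possible direction, purely as a sketch and not a proposal for the actual
Oak code: if the explicit child order is available as a cheap-to-read list and
there is a cheap per-name existence check, the ordered names could be served
first and the full child listing opened only once those are exhausted. The
class {{LazyOrderedNames}} and its three inputs ({{order}}, {{exists}},
{{allNames}}) are hypothetical, and the sketch assumes the order list contains
no duplicates and no null names.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical sketch, not Oak code. Assumes 'order' has no duplicate names.
public class LazyOrderedNames implements Iterator<String> {
    private final List<String> order;        // explicit child order (assumed cheap to read)
    private final Predicate<String> exists;  // per-name existence check (assumed cheap)
    private final Iterable<String> allNames; // full, possibly expensive child listing
    private final Iterator<String> ordered;
    private Iterator<String> rest;           // opened only after 'ordered' is exhausted
    private Set<String> orderedNames;        // built only when 'rest' is opened
    private String next;

    public LazyOrderedNames(List<String> order, Predicate<String> exists,
                            Iterable<String> allNames) {
        this.order = order;
        this.exists = exists;
        this.allNames = allNames;
        this.ordered = order.iterator();
    }

    @Override
    public boolean hasNext() {
        if (next != null) {
            return true;
        }
        // Phase 1: explicitly ordered names, skipping children that no longer exist.
        while (ordered.hasNext()) {
            String name = ordered.next();
            if (exists.test(name)) {
                next = name;
                return true;
            }
        }
        // Phase 2: children without an explicit position, in listing order.
        if (rest == null) {
            orderedNames = new HashSet<>(order);
            rest = allNames.iterator();
        }
        while (rest.hasNext()) {
            String name = rest.next();
            if (!orderedNames.contains(name)) {
                next = name;
                return true;
            }
        }
        return false;
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        String result = next;
        next = null;
        return result;
    }

    public static void main(String[] args) {
        Set<String> existing = Set.of("a", "b", "c");
        // Fails loudly if the full listing is requested at all.
        Iterable<String> expensive = () -> {
            throw new IllegalStateException("full listing requested");
        };
        Iterator<String> it =
                new LazyOrderedNames(List.of("b", "a", "gone"), existing::contains, expensive);
        System.out.println(it.next()); // prints "b" without touching the full listing
    }
}
```

A caller such as hasRelevantChildren() that stops after the first few children
would then never trigger the full listing in this sketch. Whether such a cheap
existence check is available at that layer is an open question here.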
> Node.getNodes() not lazy for orderable nodetype
> -----------------------------------------------
>
> Key: OAK-11607
> URL: https://issues.apache.org/jira/browse/OAK-11607
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: core
> Affects Versions: 1.76.0
> Reporter: Joerg Hoh
> Priority: Major