Chetan Mehrotra created OAK-6339:
------------------------------------
Summary: MapRecord#getKeys should initialize child
iterables lazily
Key: OAK-6339
URL: https://issues.apache.org/jira/browse/OAK-6339
Project: Jackrabbit Oak
Issue Type: Improvement
Components: segment-tar
Reporter: Chetan Mehrotra
Priority: Minor
Fix For: 1.8
Recently we saw an OutOfMemory error while running the
[oakRepoStats|https://github.com/chetanmeh/oak-console-scripts/tree/master/src/main/groovy/repostats]
script against a SegmentNodeStore setup where the uuid index has 16M+ entries,
which creates a very flat hierarchy. The error occurred while computing the
Tree#getChildren iterator, which internally invokes MapRecord#getKeys to obtain
an iterable of child node names.
This happened because getKeys computes the key list eagerly: it calls
bucket.getKeys() for each child bucket, which recursively does the same for its
own child buckets, so the keys of the whole map are collected up front.
{code}
if (isBranch(size, level)) {
    List<MapRecord> buckets = getBucketList(segment);
    List<Iterable<String>> keys =
            newArrayListWithCapacity(buckets.size());
    for (MapRecord bucket : buckets) {
        keys.add(bucket.getKeys());
    }
    return concat(keys);
}
{code}
Instead, we should use the same approach as MapRecord#getEntries, i.e.
evaluate the iterable for each child bucket lazily:
{code}
if (isBranch(size, level)) {
    List<MapRecord> buckets = getBucketList(segment);
    List<Iterable<MapEntry>> entries =
            newArrayListWithCapacity(buckets.size());
    for (final MapRecord bucket : buckets) {
        entries.add(new Iterable<MapEntry>() {
            @Override
            public Iterator<MapEntry> iterator() {
                return bucket.getEntries(diffKey, diffValue).iterator();
            }
        });
    }
    return concat(entries);
}
{code}
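To illustrate how the same pattern might look for getKeys, here is a minimal
standalone sketch. It is not the Oak implementation: Bucket is a hypothetical
stand-in for MapRecord, leafReads is a counter added only to make the laziness
observable, and the local concat helper is an assumed substitute for the
concat call used in the snippets above. The point is only that each child's
getKeys() is deferred until iteration actually reaches that child.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class LazyKeysSketch {

    // Hypothetical counter: how many leaf buckets have been read so far.
    static int leafReads = 0;

    /** Hypothetical stand-in for MapRecord: a leaf with keys or a branch with child buckets. */
    static final class Bucket {
        private final List<String> leafKeys;
        private final List<Bucket> children;

        private Bucket(List<String> leafKeys, List<Bucket> children) {
            this.leafKeys = leafKeys;
            this.children = children;
        }

        static Bucket leaf(String... keys) {
            return new Bucket(Arrays.asList(keys), Collections.<Bucket>emptyList());
        }

        static Bucket branch(Bucket... children) {
            return new Bucket(Collections.<String>emptyList(), Arrays.asList(children));
        }

        Iterable<String> getKeys() {
            if (!children.isEmpty()) {
                List<Iterable<String>> keys = new ArrayList<>(children.size());
                for (final Bucket bucket : children) {
                    // Wrap the call: bucket.getKeys() runs only when iterator() is requested.
                    keys.add(() -> bucket.getKeys().iterator());
                }
                return concat(keys);
            }
            leafReads++; // a leaf is touched only once iteration reaches it
            return leafKeys;
        }
    }

    /** Lazily chains the given iterables; each part is opened only when the previous one is exhausted. */
    static <T> Iterable<T> concat(final List<Iterable<T>> parts) {
        return () -> new Iterator<T>() {
            private final Iterator<Iterable<T>> outer = parts.iterator();
            private Iterator<T> inner = Collections.emptyIterator();

            @Override
            public boolean hasNext() {
                while (!inner.hasNext() && outer.hasNext()) {
                    inner = outer.next().iterator();
                }
                return inner.hasNext();
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return inner.next();
            }
        };
    }

    public static void main(String[] args) {
        Bucket root = Bucket.branch(
                Bucket.leaf("a", "b"),
                Bucket.branch(Bucket.leaf("c"), Bucket.leaf("d")));
        for (String key : root.getKeys()) {
            System.out.println(key + " (leaf buckets read so far: " + leafReads + ")");
        }
    }
}
```

In this sketch, calling getKeys() on the root reads no leaf bucket at all, and
a caller that stops iterating early never touches the remaining buckets.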