[ https://issues.apache.org/jira/browse/OAK-2896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Francesco Mari updated OAK-2896:
--------------------------------
Fix Version/s: Segment Tar 0.0.6 (was: Segment Tar 0.0.4)
> Putting many elements into a map results in many small segments.
> -----------------------------------------------------------------
>
> Key: OAK-2896
> URL: https://issues.apache.org/jira/browse/OAK-2896
> Project: Jackrabbit Oak
> Issue Type: Bug
> Components: segment-tar
> Reporter: Michael Dürig
> Assignee: Michael Dürig
> Priority: Critical
> Labels: performance
> Fix For: 1.6, Segment Tar 0.0.6
>
> Attachments: OAK-2896.png, OAK-2896.xlsx, size-dist.png
>
>
> There is an issue with how the HAMT implementation
> ({{SegmentWriter.writeMap()}}) interacts with the limit of 256 segment
> references when putting many entries into the map: this limit is regularly
> reached once the map contains about 200k entries. At that point segments get
> flushed prematurely, resulting in more segments, thus more references and
> thus even smaller segments. It is common for segments to be as small as 7k,
> with a tar file containing up to 35k segments. This is problematic because at
> this point handling the segment graph becomes expensive, both memory- and
> CPU-wise. I have seen persisted segment graphs as big as 35M where the usual
> size is a couple of k.
> As the HAMT map is used for storing the children of a node, this might have an
> adverse effect on nodes with many child nodes.
> The following code can be used to reproduce the issue:
> {code}
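> // segmentStore and getTracker() come from the surrounding test;
> // rnd is assumed to be a java.util.Random.
> Random rnd = new Random();
> SegmentWriter writer = new SegmentWriter(segmentStore, getTracker(), V_11);
> MapRecord baseMap = null;
> for (;;) {
>     // Build a batch of 1000 random entries.
>     Map<String, RecordId> map = newHashMap();
>     for (int k = 0; k < 1000; k++) {
>         RecordId stringId =
>                 writer.writeString(String.valueOf(rnd.nextLong()));
>         map.put(String.valueOf(rnd.nextLong()), stringId);
>     }
>     // Add the batch to the map, then print the new size and the time taken.
>     Stopwatch w = Stopwatch.createStarted();
>     baseMap = writer.writeMap(baseMap, map);
>     System.out.println(baseMap.size() + " " + w.elapsed());
> }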
> {code}
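
The feedback loop described above (premature flush, more segments, more
references, even smaller segments) can be illustrated with a toy model. This
is only a sketch under stated assumptions, not Oak code: the class
SegmentFlushModel, the 64-byte record size, the 256 KiB segment capacity and
the rule that every record references one uniformly random earlier segment
are all hypothetical and chosen for illustration. Only the limit of 256
segment references is taken from the description.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

/** Toy model of the flush feedback loop; it does not use any Oak classes. */
class SegmentFlushModel {
    static final int MAX_REFERENCES = 256;           // limit named in the issue
    static final int MAX_SEGMENT_BYTES = 256 * 1024; // assumed segment capacity
    static final int RECORD_BYTES = 64;              // assumed size of one record

    /** Average size in bytes of the segments flushed while writing the given number of records. */
    static double averageSegmentSize(int records, long seed) {
        Random rnd = new Random(seed);
        Set<Integer> references = new HashSet<>(); // segments referenced by the current segment
        int flushed = 0;                           // number of segments flushed so far
        int currentBytes = 0;                      // bytes written to the current segment
        long totalBytes = 0;                       // bytes in all flushed segments

        for (int i = 0; i < records; i++) {
            // Assumption: each record points into one uniformly random earlier segment.
            if (flushed > 0) {
                references.add(rnd.nextInt(flushed));
            }
            currentBytes += RECORD_BYTES;

            boolean referencesExhausted = references.size() >= MAX_REFERENCES;
            boolean segmentFull = currentBytes + RECORD_BYTES > MAX_SEGMENT_BYTES;
            if (referencesExhausted || segmentFull) {
                // Flush: the segment is closed at whatever size it has reached.
                totalBytes += currentBytes;
                flushed++;
                references.clear();
                currentBytes = 0;
            }
        }
        return flushed == 0 ? currentBytes : (double) totalBytes / flushed;
    }

    public static void main(String[] args) {
        for (int records : new int[] {500_000, 2_000_000, 5_000_000}) {
            System.out.printf("%d records: average segment size %.0f bytes%n",
                    records, averageSegmentSize(records, 42L));
        }
    }
}
```

In this model the reference limit cannot be reached while fewer than 256
segments exist, so early segments fill up completely. Once more segments
exist, the references of a new segment spread over more distinct targets, the
limit is hit sooner, and the average segment size drops as more records are
written.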