[
https://issues.apache.org/jira/browse/OAK-5655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16415754#comment-16415754
]
Michael Dürig commented on OAK-5655:
------------------------------------
The graph below was generated from data collected with the {{IOTracer}} I just
committed.
!segment-reads.png|width=500!
The plots show the number of segments read vs. the cache size (in MB) when
traversing the first 5 levels of a real-world repository. There is one plot for
each combination of memory mapped / file access and compacted (17GB) /
uncompacted (6.5GB).
* Comparing memory mapped to file access, the plots are quite similar but
offset differently along the x-axis. This is to be expected because the number
of segments the segment cache can hold differs by a (roughly) fixed ratio.
* Comparing compacted to uncompacted shows a big difference in the number of
reads: in the compacted case the total number of reads is much lower and stays
stable down to a relatively low cache size, at which point the number of
segment reads increases sharply. In the uncompacted case the number of reads
increases steadily as the cache size is lowered.
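To illustrate how locality alone can produce these two curve shapes, here is a minimal toy sketch. It is not the {{IOTracer}} and not Oak's segment cache: it assumes a plain LRU cache sized in number of segments (not MB), and the two access traces ({{clustered_trace}}, {{scattered_trace}}) are hypothetical stand-ins chosen for illustration, not derived from the measured data.

```python
from collections import OrderedDict
import random


def count_segment_reads(trace, capacity):
    """Replay a trace of segment ids through an LRU cache holding at most
    `capacity` segments and return the number of misses (segment reads)."""
    cache = OrderedDict()
    reads = 0
    for segment_id in trace:
        if segment_id in cache:
            cache.move_to_end(segment_id)  # hit: mark as most recently used
            continue
        reads += 1                         # miss: the segment has to be read
        cache[segment_id] = None
        if len(cache) > capacity:
            cache.popitem(last=False)      # evict the least recently used
    return reads


def clustered_trace(working_set, passes):
    """Hypothetical 'good locality' trace: the same small set of segments
    is visited again and again."""
    return [s for _ in range(passes) for s in range(working_set)]


def scattered_trace(num_segments, length, seed=42):
    """Hypothetical 'poor locality' trace: accesses are spread uniformly
    over a much larger set of segments."""
    rng = random.Random(seed)
    return [rng.randrange(num_segments) for _ in range(length)]


if __name__ == "__main__":
    clustered = clustered_trace(working_set=50, passes=20)
    scattered = scattered_trace(num_segments=500, length=len(clustered))
    print("capacity  clustered  scattered")
    for capacity in (400, 200, 100, 50, 25):
        print(capacity,
              count_segment_reads(clustered, capacity),
              count_segment_reads(scattered, capacity))
```

In this toy model the clustered trace needs the same number of reads for every cache that can hold its working set and jumps as soon as the cache is smaller than that, while the scattered trace tends to need more reads with every reduction of the cache size. Whether this is what actually drives the measured curves would have to be checked against the trace data.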
> TarMK: Analyse locality of reference
> -------------------------------------
>
> Key: OAK-5655
> URL: https://issues.apache.org/jira/browse/OAK-5655
> Project: Jackrabbit Oak
> Issue Type: Task
> Components: segment-tar
> Reporter: Michael Dürig
> Priority: Major
> Labels: scalability
> Fix For: 1.10
>
> Attachments: compaction-time-vs-reposize.m,
> compaction-time-vs.reposize.png, data00053a.tar-reads.png, offrc.jfr,
> segment-per-path-compacted-nocache.png,
> segment-per-path-compacted-nostringcache.png, segment-per-path-compacted.png,
> segment-per-path.png, segment-reads.png
>
>
> We need to better understand the locality aspects of content stored in TarMK:
> * How is related content spread over segments?
> * What content do we consider related?
> * How does locality of related content develop over time when changes are
> applied?
> * What changes do we consider typical?
> * What is the impact of compaction on locality?
> * What is the impact of the deduplication caches on locality (during normal
> operation and during compaction)?
> * How well are checkpoints deduplicated? Can we monitor this online?
> * ...
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)