[ https://issues.apache.org/jira/browse/OAK-5655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16415754#comment-16415754 ]

Michael Dürig commented on OAK-5655:
------------------------------------

The graph below was generated from data collected with the {{IOTracer}} I just 
committed.

!segment-reads.png|width=500!

The plots show the number of segments read vs. the cache size (in MB) when 
traversing the first 5 levels of a real-world repository. There is one plot 
each for memory-mapped vs. file access and for compacted (17GB) vs. 
uncompacted (6.5GB) repositories.
 * Comparing memory-mapped to file access, the plots are quite similar but 
offset differently along the x axis. This is expected, because the number of 
segments the segment cache can hold differs between the two by a (roughly) 
fixed ratio.
 * Comparing compacted to uncompacted shows a big difference in the number of 
reads: in the compacted case the total number of reads is much lower and stays 
stable down to a relatively low cache size, at which point the number of 
segment reads increases sharply. In the uncompacted case the number of reads 
increases steadily as the cache size is lowered.
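
To make the shape of these curves easier to reason about, here is a minimal sketch. It is not the {{IOTracer}} and not Oak's segment cache: it assumes a plain LRU cache holding a fixed number of segments, replays a trace of segment ids, and counts every cache miss as a segment read. The names {{count_segment_reads}} and {{capacity_in_segments}} and the average segment size parameter are hypothetical. Under these assumptions the read count depends only on how many segments fit in the cache, so a fixed ratio between two configurations only rescales the x axis. A trace that keeps revisiting a compact working set stays flat until the cache can no longer hold that set and then jumps, which resembles the compacted plot.

```python
from collections import OrderedDict


def count_segment_reads(trace, capacity):
    """Replay segment ids through a hypothetical LRU cache holding at most
    `capacity` segments; return how many accesses miss (i.e. segment reads)."""
    cache = OrderedDict()
    reads = 0
    for segment_id in trace:
        if segment_id in cache:
            cache.move_to_end(segment_id)  # hit: mark as most recently used
            continue
        reads += 1  # miss: the segment has to be read from disk
        cache[segment_id] = True
        if len(cache) > capacity:
            cache.popitem(last=False)  # evict the least recently used segment
    return reads


def capacity_in_segments(cache_mb, avg_segment_kb):
    """Hypothetical conversion from a cache size in MB to a segment count.
    A different per-segment footprint (e.g. per access mode) changes this
    by a fixed ratio, which moves the whole curve along the x axis."""
    return int(cache_mb * 1024 // avg_segment_kb)


# A compact working set of 50 segments, traversed three times.
working_set_trace = list(range(50)) * 3
print(count_segment_reads(working_set_trace, 50))  # 50: only cold misses
print(count_segment_reads(working_set_trace, 49))  # 150: every access misses
```

With LRU, shrinking the cache never reduces the number of misses for a given trace, so every such curve is non-increasing in cache size; a trace that scatters related content over many segments degrades gradually instead of showing a single knee.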

 

> TarMK: Analyse locality of reference 
> -------------------------------------
>
>                 Key: OAK-5655
>                 URL: https://issues.apache.org/jira/browse/OAK-5655
>             Project: Jackrabbit Oak
>          Issue Type: Task
>          Components: segment-tar
>            Reporter: Michael Dürig
>            Priority: Major
>              Labels: scalability
>             Fix For: 1.10
>
>         Attachments: compaction-time-vs-reposize.m, 
> compaction-time-vs.reposize.png, data00053a.tar-reads.png, offrc.jfr, 
> segment-per-path-compacted-nocache.png, 
> segment-per-path-compacted-nostringcache.png, segment-per-path-compacted.png, 
> segment-per-path.png, segment-reads.png
>
>
> We need to better understand the locality aspects of content stored in TarMK: 
> * How is related content spread over segments?
> * What content do we consider related? 
> * How does locality of related content develop over time when changes are 
> applied?
> * What changes do we consider typical?
> * What is the impact of compaction on locality? 
> * What is the impact of the deduplication caches on locality (during normal 
> operation and during compaction)?
> * How well are checkpoints deduplicated? Can we monitor this online?
> * ...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
