[ https://issues.apache.org/jira/browse/OAK-5655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16415730#comment-16415730 ]
Michael Dürig commented on OAK-5655:
------------------------------------

At [http://svn.apache.org/viewvc?rev=1827841&view=rev] I added some utility classes to collect IO traces for specific access patterns. Access patterns are specified via a {{Trace}} instance. Currently the only concrete implementation is {{BreathFirstTrace}}, which traverses the first {{n}} levels of a tree in a breadth-first manner. IO traces are collected as CSV files:
{noformat}
timestamp,file,segmentId,length,elapsed,depth,count
1522154516424,data01415a.tar,f81378df-b3f8-4b25-0000-00000002c450,181328,573411,0,1
1522154516441,data01415a.tar,9c2117cb-6eaa-4cf9-0000-00000003ffd0,262096,680192,0,1
1522154516444,data01415a.tar,3fdca869-9272-4b04-0000-00000003ffe0,262112,668914,0,1
....
{noformat}
Here {{depth}} and {{count}} are contributed by the {{BreathFirstTrace}}: they record the current depth in the tree and the number of nodes traversed so far.

> TarMK: Analyse locality of reference
> -------------------------------------
>
>                 Key: OAK-5655
>                 URL: https://issues.apache.org/jira/browse/OAK-5655
>             Project: Jackrabbit Oak
>          Issue Type: Task
>          Components: segment-tar
>            Reporter: Michael Dürig
>            Priority: Major
>              Labels: scalability
>             Fix For: 1.10
>
>         Attachments: compaction-time-vs-reposize.m,
> compaction-time-vs.reposize.png, data00053a.tar-reads.png, offrc.jfr,
> segment-per-path-compacted-nocache.png,
> segment-per-path-compacted-nostringcache.png, segment-per-path-compacted.png,
> segment-per-path.png
>
>
> We need to better understand the locality aspects of content stored in TarMK:
> * How is related content spread over segments?
> * What content do we consider related?
> * How does locality of related content develop over time when changes are applied?
> * What changes do we consider typical?
> * What is the impact of compaction on locality?
> * What is the impact of the deduplication caches on locality (during normal operation and during compaction)?
> * How well are checkpoints deduplicated? Can we monitor this online?
> * ...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
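The comment describes {{BreathFirstTrace}} as traversing the first {{n}} levels of a tree breadth-first while tracking the current depth and the number of nodes traversed so far. The Java sketch below illustrates only that traversal logic. It is not the Oak implementation: the class {{BreadthFirstTraceSketch}}, its {{Node}} and {{Entry}} types, the {{trace}} method and its {{path,depth,count}} rows are all hypothetical. Whether "the first n levels" includes depth {{n}} is also an assumption; here the root is depth 0 and only depths below {{levels}} are visited. In the real tool the depth and count are attached to each IO record (the {{depth}} and {{count}} CSV columns) rather than emitted once per node.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

public class BreadthFirstTraceSketch {

    // Hypothetical stand-in for a repository node. In Oak the tree would be
    // read from segments, and those reads are what the IO trace records.
    static final class Node {
        final String name;
        final List<Node> children;

        Node(String name, Node... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }
    }

    // A queued node together with its path and its depth in the tree.
    static final class Entry {
        final Node node;
        final String path;
        final int depth;

        Entry(Node node, String path, int depth) {
            this.node = node;
            this.path = path;
            this.depth = depth;
        }
    }

    // Visits the first `levels` levels breadth-first (root = depth 0, an assumed
    // convention) and returns one CSV row "path,depth,count" per visited node,
    // where count is the number of nodes traversed so far.
    static List<String> trace(Node root, int levels) {
        List<String> rows = new ArrayList<>();
        rows.add("path,depth,count");
        if (levels <= 0) {
            return rows;
        }
        Queue<Entry> queue = new ArrayDeque<>();
        queue.add(new Entry(root, "/", 0));
        int count = 0;
        while (!queue.isEmpty()) {
            Entry e = queue.remove();
            count++;
            rows.add(e.path + "," + e.depth + "," + count);
            // Enqueue children only while they stay within the first `levels` levels.
            if (e.depth + 1 < levels) {
                for (Node child : e.node.children) {
                    String childPath = e.path.equals("/") ? "/" + child.name : e.path + "/" + child.name;
                    queue.add(new Entry(child, childPath, e.depth + 1));
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        Node root = new Node("",
                new Node("a", new Node("c"), new Node("d")),
                new Node("b", new Node("e")));
        // With levels = 2 only the root and its direct children are visited.
        for (String row : trace(root, 2)) {
            System.out.println(row);
        }
    }
}
```

Running {{main}} prints the header followed by {{/,0,1}}, {{/a,1,2}} and {{/b,1,3}}. A FIFO queue is what makes the order breadth-first: all nodes at depth {{d}} are dequeued before any node at depth {{d + 1}}, so the depth value in consecutive trace rows never decreases.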