[ 
https://issues.apache.org/jira/browse/OAK-11934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18028753#comment-18028753
 ] 

Julian Sedding commented on OAK-11934:
--------------------------------------

I have spent some time experimenting with adding prefetching to the in-memory 
{{SegmentCache}} (which implicitly also fills the persistent cache). However, 
this leads to frequent cache evictions, which is undesirable. 
Also, from within the {{SegmentCache}} it is not possible to probe whether a 
segment is already cached in a persistent cache. Furthermore, without a 
persistent cache, prefetching using this approach can be wasteful, due to 
unwanted cache evictions. It might even lead to a slowdown, as the same segment 
might be loaded repeatedly without ever being used.
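
The eviction effect can be illustrated with a small, self-contained sketch. 
This is not Oak code: {{LruCache}} and {{loadsWithPrefetch}} are hypothetical 
names, and a plain access-ordered {{LinkedHashMap}} stands in for the 
{{SegmentCache}}. It only assumes that the cache evicts the least recently 
used entry once it is full.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class EvictionSketch {

    /** Minimal LRU cache (hypothetical) that counts how often a segment is loaded. */
    static final class LruCache {
        private final Map<String, byte[]> entries;
        int loads = 0;

        LruCache(int capacity) {
            // accessOrder = true: the eldest entry is the least recently used one
            this.entries = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                    return size() > capacity;
                }
            };
        }

        byte[] get(String id) {
            byte[] data = entries.get(id);
            if (data == null) {
                loads++;              // simulated read from the persistence
                data = new byte[0];
                entries.put(id, data); // may evict the least recently used entry
            }
            return data;
        }
    }

    /** Prefetches all refs, then reads them in order; returns the number of loads. */
    static int loadsWithPrefetch(int capacity, List<String> refs) {
        LruCache cache = new LruCache(capacity);
        for (String ref : refs) {
            cache.get(ref); // prefetch pass
        }
        for (String ref : refs) {
            cache.get(ref); // reads by the "main" thread
        }
        return cache.loads;
    }
}
```

With three referenced segments and a capacity of two, every prefetched segment 
is evicted before it is read, so each one is loaded twice. With a capacity of 
three, each segment is loaded only once.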

I have also looked into the idea described in OAK-11932, which adds prefetching 
to the {{CachingSegmentReader}}. This approach is also problematic: due to the 
nature of this API, it is impossible to prefetch segments that are in a 
different archive (a different {{CachingSegmentReader}} instance).

The approach that worked best makes it possible to add a {{PersistentCache}} 
directly to an {{AbstractFileStore}} via the {{FileStoreBuilder}}. This 
{{PersistentCache}} can be decorated internally, by the {{AbstractFileStore}}, 
with a {{SegmentPreloader}}, if preloading is configured. The 
{{SegmentPreloader}} has access to the {{TarFiles}} instance, as well as to the 
{{PersistentCache}}. This allows it to
- probe the cache to check whether a segment is already present
- load missing segments via {{TarFiles#readSegment}}
- load segment graphs via {{TarFiles#getGraph}}, to determine segment 
references without reading them from the segments
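
The three steps above could be sketched roughly as follows. This is only an 
illustration under assumptions, not the actual implementation: {{CacheView}}, 
{{SegmentSource}}, {{Preloader}} and {{demo}} are hypothetical stand-ins, and 
their method signatures do not mirror the real {{PersistentCache}} or 
{{TarFiles}} APIs. The configurable depth is taken from the issue description.

```java
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class SegmentPreloaderSketch {

    /** Hypothetical stand-in for a persistent cache; NOT Oak's PersistentCache API. */
    interface CacheView {
        boolean contains(UUID id);
        void write(UUID id, byte[] data);
    }

    /** Hypothetical stand-in for what TarFiles#readSegment and TarFiles#getGraph provide. */
    interface SegmentSource {
        byte[] readSegment(UUID id);
        Set<UUID> references(UUID id);
    }

    static final class Preloader {
        private final CacheView cache;
        private final SegmentSource source;
        private final ExecutorService executor;
        private final int maxDepth;

        Preloader(CacheView cache, SegmentSource source, ExecutorService executor, int maxDepth) {
            this.cache = cache;
            this.source = source;
            this.executor = executor;
            this.maxDepth = maxDepth;
        }

        /** To be called after {@code loaded} was read on a cache miss. */
        Future<?> preloadReferencesOf(UUID loaded) {
            return executor.submit(() -> preload(loaded, 1));
        }

        private void preload(UUID parent, int depth) {
            if (depth > maxDepth) {
                return;
            }
            for (UUID ref : source.references(parent)) { // references come from the graph
                if (cache.contains(ref)) {
                    continue;                            // already cached, nothing to do
                }
                byte[] data = source.readSegment(ref);   // load the missing segment
                if (data != null) {
                    cache.write(ref, data);
                    preload(ref, depth + 1);             // descend only into newly loaded segments
                }
            }
        }
    }

    /** Runs the sketch on the chain a -> b -> c -> d with depth 2; returns the cached ids. */
    static Set<UUID> demo(UUID a, UUID b, UUID c, UUID d) throws Exception {
        Map<UUID, Set<UUID>> graph =
                Map.of(a, Set.of(b), b, Set.of(c), c, Set.of(d), d, Set.of());
        Map<UUID, byte[]> cached = new ConcurrentHashMap<>();
        CacheView cache = new CacheView() {
            public boolean contains(UUID id) { return cached.containsKey(id); }
            public void write(UUID id, byte[] data) { cached.put(id, data); }
        };
        SegmentSource source = new SegmentSource() {
            public byte[] readSegment(UUID id) { return new byte[] {1}; }
            public Set<UUID> references(UUID id) { return graph.getOrDefault(id, Set.of()); }
        };
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            new Preloader(cache, source, executor, 2).preloadReferencesOf(a).get();
        } finally {
            executor.shutdown();
        }
        return cached.keySet();
    }
}
```

In this sketch, segments that are already cached are skipped without 
descending into their references, which matches the idea from the issue 
description that prefetching is only triggered by newly loaded segments.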

> segment prefetching for segmentstore cache
> ------------------------------------------
>
>                 Key: OAK-11934
>                 URL: https://issues.apache.org/jira/browse/OAK-11934
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: segment-tar
>    Affects Versions: 1.84.0
>            Reporter: Julian Sedding
>            Assignee: Julian Sedding
>            Priority: Major
>
> Particularly for remote segment stores, IO can be a constraining factor. 
> Processes like compaction, that traverse the repository, often alternate 
> between processing segments and loading segments.
> IO could be parallelized by enhancing the {{SegmentCache}} to asynchronously 
> prefetch segments that are referenced by a newly loaded segment. I.e. if the 
> "main" thread requests a segment from the cache, and the segment needs to be 
> loaded from the persistence, then all segments referenced by the newly loaded 
> segment are prefetched, and placed into the cache, asynchronously. When the 
> "main" thread loads the next segment, it is likely already in the cache.
> Prefetching could preload a configurable "depth" of references. Presumably, 
> usually a depth of 1 or 2 strikes a good balance between preloading too 
> aggressively and efficiently parallelizing IO.
> If prefetching of references is only performed for newly loaded segments, the 
> overhead of the prefetch mechanism should be minimal to non-existent while 
> only cached segments are read.
> cc [~miroslav], [~nuno.santos]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
