[
https://issues.apache.org/jira/browse/OAK-11934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18028753#comment-18028753
]
Julian Sedding commented on OAK-11934:
--------------------------------------
I have spent some time experimenting with the approach of adding prefetching to
the in-memory {{SegmentCache}} (which implicitly also fills the persistent
cache). However, this leads to frequent cache evictions, which is undesirable.
Also, from within the {{SegmentCache}} it is not possible to probe whether a
segment is already cached in a persistent cache. Furthermore, without a
persistent cache, prefetching using this approach can be wasteful, due to
unwanted cache evictions. It might even lead to a slowdown, as the same segment
might be loaded repeatedly without ever being used.
I have also looked into the idea described in OAK-11932, which adds prefetching
to the {{CachingSegmentReader}}. This approach is also problematic because, due
to the nature of this API, it is impossible to prefetch segments that are in a
different archive (a different {{CachingSegmentReader}} instance).
The approach that worked best makes it possible to add a {{PersistentCache}}
directly to an {{AbstractFileStore}} via the {{FileStoreBuilder}}. This
{{PersistentCache}} can be decorated internally, by the {{AbstractFileStore}},
with a {{SegmentPreloader}}, if preloading is configured. The
{{SegmentPreloader}} has access to the {{TarFiles}} instance, as well as to the
{{PersistentCache}}. This allows it to
- probe the cache for whether a segment is already present
- load missing segments via {{TarFiles#readSegment}}
- load segment graphs via {{TarFiles#getGraph}}, to determine segment references
without reading them from the segments
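To illustrate the three capabilities listed above, here is a minimal sketch of a preloading cache decorator. It is not the actual {{SegmentPreloader}}: the types {{SegmentSource}}, {{SegmentStoreCache}}, {{MapCache}} and {{PreloadingCache}} are hypothetical stand-ins for the roles of {{TarFiles}} and {{PersistentCache}}, and their method signatures are assumptions made for illustration only. The "depth" parameter follows the configurable depth suggested in the issue description.

```java
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;

// Hypothetical stand-in for the role TarFiles plays; not the Oak API.
interface SegmentSource {
    byte[] readSegment(UUID id);    // cf. TarFiles#readSegment
    Set<UUID> references(UUID id);  // cf. a lookup in the graph from TarFiles#getGraph
}

// Hypothetical stand-in for the role PersistentCache plays; not the Oak API.
interface SegmentStoreCache {
    boolean contains(UUID id);
    byte[] read(UUID id);           // plain caches return null on a miss
    void write(UUID id, byte[] data);
}

// Simple in-memory cache, only here to make the sketch runnable.
final class MapCache implements SegmentStoreCache {
    private final Map<UUID, byte[]> entries = new ConcurrentHashMap<>();
    public boolean contains(UUID id) { return entries.containsKey(id); }
    public byte[] read(UUID id) { return entries.get(id); }
    public void write(UUID id, byte[] data) { entries.put(id, data); }
}

// Read-through decorator that preloads referenced segments after a miss.
final class PreloadingCache implements SegmentStoreCache {
    private final SegmentStoreCache delegate;
    private final SegmentSource source;
    private final Executor executor;
    private final int depth;
    // Segments currently being preloaded, to avoid loading one twice.
    private final Set<UUID> inFlight = ConcurrentHashMap.newKeySet();

    PreloadingCache(SegmentStoreCache delegate, SegmentSource source,
                    Executor executor, int depth) {
        this.delegate = delegate;
        this.source = source;
        this.executor = executor;
        this.depth = depth;
    }

    public boolean contains(UUID id) { return delegate.contains(id); }
    public void write(UUID id, byte[] data) { delegate.write(id, data); }

    public byte[] read(UUID id) {
        byte[] cached = delegate.read(id);
        if (cached != null) {
            return cached;                       // hit: no preloading work
        }
        byte[] loaded = source.readSegment(id);  // miss: load from the source
        delegate.write(id, loaded);
        preloadReferences(id, depth);
        return loaded;
    }

    private void preloadReferences(UUID id, int remaining) {
        if (remaining <= 0) {
            return;
        }
        for (UUID ref : source.references(id)) {
            // Probe the cache first; skip segments already present or in flight.
            if (delegate.contains(ref) || !inFlight.add(ref)) {
                continue;
            }
            executor.execute(() -> {
                try {
                    delegate.write(ref, source.readSegment(ref));
                    preloadReferences(ref, remaining - 1);
                } finally {
                    inFlight.remove(ref);
                }
            });
        }
    }
}

final class PreloadDemo {
    public static void main(String[] args) {
        UUID a = UUID.randomUUID(), b = UUID.randomUUID(), c = UUID.randomUUID();
        // Reference chain a -> b -> c.
        Map<UUID, Set<UUID>> graph =
                Map.of(a, Set.of(b), b, Set.of(c), c, Set.<UUID>of());
        SegmentSource source = new SegmentSource() {
            public byte[] readSegment(UUID id) { return id.toString().getBytes(); }
            public Set<UUID> references(UUID id) { return graph.get(id); }
        };
        MapCache cache = new MapCache();
        // Runnable::run executes preloads inline; a real setup would pass a thread pool.
        new PreloadingCache(cache, source, Runnable::run, 1).read(a);
        System.out.println(cache.contains(b) + " " + cache.contains(c)); // true false
    }
}
```

With a depth of 1, reading {{a}} preloads {{b}} but not {{c}}. In this sketch a cache hit returns immediately, so preloading only adds work when a segment actually has to be loaded.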
> segment prefetching for segmentstore cache
> ------------------------------------------
>
> Key: OAK-11934
> URL: https://issues.apache.org/jira/browse/OAK-11934
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: segment-tar
> Affects Versions: 1.84.0
> Reporter: Julian Sedding
> Assignee: Julian Sedding
> Priority: Major
>
> Particularly for remote segment stores, IO can be a constraining factor.
> Processes like compaction, which traverse the repository, often alternate
> between processing segments and loading segments.
> IO could be parallelized by enhancing the {{SegmentCache}} to asynchronously
> prefetch segments that are referenced by a newly loaded segment. I.e. if the
> "main" thread requests a segment from the cache, and the segment needs to be
> loaded from the persistence, then all segments referenced by the newly loaded
> segment are prefetched, and placed into the cache, asynchronously. When the
> "main" thread loads the next segment, it is likely already in the cache.
> Prefetching could preload a configurable "depth" of references. Presumably,
> usually a depth of 1 or 2 strikes a good balance between preloading too
> aggressively and efficiently parallelizing IO.
> If prefetching of references is only performed for newly loaded segments, the
> overhead of the prefetch mechanism should be minimal to non-existent while
> only cached segments are read.
> cc [~miroslav], [~nuno.santos]