[
https://issues.apache.org/jira/browse/OAK-11934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julian Sedding updated OAK-11934:
---------------------------------
Description:
Particularly for remote segment stores, IO can be a constraining factor.
Processes that traverse the repository, such as compaction, often alternate
between processing segments and loading segments.
IO could be parallelized by asynchronously preloading segments that are
referenced by a newly loaded segment into a {{PersistentCache}}. That is, if
the "main" thread requests a segment from the cache and the segment needs to
be loaded from the persistence backend, then all segments referenced by the
newly loaded segment are asynchronously preloaded into the cache. When the
"main" thread loads the next segment, it is likely already in the cache.
Preloading could follow references to a configurable "depth". Presumably, a
depth of 1 or 2 usually strikes a good balance between preloading too
aggressively and parallelizing IO efficiently.
If references are only preloaded for newly loaded segments, the overhead of
the preload mechanism should be minimal to non-existent as long as only
cached segments are read.
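To illustrate the idea, here is a minimal sketch of how a cache miss could
trigger depth-limited, asynchronous preloading of referenced segments. This is
not the Oak API: {{SegmentId}}, {{Segment}}, the cache map and the {{backend}}
loader function are hypothetical stand-ins for the real oak-segment-tar types.
{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Stand-ins for the real oak-segment-tar types; names are illustrative only.
interface SegmentId {}
interface Segment {
    Iterable<SegmentId> references();
}

class PreloadingSegmentReader {

    private final ExecutorService preloader = Executors.newFixedThreadPool(4);
    // stands in for the PersistentCache
    private final ConcurrentMap<SegmentId, Segment> cache = new ConcurrentHashMap<>();
    // loads a segment from the (remote) persistence backend
    private final Function<SegmentId, Segment> backend;
    // how many levels of references to follow, e.g. 1 or 2
    private final int maxDepth;

    PreloadingSegmentReader(Function<SegmentId, Segment> backend, int maxDepth) {
        this.backend = backend;
        this.maxDepth = maxDepth;
    }

    Segment read(SegmentId id) {
        Segment cached = cache.get(id);
        if (cached != null) {
            return cached; // cache hit: no preloading triggered, no extra overhead
        }
        Segment segment = backend.apply(id);
        cache.put(id, segment);
        // only a cache miss triggers asynchronous preloading of referenced segments
        preloadReferences(segment, maxDepth);
        return segment;
    }

    private void preloadReferences(Segment segment, int depth) {
        if (depth <= 0) {
            return;
        }
        for (SegmentId ref : segment.references()) {
            preloader.submit(() -> {
                // computeIfAbsent skips segments that were loaded in the meantime
                // (kept simple here; it briefly blocks other writers to the same bin)
                Segment loaded = cache.computeIfAbsent(ref, backend);
                preloadReferences(loaded, depth - 1);
            });
        }
    }
}
{code}
A real implementation would hook into the existing {{PersistentCache}} read
path rather than a plain map, and would likely bound the preload queue so that
preloading cannot starve the "main" thread's IO.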
cc [~miroslav], [~nuno.santos]
was:
Particularly for remote segment stores, IO can be a constraining factor.
Processes like compaction, that traverse the repository, often alternate
between processing segments and loading segments.
IO could be parallelized by enhancing the {{SegmentCache}} to asynchronously
prefetch segments that are referenced by a newly loaded segment. That is, if
the "main" thread requests a segment from the cache and the segment needs to
be loaded from the persistence backend, then all segments referenced by the
newly loaded segment are asynchronously prefetched into the cache. When the
"main" thread loads the next segment, it is likely already in the cache.
Prefetching could follow references to a configurable "depth". Presumably, a
depth of 1 or 2 usually strikes a good balance between preloading too
aggressively and parallelizing IO efficiently.
If references are only prefetched for newly loaded segments, the overhead of
the prefetch mechanism should be minimal to non-existent as long as only
cached segments are read.
cc [~miroslav], [~nuno.santos]
> segment preloading for PersistentCache
> --------------------------------------
>
> Key: OAK-11934
> URL: https://issues.apache.org/jira/browse/OAK-11934
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: segment-tar
> Affects Versions: 1.84.0
> Reporter: Julian Sedding
> Assignee: Julian Sedding
> Priority: Major
>
> Particularly for remote segment stores, IO can be a constraining factor.
> Processes that traverse the repository, such as compaction, often alternate
> between processing segments and loading segments.
> IO could be parallelized by asynchronously preloading segments that are
> referenced by a newly loaded segment into a {{PersistentCache}}. That is, if
> the "main" thread requests a segment from the cache and the segment needs to
> be loaded from the persistence backend, then all segments referenced by the
> newly loaded segment are asynchronously preloaded into the cache. When the
> "main" thread loads the next segment, it is likely already in the cache.
> Preloading could follow references to a configurable "depth". Presumably, a
> depth of 1 or 2 usually strikes a good balance between preloading too
> aggressively and parallelizing IO efficiently.
> If references are only preloaded for newly loaded segments, the overhead of
> the preload mechanism should be minimal to non-existent as long as only
> cached segments are read.
> cc [~miroslav], [~nuno.santos]
--
This message was sent by Atlassian Jira
(v8.20.10#820010)