[ 
https://issues.apache.org/jira/browse/OAK-11934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Sedding updated OAK-11934:
---------------------------------
    Description: 
Particularly for remote segment stores, IO can be a constraining factor. 
Processes such as compaction, which traverse the repository, often alternate 
between processing segments and loading segments.

IO could be parallelized by asynchronously preloading segments that are 
referenced by a newly loaded segment into a {{PersistentCache}}. That is, if 
the "main" thread requests a segment from the cache and the segment has to be 
loaded from persistence, then all segments referenced by the newly loaded 
segment are preloaded asynchronously and placed into the cache. When the 
"main" thread requests the next segment, it is likely to be in the cache 
already.

Preloading could follow references up to a configurable "depth". Presumably, 
a depth of 1 or 2 usually strikes a good balance between parallelizing IO 
efficiently and preloading too aggressively.

If references are preloaded only for newly loaded segments, the overhead of 
the preload mechanism should be minimal to non-existent as long as only 
cached segments are read.
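
The mechanism described above could be sketched roughly as follows. This is 
only an illustration under stated assumptions, not Oak's actual API: 
{{Seg}}, {{PreloadingCache}}, {{readSegment}}, {{awaitIdle}} and the 
constructor parameters are hypothetical names, the persistence is modelled as 
a plain function from segment id to segment, and eviction and error handling 
are left out.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical stand-in for a segment: its id plus the ids of the segments it references.
final class Seg {
    final String id;
    final List<String> references;

    Seg(String id, List<String> references) {
        this.id = id;
        this.references = references;
    }
}

// Illustrative sketch only; this is NOT Oak's PersistentCache API.
final class PreloadingCache {
    private final Map<String, Seg> cache = new ConcurrentHashMap<>();
    private final Function<String, Seg> persistence; // slow loader, e.g. a remote store
    private final int preloadDepth;                  // how many levels of references to follow
    private final ExecutorService preloadPool;
    private final AtomicInteger inFlight = new AtomicInteger(); // scheduled, unfinished preloads

    PreloadingCache(Function<String, Seg> persistence, int preloadDepth, int threads) {
        this.persistence = persistence;
        this.preloadDepth = preloadDepth;
        this.preloadPool = Executors.newFixedThreadPool(threads);
    }

    // Called by the "main" thread. A cache hit returns at once and schedules nothing.
    Seg readSegment(String id) {
        Seg cached = cache.get(id);
        if (cached != null) {
            return cached;
        }
        Seg loaded = persistence.apply(id);
        Seg previous = cache.putIfAbsent(id, loaded);
        if (previous != null) {
            return previous; // a concurrent preload stored this segment first
        }
        schedulePreload(loaded, preloadDepth);
        return loaded;
    }

    // Asynchronously loads the references of a newly loaded segment, up to remainingDepth levels.
    private void schedulePreload(Seg segment, int remainingDepth) {
        if (remainingDepth <= 0) {
            return;
        }
        for (String ref : segment.references) {
            if (cache.containsKey(ref)) {
                continue;
            }
            inFlight.incrementAndGet();
            preloadPool.execute(() -> {
                try {
                    if (!cache.containsKey(ref)) {
                        Seg loaded = persistence.apply(ref);
                        if (cache.putIfAbsent(ref, loaded) == null) {
                            schedulePreload(loaded, remainingDepth - 1);
                        }
                    }
                } finally {
                    inFlight.decrementAndGet();
                }
            });
        }
    }

    boolean isCached(String id) {
        return cache.containsKey(id);
    }

    // Blocks until all scheduled preloads have finished; only needed for demos and tests.
    void awaitIdle() {
        try {
            while (inFlight.get() > 0) {
                Thread.sleep(1);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    void close() {
        preloadPool.shutdown();
    }
}
```

In this sketch, reading a segment {{a}} that references {{b}}, where {{b}} 
references {{d}} and {{d}} references {{e}}, with a depth of 2 would preload 
{{b}} and {{d}} but not {{e}}. A later read of {{b}} is a cache hit and 
therefore triggers no further preloading.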

cc [~miroslav], [~nuno.santos]



  was:
Particularly for remote segment stores, IO can be a constraining factor. 
Processes such as compaction, which traverse the repository, often alternate 
between processing segments and loading segments.

IO could be parallelized by enhancing the {{SegmentCache}} to asynchronously 
prefetch segments that are referenced by a newly loaded segment. That is, if 
the "main" thread requests a segment from the cache and the segment has to be 
loaded from persistence, then all segments referenced by the newly loaded 
segment are prefetched asynchronously and placed into the cache. When the 
"main" thread requests the next segment, it is likely to be in the cache 
already.

Prefetching could follow references up to a configurable "depth". Presumably, 
a depth of 1 or 2 usually strikes a good balance between parallelizing IO 
efficiently and preloading too aggressively.

If references are prefetched only for newly loaded segments, the overhead of 
the prefetch mechanism should be minimal to non-existent as long as only 
cached segments are read.

cc [~miroslav], [~nuno.santos]




> segment preloading for PersistentCache
> --------------------------------------
>
>                 Key: OAK-11934
>                 URL: https://issues.apache.org/jira/browse/OAK-11934
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: segment-tar
>    Affects Versions: 1.84.0
>            Reporter: Julian Sedding
>            Assignee: Julian Sedding
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
