[
https://issues.apache.org/jira/browse/OAK-12047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nuno Santos resolved OAK-12047.
-------------------------------
Fix Version/s: 1.92.0
Resolution: Done
> CachingSegmentArchiveReader#readSegment should check if segment is present in
> archive before trying to load it from cache
> -------------------------------------------------------------------------------------------------------------------------
>
> Key: OAK-12047
> URL: https://issues.apache.org/jira/browse/OAK-12047
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: segment-tar
> Reporter: Nuno Santos
> Priority: Minor
> Fix For: 1.92.0
>
>
> A single persistent disk cache instance is shared by all
> CachingSegmentArchiveReaders of a segment store. This means it will contain
> segments from all archives. {{CachingSegmentArchiveReaders.readSegment()}}
> is checking if the segment is in the cache before checking if it is in the
> archive. ([source
> code|https://github.com/apache/jackrabbit-oak/blob/784d38ef842b16cb186f190de784c7da9842f607/oak-segment-tar/src/main/java/org/apache/jackrabbit/oak/segment/spi/persistence/persistentcache/CachingSegmentArchiveReader.java#L48-L50]).
> {code:java}
> public Buffer readSegment(long msb, long lsb) throws IOException {
> return persistentCache.readSegment(msb, lsb, () ->
> delegate.readSegment(msb, lsb));
> } {code}
> If the segment is not in this archive, this behaviour is arguably wrong.
> * *Segment not in cache* - This method will perform a lookup in the cache
> and call {{readSegment}} on the delegate, which will return null, as the
> segment is not in the archive. The current implementation of
> [TarFiles.readSegment|https://github.com/apache/jackrabbit-oak/blob/784d38ef842b16cb186f190de784c7da9842f607/oak-segment-tar/src/main/java/org/apache/jackrabbit/oak/segment/file/tar/TarFiles.java#L635-L640]
> iterates over all tar files sequentially, which means that the caching
> archive reader will query the cache for each archive, even when the segment
> is not in the archive. These additional lookups in the cache are wasteful and
> cause noise. They artificially increase the cache miss metric (create noise
> in the metrics), they add overhead, and they increase the contention on the
> underlying cache, which can further degrade the performance.
> * *Segment in cache* - The method will find the segment in the cache and
> return it, even though the segment does not belong to this archive. This
> seems to not cause any issue currently, but is a violation of the contract of
> the method. {{readSegment()}} should return null if the segment is not in the
> archive. With the current implementation of caching, the method may return
> null or non-null depending on the contents of the cache.
> This is a similar issue as OAK-12034.
> The fix is simple: {{CachingSegmentArchiveReaders.readSegment()}} should
> first check if the segment is in the archive before looking up in the cache.
> This is a cheap operation as segment archives contain an index of segments.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)