[ 
https://issues.apache.org/jira/browse/OAK-12047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nuno Santos resolved OAK-12047.
-------------------------------
    Fix Version/s: 1.92.0
       Resolution: Done

> CachingSegmentArchiveReader#readSegment should check if segment is present in 
> archive before trying to load it from cache
> -------------------------------------------------------------------------------------------------------------------------
>
>                 Key: OAK-12047
>                 URL: https://issues.apache.org/jira/browse/OAK-12047
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: segment-tar
>            Reporter: Nuno Santos
>            Priority: Minor
>             Fix For: 1.92.0
>
>
> A single persistent disk cache instance is shared by all 
> CachingSegmentArchiveReaders of a segment store. This means it will contain 
> segments from all archives. {{CachingSegmentArchiveReaders.readSegment()}}  
> is checking if the segment is in the cache before checking if it is in the 
> archive. ([source 
> code|https://github.com/apache/jackrabbit-oak/blob/784d38ef842b16cb186f190de784c7da9842f607/oak-segment-tar/src/main/java/org/apache/jackrabbit/oak/segment/spi/persistence/persistentcache/CachingSegmentArchiveReader.java#L48-L50]).
> {code:java}
> public Buffer readSegment(long msb, long lsb) throws IOException {
>     return persistentCache.readSegment(msb, lsb, () -> 
> delegate.readSegment(msb, lsb));
> } {code}
> If the segment is not in this archive, this behaviour is arguably wrong.
>  * *Segment not in cache* - This method will perform a lookup in the cache 
> and call {{readSegment}} on the delegate, which will return null, as the 
> segment is not in the archive. The current implementation of 
> [TarFiles.readSegment|https://github.com/apache/jackrabbit-oak/blob/784d38ef842b16cb186f190de784c7da9842f607/oak-segment-tar/src/main/java/org/apache/jackrabbit/oak/segment/file/tar/TarFiles.java#L635-L640]
>  iterates over all tar files sequentially, which means that the caching 
> archive reader will query the cache for each archive, even when the segment 
> is not in the archive. These additional lookups in the cache are wasteful and 
> cause noise. They artificially increase the cache miss metric (create noise 
> in the metrics), they add overhead, and they increase the contention on the 
> underlying cache, which can further degrade the performance.
>  * *Segment in cache* - The method will find the segment in the cache and 
> return it, even though the segment does not belong to this archive. This 
> seems to not cause any issue currently, but is a violation of the contract of 
> the method. {{readSegment()}} should return null if the segment is not in the 
> archive. With the current implementation of caching, the method may return 
> null or non-null depending on the contents of the cache. 
> This is a similar issue as OAK-12034.
> The fix is simple: {{CachingSegmentArchiveReaders.readSegment()}} should 
> first check if the segment is in the archive before looking up in the cache. 
> This is a cheap operation as segment archives contain an index of segments.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to