[ 
https://issues.apache.org/jira/browse/OAK-10494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17775733#comment-17775733
 ] 

Axel Hanikel commented on OAK-10494:
------------------------------------

{{getRecordIfStored()}} is called roughly 136000 times per replication package 
of 500 paths, so caching 46000 items should be sufficient and 100000 is a safe 
bet.
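
As a rough check on that sizing: the issue description reports about three {{backend.getRecord()}} calls per identifier, so 136000 calls would correspond to roughly 45000 distinct identifiers, which is presumably where the 46000 figure comes from. The sketch below illustrates a bounded cache of that kind. It is a minimal sketch only and not the code in the PR: the class {{RecordCacheSketch}}, the loader function and the LRU eviction policy are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical bounded LRU cache in front of an expensive metadata lookup.
 * Illustrative only: this is not Oak's implementation, and the names are made up.
 */
class RecordCacheSketch<K, V> {
    private final Map<K, V> cache;
    private final Function<K, V> loader; // stands in for the backend lookup
    private int loads = 0;               // number of times the loader was invoked

    RecordCacheSketch(final int maxEntries, Function<K, V> loader) {
        this.loader = loader;
        // accessOrder=true keeps entries ordered from least to most recently used
        this.cache = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // evict the least recently used entry once the bound is exceeded
                return size() > maxEntries;
            }
        };
    }

    synchronized V get(K id) {
        V value = cache.get(id);
        if (value == null) {
            // cache miss: call the (expensive) loader once and remember the result
            value = loader.apply(id);
            loads++;
            if (value != null) {
                cache.put(id, value);
            }
        }
        return value;
    }

    synchronized int loads() {
        return loads;
    }
}
```

With such a wrapper, three lookups of the same identifier would reach the loader only once, provided the entry is not evicted in between. That is why the bound (for example 100000) needs to be at least the number of distinct identifiers touched while one package is installed.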

> Cache backend.getRecord() calls to minimise CloudBlob.downloadAttributes() 
> over the network
> -------------------------------------------------------------------------------------------
>
>                 Key: OAK-10494
>                 URL: https://issues.apache.org/jira/browse/OAK-10494
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: blob-plugins
>            Reporter: Axel Hanikel
>            Priority: Major
>         Attachments: Measurements.md
>
>
> h1. Problem: Metadata are requested more than once for each blob
> Setting a breakpoint in 
> {{AbstractSharedCachingDataStore.getRecordIfStored()}} and logging the 
> dataIdentifiers, we see that it calls {{backend.getRecord()}} 3 times for the 
> same {{dataIdentifier}} when a replication package is installed by vault. The 
> reason seems to be that during commits, every {{CommitHook}} runs its own 
> {{compareAgainstBaseState}} and, because the implementation avoids fetching 
> the blob if it only needs the metadata, the request to the existing blob 
> cache is always a miss.
> h1. Proposed solution: Cache {{backend.getRecord()}} calls
> Manual testing has shown that caching {{backend.getRecord()}} calls reduces 
> the time spent in {{getRecordIfStored()}} by 12% to 35% when 
> installing replication packages containing 500 paths.
> The PR is at https://github.com/apache/jackrabbit-oak/pull/1155



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
