[ https://issues.apache.org/jira/browse/OAK-10494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17775733#comment-17775733 ]
Axel Hanikel commented on OAK-10494:
------------------------------------

{{getRecordIfStored()}} is called roughly 136000 times per replication package of 500 paths, so caching 46000 items should be sufficient and 100000 is a safe bet.

> Cache backend.getRecord() calls to minimise CloudBlob.downloadAttributes() over the network
> -------------------------------------------------------------------------------------------
>
>                 Key: OAK-10494
>                 URL: https://issues.apache.org/jira/browse/OAK-10494
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: blob-plugins
>            Reporter: Axel Hanikel
>            Priority: Major
>         Attachments: Measurements.md
>
>
> h1. Problem: Metadata are requested more than once for each blob
> Setting a breakpoint in
> {{AbstractSharedCachingDataStore.getRecordIfStored()}} and logging the
> dataIdentifiers, we see that it calls {{backend.getRecord()}} 3 times for the
> same {{dataIdentifier}} when a replication package is installed by vault. The
> reason seems to be that during commits, every {{CommitHook}} runs its own
> {{compareAgainstBaseState}} and, because the implementation avoids fetching
> the blob if it only needs the metadata, the request to the existing blob
> cache is always a miss.
> h1. Proposed solution: Cache {{backend.getRecord()}} calls
> Manual testing has shown that caching {{backend.getRecord()}} calls reduces
> the time spent in {{.getRecordIfStored()}} by between 12 and 35% when
> installing replication packages containing 500 paths.
> The PR is at https://github.com/apache/jackrabbit-oak/pull/1155
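
For illustration only, here is a minimal sketch of the idea: a bounded cache placed in front of the metadata lookup, so that repeated requests for the same identifier reach the backend once. It is not the code from the PR. {{Backend}}, {{Metadata}}, {{CachingLookup}} and {{CountingBackend}} are hypothetical stand-ins, not Oak classes, and a plain {{LinkedHashMap}} in access order is assumed as the LRU structure. On sizing: if each identifier is requested about three times, 136000 calls correspond to roughly 136000 / 3 ≈ 45000 distinct identifiers, which is consistent with the 46000 figure.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RecordCacheSketch {

    /** Hypothetical stand-in for the metadata a backend returns for a blob. */
    static final class Metadata {
        final String id;
        final long length;

        Metadata(String id, long length) {
            this.id = id;
            this.length = length;
        }
    }

    /** Hypothetical stand-in for the backend; each call models one network round trip. */
    interface Backend {
        Metadata getRecord(String id);
    }

    /** Bounded LRU cache in front of Backend.getRecord(). */
    static final class CachingLookup {
        private final Backend backend;
        private final Map<String, Metadata> cache;

        CachingLookup(Backend backend, final int maxEntries) {
            this.backend = backend;
            // accessOrder = true keeps the least recently used entry first,
            // so removeEldestEntry evicts it once the cache is over capacity.
            this.cache = new LinkedHashMap<String, Metadata>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, Metadata> eldest) {
                    return size() > maxEntries;
                }
            };
        }

        // Synchronized for simplicity: lookups are serialized, including the backend call.
        synchronized Metadata getRecord(String id) {
            Metadata cached = cache.get(id);
            if (cached != null) {
                return cached;
            }
            Metadata fetched = backend.getRecord(id);
            if (fetched != null) {
                cache.put(id, fetched); // only cache records that exist
            }
            return fetched;
        }
    }

    /** Backend used for the demonstration; it only counts how often it is called. */
    static final class CountingBackend implements Backend {
        int calls = 0;

        @Override
        public Metadata getRecord(String id) {
            calls++;
            return new Metadata(id, id.length());
        }
    }

    public static void main(String[] args) {
        CountingBackend backend = new CountingBackend();
        CachingLookup lookup = new CachingLookup(backend, 100_000);
        // Three passes over the same 500 identifiers, mimicking repeated metadata requests.
        for (int pass = 0; pass < 3; pass++) {
            for (int i = 0; i < 500; i++) {
                lookup.getRecord("blob-" + i);
            }
        }
        System.out.println("backend calls: " + backend.calls); // 500, not 1500
    }
}
```

In this sketch the second and third passes are served entirely from the cache, so only the first request per identifier reaches the backend. A real implementation would also need to consider invalidation when records are added or deleted, which the sketch leaves out.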