[
https://issues.apache.org/jira/browse/OAK-8552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909545#comment-16909545
]
Matt Ryan commented on OAK-8552:
--------------------------------
Getting a direct download URI starts with a {{Binary}} instance and a call
to {{getURI()}}.
Known causes of network requests in this call:
* Starting at
[https://github.com/apache/jackrabbit-oak/blob/22c3be68e4bc7fdf811ab0fbb2471f2d026508e7/oak-store-spi/src/main/java/org/apache/jackrabbit/oak/plugins/value/jcr/BinaryImpl.java#L96]
- the call to {{getReference()}} goes through the blob implementation into
{{DataStoreBlobStore#getReference()}}, which calls
{{AbstractSharedCachingDataStore#getRecordIfStored()}}. If the blob is not
cached, this results in a call to the backend's {{getRecord()}}.
{{AzureBlobStoreBackend}}, for example, currently makes two network calls
here - one to check whether the blob exists, and another to get the blob
metadata needed to construct the {{DataRecord}}. (See
[https://github.com/apache/jackrabbit-oak/blob/22c3be68e4bc7fdf811ab0fbb2471f2d026508e7/oak-blob-cloud-azure/src/main/java/org/apache/jackrabbit/oak/blob/cloud/azure/blobstorage/AzureBlobStoreBackend.java#L355].)
But all that is really needed in this case is the reference, which can be
obtained from the backend directly using the blob id - no network calls
required. Furthermore, the only reason we request the reference at all is to
determine whether this blob is stored inline. There may be a cheaper way to
determine that.
* Starting at
[https://github.com/apache/jackrabbit-oak/blob/22c3be68e4bc7fdf811ab0fbb2471f2d026508e7/oak-store-spi/src/main/java/org/apache/jackrabbit/oak/plugins/value/jcr/BinaryImpl.java#L107]
- the call to {{getDownloadURI()}} eventually results in a call to the data
store implementation's {{getDownloadURI()}} method. In the case of
{{AzureDataStore}}, this calls into the backend's {{createHttpDownloadURI()}}
method, which (since OAK-7998) checks that the binary exists - a network
call - before creating the signed download URI. Note that creating the signed
URI itself doesn't require a network call; only the existence check for the
blob ID does.
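To make the two cases above concrete, here is a minimal, self-contained sketch that counts simulated network round trips. It is not Oak code: {{DirectDownloadSketch}}, {{CountingBackend}} and all of their methods are hypothetical stand-ins, the HMAC-based reference is an assumption about how a reference could be derived locally from the blob id and a key, and the generated URI is a placeholder rather than a real signed Azure URI.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.HashSet;
import java.util.Set;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

class DirectDownloadSketch {

    /** Hypothetical backend stand-in that counts simulated network round trips. */
    static class CountingBackend {
        private final Set<String> storedIds = new HashSet<>();
        private final byte[] key;
        int networkCalls = 0;

        CountingBackend(byte[] key) { this.key = key.clone(); }

        void store(String blobId) { storedIds.add(blobId); }

        /** Simulated remote existence check: one round trip. */
        boolean exists(String blobId) {
            networkCalls++;
            return storedIds.contains(blobId);
        }

        /** Simulated remote metadata fetch: one round trip. */
        void fetchMetadata(String blobId) { networkCalls++; }

        /** Local computation only (assumed HMAC scheme): no round trip. */
        String hmacHex(String input) {
            try {
                Mac mac = Mac.getInstance("HmacSHA1");
                mac.init(new SecretKeySpec(key, "HmacSHA1"));
                StringBuilder hex = new StringBuilder();
                for (byte b : mac.doFinal(input.getBytes(StandardCharsets.UTF_8))) {
                    hex.append(String.format("%02x", b & 0xff));
                }
                return hex.toString();
            } catch (GeneralSecurityException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    /** First bullet, current path: load the record (two round trips), then derive the reference. */
    static String referenceViaRecord(CountingBackend backend, String blobId) {
        if (!backend.exists(blobId)) {
            return null;
        }
        backend.fetchMetadata(blobId);
        return blobId + ":" + backend.hmacHex(blobId);
    }

    /** First bullet, proposed path: derive the reference from the blob id alone. */
    static String referenceDirect(CountingBackend backend, String blobId) {
        return blobId + ":" + backend.hmacHex(blobId);
    }

    /** Second bullet: signing is local, only the optional existence check goes remote. */
    static String downloadUri(CountingBackend backend, String blobId, boolean checkExists) {
        if (checkExists && !backend.exists(blobId)) {
            return null;
        }
        // Placeholder URI, not a real signed URI format.
        return "https://example.invalid/" + blobId + "?sig=" + backend.hmacHex("GET:" + blobId);
    }

    public static void main(String[] args) {
        CountingBackend backend = new CountingBackend("not-a-real-key".getBytes(StandardCharsets.UTF_8));
        backend.store("blob-1");

        referenceViaRecord(backend, "blob-1");
        System.out.println("reference via record, network calls: " + backend.networkCalls);

        backend.networkCalls = 0;
        referenceDirect(backend, "blob-1");
        System.out.println("reference direct, network calls: " + backend.networkCalls);

        backend.networkCalls = 0;
        downloadUri(backend, "blob-1", true);
        System.out.println("URI with existence check, network calls: " + backend.networkCalls);

        backend.networkCalls = 0;
        downloadUri(backend, "blob-1", false);
        System.out.println("URI without existence check, network calls: " + backend.networkCalls);
    }
}
```

Both reference paths return the same string; they differ only in how many round trips they cost. Dropping the existence check has one trade-off the sketch makes visible: {{downloadUri(..., false)}} will also return a URI for a blob id that was never stored.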
In a benchmark test, creating 1000 download URIs took just over 40000
milliseconds, or about 40 milliseconds per URI. That is not bad in itself, but
after removing the existence check the same test took 147 milliseconds for all
1000 URIs (about 0.15 milliseconds each). Nearly all of the time is therefore
spent in the network round trip, so with high network latency this could
become a real problem.
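As a rough way to reason about these numbers, the sketch below models the total time as a fixed local cost per URI plus one round trip per network call. The model itself is an assumption, {{LatencyModel}} and {{projectedTotalMillis}} are hypothetical names, and the latencies in {{main}} other than the roughly 40 ms observed above are illustrative values, not measurements.

```java
class LatencyModel {

    /** Rough model: each URI pays a fixed local cost plus one round trip per network call. */
    static double projectedTotalMillis(int uriCount, int networkCallsPerUri,
                                       double roundTripMillis, double localMillisPerUri) {
        return uriCount * (networkCallsPerUri * roundTripMillis + localMillisPerUri);
    }

    public static void main(String[] args) {
        // Local cost taken from the run without the existence check: 147 ms for 1000 URIs.
        double localMillisPerUri = 147.0 / 1000;
        // 40 ms is roughly what the benchmark saw; 10 and 200 are illustrative only.
        double[] roundTrips = {10.0, 40.0, 200.0};
        for (double roundTrip : roundTrips) {
            double withCheck = projectedTotalMillis(1000, 1, roundTrip, localMillisPerUri);
            double withoutCheck = projectedTotalMillis(1000, 0, roundTrip, localMillisPerUri);
            System.out.printf("round trip %.0f ms: with check %.0f ms, without check %.0f ms%n",
                    roundTrip, withCheck, withoutCheck);
        }
    }
}
```

Under this model the time without the existence check does not depend on latency at all, while the time with the check grows linearly with the round-trip time.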
> Minimize network calls required when creating a direct download URI
> -------------------------------------------------------------------
>
> Key: OAK-8552
> URL: https://issues.apache.org/jira/browse/OAK-8552
> Project: Jackrabbit Oak
> Issue Type: Sub-task
> Components: blob-cloud, blob-cloud-azure
> Reporter: Matt Ryan
> Priority: Major
>
> We need to isolate and try to optimize network calls required to create a
> direct download URI.