[ 
https://issues.apache.org/jira/browse/OAK-8552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909545#comment-16909545
 ] 

Matt Ryan commented on OAK-8552:
--------------------------------

The entry point for getting a direct download URI begins with a {{Binary}} 
instance and the {{getURI()}} call.

Known causes of network requests in this call:
 * Starting at 
[https://github.com/apache/jackrabbit-oak/blob/22c3be68e4bc7fdf811ab0fbb2471f2d026508e7/oak-store-spi/src/main/java/org/apache/jackrabbit/oak/plugins/value/jcr/BinaryImpl.java#L96]
 - the call to {{getReference()}} goes through the blob implementation into 
{{DataStoreBlobStore#getReference()}}, which calls 
{{AbstractSharedCachingDataStore#getRecordIfStored()}}.  If the blob is not 
cached, this results in a call to the backend's {{getRecord()}}.  
{{AzureBlobStoreBackend}}, for example, currently makes two network calls 
here - one to check whether the blob exists, and another to get the blob 
metadata needed to construct the {{DataRecord}}.  (See 
[https://github.com/apache/jackrabbit-oak/blob/22c3be68e4bc7fdf811ab0fbb2471f2d026508e7/oak-blob-cloud-azure/src/main/java/org/apache/jackrabbit/oak/blob/cloud/azure/blobstorage/AzureBlobStoreBackend.java#L355].)
  But all that is really needed in this case is the reference, which can be 
obtained from the backend directly using the blob ID - no network calls 
required.  Furthermore, the only reason we get the reference in the first 
place is to determine whether this blob is stored inline.  Maybe there is a 
better way to determine this.
 * Starting at 
[https://github.com/apache/jackrabbit-oak/blob/22c3be68e4bc7fdf811ab0fbb2471f2d026508e7/oak-store-spi/src/main/java/org/apache/jackrabbit/oak/plugins/value/jcr/BinaryImpl.java#L107]
 - the call to {{getDownloadURI()}} eventually results in a call to the data 
store implementation's {{getDownloadURI()}} method.  In the case of 
{{AzureDataStore}}, this calls into the backend's {{createHttpDownloadURI()}} 
method, which (since OAK-7998) checks that the binary exists - a network 
call - before creating the signed download URI.  Note that creating the 
download URI itself doesn't require a network call, but checking for the 
existence of the blob ID does.
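
To illustrate the first item above, here is a minimal sketch of how a reference could be derived from the blob ID alone, with no network call.  It assumes the reference is the blob ID followed by an HMAC of that ID, computed with a key the repository already holds locally.  The class name, method name, key handling and output format are hypothetical and are not Oak's actual API.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch, not Oak code: derive a reference from a blob ID locally.
final class LocalReferenceSketch {
    // Assumption: the reference is an HMAC-SHA1 over the blob ID.
    private static final String ALGORITHM = "HmacSHA1";

    // Returns "<blobId>:<hex HMAC>" using only local data (the ID and the key).
    static String referenceFromId(String blobId, byte[] referenceKey) {
        try {
            Mac mac = Mac.getInstance(ALGORITHM);
            mac.init(new SecretKeySpec(referenceKey, ALGORITHM));
            byte[] hash = mac.doFinal(blobId.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return blobId + ":" + hex;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Could not compute reference", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical key and blob ID, for illustration only.
        byte[] key = "example-reference-key".getBytes(StandardCharsets.UTF_8);
        System.out.println(referenceFromId("0123abcd", key));
    }
}
```

If the reference can be computed this way, the remote existence and metadata lookups would only be needed when the full {{DataRecord}} is actually required.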

In a benchmark test I ran, creating 1000 download URIs took just over 40000 
milliseconds, an average of around 40 milliseconds per URI.  That is not bad 
in itself, but removing the existence check and rerunning the test dropped 
the total to 147 milliseconds for all 1000 URIs.  Nearly all of the time is 
therefore spent in the network call, so on a high-latency network this could 
become a real problem.
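
The arithmetic behind these numbers can be sketched as follows.  The totals (roughly 40000 ms and 147 ms for 1000 URIs) are the benchmark figures quoted above.  The projection is a deliberately simple model that assumes URIs are created sequentially with a fixed round-trip time per network call, and the 200 ms round trip used in it is a hypothetical value, not a measurement.

```java
// Hypothetical sketch: per-URI cost and a simple latency projection.
final class DownloadUriCostSketch {
    // Average cost per URI for a measured total.
    static double perUriMillis(double totalMillis, int uriCount) {
        return totalMillis / uriCount;
    }

    // Simple model: each URI costs its network round trips plus local work.
    static double projectedTotalMillis(int uriCount, int networkCallsPerUri,
                                       double roundTripMillis, double localMillisPerUri) {
        return uriCount * (networkCallsPerUri * roundTripMillis + localMillisPerUri);
    }

    public static void main(String[] args) {
        double withCheck = perUriMillis(40000, 1000);   // benchmark, existence check on
        double withoutCheck = perUriMillis(147, 1000);  // benchmark, existence check removed
        System.out.printf("with existence check:    %.3f ms per URI%n", withCheck);
        System.out.printf("without existence check: %.3f ms per URI%n", withoutCheck);

        // Assumed 200 ms round trip and one network call per URI (not measured).
        double slowNetwork = projectedTotalMillis(1000, 1, 200.0, withoutCheck);
        System.out.printf("projected total at 200 ms round trip: %.0f ms%n", slowNetwork);
    }
}
```

Under this model the total grows linearly with the round-trip time, while the local signing cost stays negligible by comparison.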

> Minimize network calls required when creating a direct download URI
> -------------------------------------------------------------------
>
>                 Key: OAK-8552
>                 URL: https://issues.apache.org/jira/browse/OAK-8552
>             Project: Jackrabbit Oak
>          Issue Type: Sub-task
>          Components: blob-cloud, blob-cloud-azure
>            Reporter: Matt Ryan
>            Priority: Major
>
> We need to isolate and try to optimize network calls required to create a 
> direct download URI.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
