[ 
https://issues.apache.org/jira/browse/OAK-8551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Ryan updated OAK-8551:
---------------------------
    Description: 
Oak cloud data stores (e.g. {{AzureDataStore}}, {{S3DataStore}}) are by 
definition more susceptible to performance degradation due to network issues.  
While we can't do much about the performance of uploading or downloading a 
blob, there are other places within the implementations where we make 
network calls to the storage service that could be avoided or minimized.

One example is the {{exists()}} call to check whether a blob with a particular 
identifier exists in the blob storage.  In some places {{exists()}} is 
called where we could instead simply attempt the network access and handle 
failures gracefully, avoiding an extra network call.  In other places 
a cache could perhaps be used to minimize round trips.

Another example is the higher-level {{getReference()}} call in 
{{DataStoreBlobStore}}.  This asks the implementation for a {{DataRecord}} and 
then gets the reference from that, but the data store backend can 
already obtain a reference for an identifier on its own.  Asking for the 
{{DataRecord}}, however, requires a network request to get the blob metadata for 
the record.
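
The {{getReference()}} case can be sketched the same way.  The example below assumes, purely for illustration, that a reference is derived locally from the identifier and a secret key using an HMAC; {{ReferenceSketch}} and its methods are hypothetical names, and the "record" path only simulates the metadata request with a counter.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical sketch; not the DataStoreBlobStore or backend implementation. */
final class ReferenceSketch {
    private final byte[] key;
    int metadataRequests = 0; // counts simulated network requests for blob metadata

    ReferenceSketch(byte[] key) {
        this.key = key.clone();
    }

    /** Derives a reference locally from the identifier (assumed scheme: id + ':' + hex HMAC). */
    String referenceFromIdentifier(String id) {
        try {
            Mac mac = Mac.getInstance("HmacSHA1");
            mac.init(new SecretKeySpec(key, "HmacSHA1"));
            byte[] hash = mac.doFinal(id.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(id).append(':');
            for (byte b : hash) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Simulates loading the record first (one metadata request), then reading its reference. */
    String referenceViaRecord(String id) {
        metadataRequests++; // stands in for the network request for blob metadata
        return referenceFromIdentifier(id);
    }
}
```

Both paths return the same string in this sketch, but only {{referenceViaRecord()}} pays for a simulated metadata request.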

  was:This will revert the change for OAK-7998.  The exists check is too slow.


> Minimize network calls in cloud data stores (performance optimization)
> ----------------------------------------------------------------------
>
>                 Key: OAK-8551
>                 URL: https://issues.apache.org/jira/browse/OAK-8551
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: blob-cloud, blob-cloud-azure, doc
>            Reporter: Matt Ryan
>            Assignee: Matt Ryan
>            Priority: Major



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
