[
https://issues.apache.org/jira/browse/OAK-8551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matt Ryan updated OAK-8551:
---------------------------
Description:
Oak cloud data stores (e.g. {{AzureDataStore}}, {{S3DataStore}}) are by
definition more susceptible to performance degradation due to network issues.
While we can't do much about the performance of uploading or downloading a
blob, there are other places within the implementations where we make
network calls to the storage service that could be avoided or minimized.
One example is the {{exists()}} call, which checks whether a blob with a
particular identifier exists in the blob storage. In some places {{exists()}}
is called where we could instead simply attempt the network access and handle
failures gracefully, avoiding an extra network call. In other places
a cache could perhaps be used to minimize round trips.
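A minimal sketch of the difference between checking first and simply attempting the access. Everything here ({{CountingStore}}, {{BlobNotFoundException}}, {{ReadStrategies}}) is a hypothetical stand-in that counts simulated round trips; it is not the Oak or cloud SDK API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical "not found" signal, standing in for whatever the storage SDK throws.
class BlobNotFoundException extends Exception {
    BlobNotFoundException(String id) { super("Blob not found: " + id); }
}

// Hypothetical in-memory stand-in for a cloud blob container.
// Every exists() or read() counts as one simulated network round trip.
class CountingStore {
    private final Map<String, byte[]> blobs = new HashMap<>();
    int networkCalls = 0;

    void put(String id, byte[] data) { blobs.put(id, data); }

    boolean exists(String id) {
        networkCalls++;
        return blobs.containsKey(id);
    }

    byte[] read(String id) throws BlobNotFoundException {
        networkCalls++;
        byte[] data = blobs.get(id);
        if (data == null) {
            throw new BlobNotFoundException(id);
        }
        return data;
    }
}

class ReadStrategies {
    // Check-then-read: two round trips for an existing blob. The failure
    // still has to be handled, since the blob could vanish between the calls.
    static Optional<byte[]> checkThenRead(CountingStore store, String id) {
        if (!store.exists(id)) {
            return Optional.empty();
        }
        try {
            return Optional.of(store.read(id));
        } catch (BlobNotFoundException e) {
            return Optional.empty();
        }
    }

    // Attempt-and-handle: one round trip whether or not the blob exists.
    static Optional<byte[]> readDirectly(CountingStore store, String id) {
        try {
            return Optional.of(store.read(id));
        } catch (BlobNotFoundException e) {
            return Optional.empty();
        }
    }
}
```

In this sketch {{checkThenRead}} costs two simulated calls for an existing blob while {{readDirectly}} costs one, and both return an empty result for a missing blob.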
Another example is the higher-level {{getReference()}} call in
{{DataStoreBlobStore}}. This asks the implementation for a {{DataRecord}} and
then gets the reference from that, even though the data store backend can
already obtain a reference for an identifier on its own. Asking for the
{{DataRecord}}, however, requires a network request to get the blob metadata
for the record.
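The {{getReference()}} point can be sketched the same way. This assumes, purely for illustration, that a reference is the identifier plus an HMAC computed from a locally held key, so it can be derived without touching the storage service. {{SketchBackend}}, {{SketchRecord}} and {{ReferenceLookup}} are hypothetical names, not Oak classes.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical record type holding what a metadata fetch would return.
class SketchRecord {
    final String identifier;
    final String reference;

    SketchRecord(String identifier, String reference) {
        this.identifier = identifier;
        this.reference = reference;
    }
}

// Hypothetical backend; getRecord() simulates a metadata round trip,
// referenceFor() is a purely local computation.
class SketchBackend {
    private final byte[] referenceKey;
    int networkCalls = 0;

    SketchBackend(byte[] referenceKey) {
        this.referenceKey = referenceKey.clone();
    }

    // Assumption: reference = identifier + ':' + hex(HMAC-SHA1(key, identifier)).
    String referenceFor(String identifier) {
        try {
            Mac mac = Mac.getInstance("HmacSHA1");
            mac.init(new SecretKeySpec(referenceKey, "HmacSHA1"));
            byte[] hash = mac.doFinal(identifier.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(identifier).append(':');
            for (byte b : hash) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC unavailable", e);
        }
    }

    SketchRecord getRecord(String identifier) {
        networkCalls++; // simulated request for the blob metadata
        return new SketchRecord(identifier, referenceFor(identifier));
    }
}

class ReferenceLookup {
    // Path described above: fetch the record, then read its reference.
    static String viaRecord(SketchBackend backend, String id) {
        return backend.getRecord(id).reference;
    }

    // Alternative: ask the backend for the reference directly.
    static String direct(SketchBackend backend, String id) {
        return backend.referenceFor(id);
    }
}
```

Under that assumption both paths yield the same reference, but only {{viaRecord}} incurs the simulated metadata request.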
was: This will revert the change for OAK-7998. The exists check is too slow.
> Minimize network calls in cloud data stores (performance optimization)
> ----------------------------------------------------------------------
>
> Key: OAK-8551
> URL: https://issues.apache.org/jira/browse/OAK-8551
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: blob-cloud, blob-cloud-azure, doc
> Reporter: Matt Ryan
> Assignee: Matt Ryan
> Priority: Major