[jira] [Commented] (OAK-8552) Minimize network calls required when creating a direct download URI
[ https://issues.apache.org/jira/browse/OAK-8552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16918283#comment-16918283 ]

Amit Jain commented on OAK-8552:

[~alexander.klimetschek] Yes, nothing needs to change now. I was just trying to highlight that lastModified is dynamic, while the other properties are stable over the lifetime of the blob, even without de-duplication.
{quote}With direct binary access we also no longer support de-duplication (too costly), so that aspect can be ignored for binaries uploaded that way or if a corresponding configuration is set
{quote}
But binaries can still be uploaded through JCR, so I guess an application would still have a mix of both.
{quote}But there is still copying/versioning of nodes which leads to multiple references to the same blob - does this lead to a last modified update of the blob?
{quote}
No, copying nodes would not update the timestamp of the blob. Also, there's the case of a 'Shared' DataStore, where nodes might have been replicated from another instance through some form of replication and share the DataStore.

> Minimize network calls required when creating a direct download URI
> ---
>
> Key: OAK-8552
> URL: https://issues.apache.org/jira/browse/OAK-8552
> Project: Jackrabbit Oak
> Issue Type: Sub-task
> Components: blob-cloud, blob-cloud-azure
> Reporter: Matt Ryan
> Assignee: Matt Ryan
> Priority: Major
> Fix For: 1.18.0
>
> Attachments: OAK-8552_ApiChange.patch
>
>
> We need to isolate and try to optimize network calls required to create a
> direct download URI.

--
This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (OAK-8552) Minimize network calls required when creating a direct download URI
[ https://issues.apache.org/jira/browse/OAK-8552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16918033#comment-16918033 ]

Matt Ryan commented on OAK-8552:

Speeding up signed URI generation is fixed with [r1866044|https://svn.apache.org/viewvc?view=revision&revision=1866044].

I believe that is sufficient to resolve this issue, considering that the other aspect (improving the check to determine if a binary was inlined) was resolved via OAK-8578 IIUC. [~amitjain] please reopen if you disagree.
[jira] [Commented] (OAK-8552) Minimize network calls required when creating a direct download URI
[ https://issues.apache.org/jira/browse/OAK-8552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917841#comment-16917841 ]

Alexander Klimetschek commented on OAK-8552:

[~amitjain] I don’t think [~ianeboston] was suggesting to use the jcr:lastModified JCR property for that. All of these could be internal fields not exposed on the JCR level.

With direct binary access we also no longer support de-duplication (too costly), so that aspect can be ignored for binaries uploaded that way or if a corresponding configuration is set. But there is still copying/versioning of nodes which leads to multiple references to the same blob - does this lead to a last modified update of the blob? (Just curious because I wasn’t aware of that.)

Nonetheless, for the issue at hand I don’t think we need to change anything stored in the NodeStore. The size/length is already stored in the NS as part of the internal blob id.
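For illustration only, here is a minimal sketch of how a length could be read from a blob id locally, without any call to the blob store. It assumes an id of the form {{<identifier>#<length>}}; that layout, the separator and the class name {{BlobIdLength}} are assumptions made for this sketch and are not taken from the thread.

```java
// Hypothetical helper; the "<identifier>#<length>" layout is an assumption of this sketch.
class BlobIdLength {

    /** Returns the length encoded after the last '#', or -1 if the id carries none. */
    static long lengthOf(String blobId) {
        int sep = blobId.lastIndexOf('#');
        if (sep < 0 || sep == blobId.length() - 1) {
            return -1; // no separator, or nothing after it
        }
        try {
            return Long.parseLong(blobId.substring(sep + 1));
        } catch (NumberFormatException e) {
            return -1; // suffix is not a number
        }
    }
}
```

The point is only that the answer comes from data already held in the NodeStore, so no network round trip is involved.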
[jira] [Commented] (OAK-8552) Minimize network calls required when creating a direct download URI
[ https://issues.apache.org/jira/browse/OAK-8552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917525#comment-16917525 ]

Amit Jain commented on OAK-8552:

[~ianeboston]
{quote}the Oak NodeStore (SegmentNodeStore or DocumentNodeStore) should be the record of authority for blob existence, length, lastModified
{quote}
The node's jcr:lastModified does not reflect the blob's lastModified timestamp and only signifies when the node was last modified (IIUC the spec also says that [1]). I am also not sure whether it could reflect the blob's lastModified without taking a severe performance hit. The reason is that blobs are de-duplicated: when an already existing blob is uploaded to JCR, its lastModified stamp is updated in the DataStore and the blob is not uploaded again to the DataStore. This update to the blob's lastModified is a requirement for DGC.

This updated lastModified of the blob cannot be propagated retroactively to all nodes that already reference it without a performance hit (and maybe a design change: the DataStore is the lowest layer and has no information about the NodeStore, and de-duplication with a SHA hash is an implementation detail not known to the NodeStore).

[1] - [https://docs.adobe.com/docs/en/spec/jcr/2.0/3_Repository_Model.html]
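To make the de-duplication behaviour described above concrete, here is a minimal sketch, not Oak's DataStore code. The class {{DedupStoreSketch}}, its methods and the choice of SHA-256 are assumptions for illustration; the comment above only says that a SHA hash is used. Re-adding identical content returns the same id, uploads nothing, and only refreshes lastModified.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical content-addressed store; not Oak's DataStore API.
class DedupStoreSketch {
    private final Map<String, Long> lastModifiedById = new HashMap<>();
    private final LongSupplier clock;
    private int uploads = 0;

    DedupStoreSketch(LongSupplier clock) {
        this.clock = clock;
    }

    /** Stores the content if it is new; otherwise only touches lastModified. */
    String addRecord(byte[] content) {
        String id = sha256Hex(content);
        if (!lastModifiedById.containsKey(id)) {
            uploads++; // the only place where content would be sent to the backend
        }
        // New or not, the timestamp is refreshed so that GC sees the blob as in use.
        lastModifiedById.put(id, clock.getAsLong());
        return id;
    }

    long lastModified(String id) {
        Long timestamp = lastModifiedById.get(id);
        if (timestamp == null) {
            throw new IllegalArgumentException("unknown id " + id);
        }
        return timestamp;
    }

    int uploadCount() {
        return uploads;
    }

    private static String sha256Hex(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Note that the store never learns which nodes reference the id, which is why pushing the new timestamp back to every referencing node would need knowledge this layer does not have.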
[jira] [Commented] (OAK-8552) Minimize network calls required when creating a direct download URI
[ https://issues.apache.org/jira/browse/OAK-8552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916987#comment-16916987 ]

Matt Ryan commented on OAK-8552:

I did an implementation of how this would work using the config option to disable the existence check. See this [pull request|https://github.com/apache/jackrabbit-oak/compare/trunk...mattvryan:OAK-8552-with-config-to-disable-existence-check?expand=1] for details, and please feel free to comment.
[jira] [Commented] (OAK-8552) Minimize network calls required when creating a direct download URI
[ https://issues.apache.org/jira/browse/OAK-8552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916482#comment-16916482 ]

Ian Boston commented on OAK-8552:

I have not digested all the details, but IMHO, the Oak NodeStore (SegmentNodeStore or DocumentNodeStore) should be the record of authority for blob existence, length, lastModified, blobID, such that to answer any question about a binary, only the NodeStore should have to be consulted, and no network API calls made. Only when a process actively needs to validate the data in the NodeStore or interact directly with the binaries (upload, download, streaming) should it be forced to make network API calls to the blob storage API.

IIUC there are edge cases highlighted by Alex that abuse this principle (async upload?), but in general the principle holds.

Also IIUC, making the Oak NodeStore the record of authority might require some additional properties to be stored (existence flag, length?, lastModified?, etc.?)

Sorry if I have oversimplified.
[jira] [Commented] (OAK-8552) Minimize network calls required when creating a direct download URI
[ https://issues.apache.org/jira/browse/OAK-8552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916427#comment-16916427 ]

Alexander Klimetschek commented on OAK-8552:

[~amitjain] +1 to Blob#isInlined(), that seems like the right solution in place of the getReference() == null.
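A rough sketch of the difference between the two checks, under stated assumptions: {{SketchBlob}}, {{InlinedBlob}}, {{ExternalBlob}} and {{DownloadGate}} are hypothetical stand-ins written for this example, not the types in the attached patch. The idea is that {{isInlined()}} can be answered from local knowledge, whereas {{getReference()}} may trigger a backend lookup.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the Blob interface discussed above.
interface SketchBlob {
    /** May require a lookup in the blob store. */
    String getReference();

    /** Answers from local knowledge only; the default assumes an external blob. */
    default boolean isInlined() {
        return false;
    }
}

class InlinedBlob implements SketchBlob {
    @Override
    public String getReference() {
        return null; // inlined content has no external reference
    }

    @Override
    public boolean isInlined() {
        return true;
    }
}

class ExternalBlob implements SketchBlob {
    final AtomicInteger lookups = new AtomicInteger();
    private final String blobId;

    ExternalBlob(String blobId) {
        this.blobId = blobId;
    }

    @Override
    public String getReference() {
        lookups.incrementAndGet(); // stands in for a possible backend round trip
        return "ref:" + blobId;
    }
}

class DownloadGate {
    /** Decides eligibility for a direct download URI without calling getReference(). */
    static boolean eligibleForDirectDownload(SketchBlob blob) {
        return !blob.isInlined();
    }
}
```

With a default method, existing implementations would keep compiling, which is one possible way to limit the size of such an API change.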
[jira] [Commented] (OAK-8552) Minimize network calls required when creating a direct download URI
[ https://issues.apache.org/jira/browse/OAK-8552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916411#comment-16916411 ]

Alexander Klimetschek commented on OAK-8552:

To add some context: in our case that uncovered this issue, the problem really only exists because, in one special case, we uploaded binaries through JCR with async uploads enabled in the caching DS, and then immediately after the session.save() requested presigned GET URLs to pass them on to an external service. That led us first to make the presigned URL generation support an existence check and "polling" by returning null for not-yet-in-blob-store cases. However, that is shifting the solution to the wrong end and increasing application complexity (polling loop). (Also note this edge case is a short-term solution, to be replaced at some point with proper direct binary access for upload.)

The source of the problem here is the async upload: we need to switch this to synchronous uploads = blocking session.save(), to avoid the issue in the first place.

In all regular cases, binaries are uploaded through the new direct binary access, which by design guarantees the presence of the binary when the reference is in the NodeStore. If the binary gets deleted from the blob store due to some malfunction, then it does not matter to the application whether we return null or a URL that returns 404. But the latter allows us to completely drop any existence checks upon presigned GET URL generation.

Same for inlined: configuration must prevent this in the first place, then no special check is required at access time.
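To illustrate the application-side complexity mentioned above, here is a minimal sketch of the kind of polling loop a caller ends up writing when URI generation returns null for blobs that have not reached the blob store yet. {{PresignedUriPoller}} and its parameters are hypothetical names for this example; the supplier stands in for whatever call returns the presigned GET URL.

```java
import java.net.URI;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical caller-side helper; the supplier returns null while the blob is not yet stored.
class PresignedUriPoller {

    /** Retries until a URI is available or the attempts are used up. */
    static Optional<URI> poll(Supplier<URI> uriSource, int maxAttempts, long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            URI uri = uriSource.get();
            if (uri != null) {
                return Optional.of(uri);
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                    return Optional.empty();
                }
            }
        }
        return Optional.empty();
    }
}
```

Every caller needs to pick attempt counts and delays and to handle the empty result, which is the complexity that a blocking session.save() would avoid.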
[jira] [Commented] (OAK-8552) Minimize network calls required when creating a direct download URI
[ https://issues.apache.org/jira/browse/OAK-8552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916403#comment-16916403 ]

Matt Ryan commented on OAK-8552:

More thinking about this - this could also be done via configuration. But if done via config, the default would probably be to check existence, since async uploads are also the default (IIUC). Turning off the existence check without also disabling async uploads would be inadvisable because of the condition that was addressed in OAK-7998.

But adding a config option here would be a simple fix that requires no API change and would allow an instance to skip the existence check for users who are aware of the tradeoff. This approach would require a doc change.
[jira] [Commented] (OAK-8552) Minimize network calls required when creating a direct download URI
[ https://issues.apache.org/jira/browse/OAK-8552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916280#comment-16916280 ]

Matt Ryan commented on OAK-8552:

Another option (mentioned in offline conversation w/ [~alexander.klimetschek]) would be somewhat of a combination of 2 and 3 in the previous comment. In this case we would allow clients via the API to specify that they want an existence check performed before generating the URI. The default would be to not check for existence. This way clients would be able to specify whether they want an existence guarantee and are willing to pay the performance hit.

We could do this via the {{BinaryDownloadOptions}} object, perhaps via a new interface so we can avoid making a breaking API change?
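One possible shape for such an opt-in, sketched under assumptions: {{DownloadOptionsSketch}} stands in for the existing options type, and {{ExistenceCheckOption}} and {{DownloadUriFactory}} are hypothetical names invented for this example. None of them are part of the Jackrabbit API. Callers that pass plain options get a URI without any existence check; only callers that implement the extra interface pay for the network call.

```java
import java.net.URI;
import java.util.function.Predicate;

// Hypothetical stand-in for the existing options type.
interface DownloadOptionsSketch {
}

// Hypothetical opt-in extension; existing callers and implementations stay unchanged.
interface ExistenceCheckOption extends DownloadOptionsSketch {
    boolean checkExistence();
}

class DownloadUriFactory {
    private final String baseUrl;
    private final Predicate<String> remoteExists; // stands in for the network call

    DownloadUriFactory(String baseUrl, Predicate<String> remoteExists) {
        this.baseUrl = baseUrl;
        this.remoteExists = remoteExists;
    }

    /** Returns null only when the caller asked for a check and the blob is missing. */
    URI createDownloadUri(String blobId, DownloadOptionsSketch options) {
        boolean check = options instanceof ExistenceCheckOption
                && ((ExistenceCheckOption) options).checkExistence();
        if (check && !remoteExists.test(blobId)) {
            return null;
        }
        return URI.create(baseUrl + "/" + blobId); // signing omitted in this sketch
    }
}
```

Because the check is requested through a separate interface, the signature of the existing method does not change in this sketch.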
[jira] [Commented] (OAK-8552) Minimize network calls required when creating a direct download URI
[ https://issues.apache.org/jira/browse/OAK-8552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916035#comment-16916035 ]

Matt Ryan commented on OAK-8552:

WRT removing the existence check from {{createHttpDownloadURI()}}, I see the following options:
# Leave the code as-is and live with the O(100ms) cost of checking existence before creating the signed URI - but clients know the blob existed at the time the URI was created
# Revert OAK-7998, dropping the cost of generating a signed URI to O(100 microseconds) but clients may get a URI that returns a 404 (blob not yet in storage)
# Leave the fix for OAK-7998 but add some form of cache or lookup table, which would consume additional memory but could speed up a non-zero number of signed URI generation requests, while still guaranteeing to clients that the blob existed at the time the URI was created
** If doing this we would need to figure out how to populate the cache or lookup table - on demand, at startup, etc.

I'm pretty sure [~ianeboston] would vote for #2 or #3. Any other options or votes?
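As a rough illustration of option 3, here is a small bounded lookup table of blob ids known to exist, populated on demand and on upload. {{CachedExistenceCheck}} is a hypothetical class for this sketch; the predicate stands in for the backend's network existence check. Only positive results are remembered, since a blob that is missing now may arrive later via an async upload.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical sketch; not part of any Oak backend.
class CachedExistenceCheck {
    private final Predicate<String> remoteExists; // stands in for the network call
    private final Set<String> knownIds;

    CachedExistenceCheck(Predicate<String> remoteExists, final int maxEntries) {
        this.remoteExists = remoteExists;
        // Insertion-ordered map that drops its oldest entry once the bound is exceeded.
        Map<String, Boolean> bounded = new LinkedHashMap<String, Boolean>() {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > maxEntries;
            }
        };
        this.knownIds = Collections.synchronizedSet(Collections.newSetFromMap(bounded));
    }

    /** Returns true from memory when possible, otherwise asks the backend. */
    boolean exists(String blobId) {
        if (knownIds.contains(blobId)) {
            return true;
        }
        boolean found = remoteExists.test(blobId);
        if (found) {
            knownIds.add(blobId); // negative results are deliberately not cached
        }
        return found;
    }

    /** Called once an upload has completed, so later checks need no network call. */
    void recordUploaded(String blobId) {
        knownIds.add(blobId);
    }
}
```

A table like this trades memory for fewer network calls, and it would not notice a blob that is deleted from storage after being recorded.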
[jira] [Commented] (OAK-8552) Minimize network calls required when creating a direct download URI
[ https://issues.apache.org/jira/browse/OAK-8552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16915493#comment-16915493 ]

Amit Jain commented on OAK-8552:

bq. I'm afraid I don't currently understand why "remove the existence check" would "lead to perceived performance drop"

It is not removing the existence check but disabling the asynchronous uploads that would lead to a perceived performance drop.

bq. I don't understand how "remove the existence check" would work at all

Yes, it is about reverting OAK-7998. Without the existence check, any download attempt using the signed download URI might fail because the blob backing it is not available (yet) in the cloud. I believe that is how things worked before this additional check was introduced.
[jira] [Commented] (OAK-8552) Minimize network calls required when creating a direct download URI
[ https://issues.apache.org/jira/browse/OAK-8552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16914292#comment-16914292 ]

Thomas Mueller commented on OAK-8552:

* #getReference
** My vote would go to "introduce a new cleaner API Blob#isInlined (name can be changed)". It might be more work, but I think the solution would be much clearer.
* #exists
** I think the exists method implies a network access (unless inlined).
** I'm afraid I don't currently understand why "remove the existence check" would "lead to perceived performance drop"... I don't understand how "remove the existence check" would work at all... I mean, it's reverting OAK-7998, right? If we revert OAK-7998 (which might be OK), then I assume we need to solve the root problem of OAK-7998 in some other way.
[jira] [Commented] (OAK-8552) Minimize network calls required when creating a direct download URI
[ https://issues.apache.org/jira/browse/OAK-8552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16914136#comment-16914136 ]

Amit Jain commented on OAK-8552:

[~mattvryan] Regarding the 2 issues above:
* #getReference - Tried a few optimizations:
** Removed the need to get a DataRecord instance by using a dummy DataRecord instance [1]. It fails the segment standby test case(s) because of the expectation that a non-null getReference means the blob is available locally [2]. That can be fixed, but the solution seems hacky.
** Another option is to introduce a new, cleaner API Blob#isInlined (name can be changed) as outlined in the patch [^OAK-8552_ApiChange.patch]. The changes touch a lot of places but are quite trivial. Test cases still need to be added.
* #exists check -
** IIUC, the need to check existence is because of asynchronous uploads, so one option is to actually disable those and remove the existence check. It would lead to a perceived performance drop; perceived because the JCR call returns quickly, but the time for a binary to reach the cloud backend would be the same as with synchronous uploads, or even a little worse.
** Another option is to introduce an in-memory cache locally in the Backend for the ids uploaded. The idea of using the BlobTracker does not work because it was only introduced for DSGC, and it does not wait for an asynchronous upload to complete before adding the id. Also, if DSGC is meant to run outside the server (i.e. oak-run) then the BlobTracker would most likely be disabled.

[~tmueller] wdyt?

[1]
{code:java}
Index: oak-blob-plugins/src/main/java/org/apache/jackrabbit/oak/plugins/blob/datastore/DataStoreBlobStore.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===
--- oak-blob-plugins/src/main/java/org/apache/jackrabbit/oak/plugins/blob/datastore/DataStoreBlobStore.java (revision b4e0a5ba954b7de4b508aa197847223800f1c320)
+++ oak-blob-plugins/src/main/java/org/apache/jackrabbit/oak/plugins/blob/datastore/DataStoreBlobStore.java (date 1566293954000)
@@ -52,6 +52,8 @@
 import com.google.common.io.Closeables;
 import org.apache.commons.io.FileUtils;
 import org.apache.commons.io.IOUtils;
+import org.apache.jackrabbit.core.data.AbstractDataRecord;
+import org.apache.jackrabbit.core.data.AbstractDataStore;
 import org.apache.jackrabbit.core.data.DataIdentifier;
 import org.apache.jackrabbit.core.data.DataRecord;
 import org.apache.jackrabbit.core.data.DataStore;
@@ -312,16 +314,34 @@
             return null;
         }
 
-        DataRecord record;
-        try {
-            record = delegate.getRecordIfStored(new DataIdentifier(blobId));
-            if (record != null) {
-                return record.getReference();
-            } else {
-                log.debug("No blob found for id [{}]", blobId);
-            }
-        } catch (DataStoreException e) {
-            log.warn("Unable to access the blobId for [{}]", blobId, e);
+        // Get reference without possible round-tripping using a dummy data record
+        if (delegate instanceof AbstractDataStore) {
+            return new AbstractDataRecord((AbstractDataStore) delegate, new DataIdentifier(blobId)) {
+
+                @Override public long getLength() {
+                    return 0;
+                }
+
+                @Override public InputStream getStream() {
+                    return null;
+                }
+
+                @Override public long getLastModified() {
+                    return 0;
+                }
+            }.getReference();
+        } else {
+            DataRecord record;
+            try {
+                record = delegate.getRecordIfStored(new DataIdentifier(blobId));
+                if (record != null) {
+                    return record.getReference();
+                } else {
+                    log.debug("No blob found for id [{}]", blobId);
+                }
+            } catch (DataStoreException e) {
+                log.warn("Unable to access the blobId for [{}]", blobId, e);
+            }
         }
         return null;
     }
{code}
[2] [https://github.com/apache/jackrabbit-oak/blob/trunk/oak-segment-tar/src/main/java/org/apache/jackrabbit/oak/segment/standby/client/RemoteBlobProcessor.java#L78-L89]
[jira] [Commented] (OAK-8552) Minimize network calls required when creating a direct download URI
[ https://issues.apache.org/jira/browse/OAK-8552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16912527#comment-16912527 ]

Matt Ryan commented on OAK-8552:

Simply removing the existence check implemented in OAK-7998 changes the stats as follows:
* Getting a download URI for binaries uploaded through the JCR - average time drops from 65 milliseconds to around 120 microseconds (0.12 milliseconds).
* Getting a download URI for binaries uploaded directly - average time drops from 200 milliseconds to around 130 milliseconds.

I still think we need the fix in OAK-7998, but clearly if we can eliminate a network call to check existence that will help, although it won't completely solve the problem by itself.
[jira] [Commented] (OAK-8552) Minimize network calls required when creating a direct download URI
[ https://issues.apache.org/jira/browse/OAK-8552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16912519#comment-16912519 ]

Matt Ryan commented on OAK-8552:

I wrote a simple test program which first creates a basic repo with binaries in it, then tries to get a download URI for each binary and times each step (getting the {{Binary}} object, and then requesting the download URI). Here's what I found:
* Getting the {{Binary}} object from the repository takes almost no time. Average time is around 30 microseconds (0.03 milliseconds).
* Almost all of the total time is in getting the download URI. In my testing today, average time to get a download URI is around 65 milliseconds for a binary uploaded through the JCR and around 200 milliseconds for a binary uploaded via direct upload. The difference between the two is probably the cost of the {{getReference()}} call mentioned above; my guess is the binaries uploaded via the JCR are probably still in cache so that makes {{getReference()}} faster for those.

I will try a few different optimizations and post results.
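The test program itself is not attached to this thread. As a hedged sketch of how per-step averages like the ones above could be collected, the helper below times a single step with {{System.nanoTime()}}. {{StepTimer}} is a hypothetical name, and the supplier stands in for a step such as obtaining the {{Binary}} or requesting its download URI.

```java
import java.util.function.Supplier;

// Hypothetical timing helper; not the test program referred to above.
class StepTimer {

    /** Runs the step the given number of times and returns the mean duration in microseconds. */
    static <T> double averageMicros(int iterations, Supplier<T> step) {
        if (iterations <= 0) {
            throw new IllegalArgumentException("iterations must be positive");
        }
        long totalNanos = 0;
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            step.get(); // the result is ignored; only the elapsed time matters here
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / 1000.0 / iterations;
    }
}
```

Timing each step separately is what makes it possible to attribute the cost to the URI request rather than to fetching the {{Binary}}. A sketch like this does no JIT warm-up, so sub-millisecond figures should be read with some caution.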
[jira] [Commented] (OAK-8552) Minimize network calls required when creating a direct download URI
[ https://issues.apache.org/jira/browse/OAK-8552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16910176#comment-16910176 ]

Amit Jain commented on OAK-8552:

[~mattvryan]

1. Essentially, I guess what you are saying is that there should be only one network call. I don't know enough about the Azure blob APIs, but maybe we can see if we can substitute the call below so that we don't need the additional download-attributes call later.
{code}
CloudBlockBlob blob = getAzureContainer().getBlockBlobReference(key);
{code}
bq. But all that is really needed in this case is the reference, which can be obtained from the back end directly using the blob id - no network calls required.

Yes, it still makes a call for existence in DataStoreBlobStore#getReference(), which can potentially be removed. I am not sure if there are cases when the call receives a blob id that is not available in the backend. But such a case would lead to situation 2, would it not?

bq. Furthermore, the reason we are even trying to get the reference in the first place is to determine if this blob is stored inline or not. Maybe there is a better way to determine this.

I don't think there's much we can do here. Whether a blob is stored inline or not is an internal implementation detail of the blob store, and the code in oak-store-spi would have to be aware of such details to check it.

2. Here it seems there's no alternative but to check for existence. Is that right? Also, did the test that you conducted remove the existence check only here, or also in the condition observed in 1.? If it's just the newly added existence check, then that is the major cause of the slowdown, ~250 times (roughly 40 s vs. 147 ms). I am not sure how much changing 1 will help according to your test.
[jira] [Commented] (OAK-8552) Minimize network calls required when creating a direct download URI
[ https://issues.apache.org/jira/browse/OAK-8552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909545#comment-16909545 ]

Matt Ryan commented on OAK-8552:

The entry point for getting a direct download URI is a {{Binary}} instance and its {{getURI()}} call. Known causes of network requests in this call:
* Starting at [https://github.com/apache/jackrabbit-oak/blob/22c3be68e4bc7fdf811ab0fbb2471f2d026508e7/oak-store-spi/src/main/java/org/apache/jackrabbit/oak/plugins/value/jcr/BinaryImpl.java#L96] - the call to {{getReference()}} calls through the blob implementation into {{DataStoreBlobStore#getReference()}}, which calls {{AbstractSharedCachingDataStore#getRecordIfStored()}}. If the blob is not cached this will result in a call to the backend's {{getRecord()}}. For {{AzureBlobStoreBackend}}, for example, this currently makes two network calls - one to check if the blob exists, and another to get the blob metadata needed to construct the {{DataRecord}}. (See [https://github.com/apache/jackrabbit-oak/blob/22c3be68e4bc7fdf811ab0fbb2471f2d026508e7/oak-blob-cloud-azure/src/main/java/org/apache/jackrabbit/oak/blob/cloud/azure/blobstorage/AzureBlobStoreBackend.java#L355].) But all that is really needed in this case is the reference, which can be obtained from the back end directly using the blob id - no network calls required. Furthermore, the reason we are even trying to get the reference in the first place is to determine if this blob is stored inline or not. Maybe there is a better way to determine this.
* Starting at [https://github.com/apache/jackrabbit-oak/blob/22c3be68e4bc7fdf811ab0fbb2471f2d026508e7/oak-store-spi/src/main/java/org/apache/jackrabbit/oak/plugins/value/jcr/BinaryImpl.java#L107] - the call to {{getDownloadURI()}} eventually results in a call to the data store implementation's {{getDownloadURI()}} method. In the case of {{AzureDataStore}}, this calls into the backend's {{createHttpDownloadURI()}} method which (now, due to OAK-7998) checks that the binary exists - a network call - before creating the signed download URI. Note that creating the download URI doesn't require the network call, but checking for the existence of the blob ID does. In a benchmark test I showed that creating 1000 download URIs took just over 40 seconds, averaging around 40 milliseconds per request. This result is actually not that bad - but removing the existence check and running the test again dropped the time to 147 milliseconds for all 1000 URIs. So we can see that if the network latency is bad this could potentially be a problem.
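To illustrate why creating a signed URI needs no network call, here is a minimal sketch that signs a URI locally with an HMAC. It is not the Azure SAS format and not the code in {{AzureBlobStoreBackend}}; {{LocalUriSigner}}, the query parameter names and the string-to-sign layout are assumptions for this example. Only the standard {{javax.crypto.Mac}} API is used.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical signer; the parameter names and string-to-sign are made up for this sketch.
class LocalUriSigner {
    private final byte[] key;

    LocalUriSigner(byte[] key) {
        this.key = key.clone();
    }

    /** Builds a signed URI from local inputs only; no request is sent anywhere. */
    URI sign(String baseUrl, String blobId, long expiresEpochSeconds) {
        try {
            String toSign = blobId + "\n" + expiresEpochSeconds;
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            byte[] signature = mac.doFinal(toSign.getBytes(StandardCharsets.UTF_8));
            // URL-safe Base64 without padding keeps the query string valid as-is.
            String encoded = Base64.getUrlEncoder().withoutPadding().encodeToString(signature);
            return URI.create(baseUrl + "/" + blobId
                    + "?expires=" + expiresEpochSeconds + "&sig=" + encoded);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC signing failed", e);
        }
    }
}
```

Since the signature depends only on the key, the blob id and the expiry, the per-URI cost is a hash computation, which is consistent with the sub-millisecond figure measured once the existence check is removed.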