[ https://issues.apache.org/jira/browse/OAK-10116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17986285#comment-17986285 ]
Joerg Hoh commented on OAK-10116:
---------------------------------
For the implementation side, I think that skipping the calls to the blobstore
is only possible in certain, well-defined circumstances:
# The caller does not need to read the content of the referenced binary, but
only needs the binary reference itself.
# The caller guarantees that the binary is present in the blobstore, so that
the exception handling of createValue would not be invoked.
For that, there must be a way to opt in to this behavior (by default the code
behaves as it does today), and probably the best way to do so is on a
per-session basis: the caller that wants to use this new behavior must
explicitly request it when creating the session (TBD whether this behavior can
be toggled during the lifetime of the session or not). To make this work
without too many code changes, we should introduce a session-global
SessionContext (better naming welcome), in which we can set a flag to request
this behavior; the flag then applies only within this session.
> Performance problem when importing nodes with many binary properties and
> remote blobstore
> -----------------------------------------------------------------------------------------
>
> Key: OAK-10116
> URL: https://issues.apache.org/jira/browse/OAK-10116
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: blob-cloud, blob-plugins, jcr
> Affects Versions: 1.48.0, 1.58.0
> Reporter: Joerg Hoh
> Priority: Major
>
> We often import binaryless packages (using Jackrabbit FileVault) into our Oak
> instances, which use a remote blobstore.
> We observe poor performance when we import nodes with binary properties. In
> this case, stack traces often look like this:
> {noformat}
> "Queue Processor for Subscriber agent publishSubscriber" #311 daemon prio=5 os_prio=0 cpu=298928.76ms elapsed=576.04s tid=0x0000563f968c6800 nid=0x1644 runnable [0x00007f2a609e3000]
>    java.lang.Thread.State: RUNNABLE
>     at java.net.SocketInputStream.socketRead0(java.base/Native Method)
>     at java.net.SocketInputStream.socketRead(java.base/SocketInputStream.java:115)
>     at java.net.SocketInputStream.read(java.base/SocketInputStream.java:168)
>     at java.net.SocketInputStream.read(java.base/SocketInputStream.java:140)
>     at sun.security.ssl.SSLSocketInputRecord.read(java.base/SSLSocketInputRecord.java:478)
>     at sun.security.ssl.SSLSocketInputRecord.readHeader(java.base/SSLSocketInputRecord.java:472)
>     at sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(java.base/SSLSocketInputRecord.java:70)
>     at sun.security.ssl.SSLSocketImpl.readApplicationRecord(java.base/SSLSocketImpl.java:1328)
>     at sun.security.ssl.SSLSocketImpl$AppInputStream.read(java.base/SSLSocketImpl.java:971)
>     at java.io.BufferedInputStream.fill(java.base/BufferedInputStream.java:252)
>     at java.io.BufferedInputStream.read1(java.base/BufferedInputStream.java:292)
>     at java.io.BufferedInputStream.read(java.base/BufferedInputStream.java:351)
>     - locked <0x00000007d98d0ca8> (a java.io.BufferedInputStream)
>     at sun.net.www.http.HttpClient.parseHTTPHeader(java.base/HttpClient.java:746)
>     at sun.net.www.http.HttpClient.parseHTTP(java.base/HttpClient.java:689)
>     at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(java.base/HttpURLConnection.java:1615)
>     - locked <0x00000007d98cb480> (a sun.net.www.protocol.https.DelegateHttpsURLConnection)
>     at sun.net.www.protocol.http.HttpURLConnection.getInputStream(java.base/HttpURLConnection.java:1520)
>     - locked <0x00000007d98cb480> (a sun.net.www.protocol.https.DelegateHttpsURLConnection)
>     at java.net.HttpURLConnection.getResponseCode(java.base/HttpURLConnection.java:527)
>     at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(java.base/HttpsURLConnectionImpl.java:334)
>     at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:115)
>     at com.microsoft.azure.storage.blob.CloudBlob.downloadAttributes(CloudBlob.java:1414)
>     at com.microsoft.azure.storage.blob.CloudBlob.downloadAttributes(CloudBlob.java:1381)
>     at org.apache.jackrabbit.oak.blob.cloud.azure.blobstorage.AzureBlobStoreBackend.getRecord(AzureBlobStoreBackend.java:408)
>     at org.apache.jackrabbit.oak.plugins.blob.AbstractSharedCachingDataStore.getRecordIfStored(AbstractSharedCachingDataStore.java:210)
>     at org.apache.jackrabbit.core.data.AbstractDataStore.getRecordFromReference(AbstractDataStore.java:72)
>     at org.apache.jackrabbit.oak.plugins.blob.datastore.DataStoreBlobStore.getBlobId(DataStoreBlobStore.java:402)
>     at org.apache.jackrabbit.oak.segment.SegmentNodeStore.getBlob(SegmentNodeStore.java:257)
>     at org.apache.jackrabbit.oak.composite.CompositeNodeStore.getBlob(CompositeNodeStore.java:202)
>     at org.apache.jackrabbit.oak.core.MutableRoot.getBlob(MutableRoot.java:342)
>     at org.apache.jackrabbit.oak.plugins.value.jcr.ValueFactoryImpl.createValue(ValueFactoryImpl.java:111)
>     at org.apache.jackrabbit.vault.util.DocViewProperty.apply(DocViewProperty.java:413)
>     at org.apache.jackrabbit.vault.fs.impl.io.DocViewSAXImporter.createNode(DocViewSAXImporter.java:1131)
>     at org.apache.jackrabbit.vault.fs.impl.io.DocViewSAXImporter.addNode(DocViewSAXImporter.java:891)
>     at org.apache.jackrabbit.vault.fs.impl.io.DocViewSAXImporter.startElement(DocViewSAXImporter.java:681)
>     at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.startElement(java.xml/AbstractSAXParser.java:510)
>     at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanStartElement(java.xml/XMLNSDocumentScannerImpl.java:374)
>     at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(java.xml/XMLDocumentFragmentScannerImpl.java:2710)
>     at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(java.xml/XMLDocumentScannerImpl.java:605)
>     at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(java.xml/XMLNSDocumentScannerImpl.java:112)
>     at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(java.xml/XMLDocumentFragmentScannerImpl.java:534)
>     at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(java.xml/XML11Configuration.java:888)
>     at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(java.xml/XML11Configuration.java:824)
>     at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(java.xml/XMLParser.java:141)
>     at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(java.xml/AbstractSAXParser.java:1216)
>     at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(java.xml/SAXParserImpl.java:635)
>     at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(java.xml/SAXParserImpl.java:324)
>     at org.apache.jackrabbit.vault.fs.impl.io.GenericArtifactHandler.accept(GenericArtifactHandler.java:100)
>     at org.apache.jackrabbit.vault.fs.io.Importer.commit(Importer.java:932)
>     at org.apache.jackrabbit.vault.fs.io.Importer.commit(Importer.java:799)
> {noformat}
> In this context we can ensure that all binaries are available on the remote
> blobstore, so a call to the blobstore would not be required, at least not for
> validating their presence; all other information could/should be part of the
> FileVault package.
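To illustrate condition 1 of the comment and the point that everything except the binary content could ship with the package, here is a sketch of a hypothetical `ReferenceOnlyBinary`. It carries the blob reference plus metadata assumed to come from the package (only the size here; the issue does not specify what the package would contain) and refuses content reads. Its method names echo `javax.jcr.Binary` and Jackrabbit's `ReferenceBinary`, but it implements neither, so it compiles without JCR on the classpath.

```java
import java.io.InputStream;
import java.util.Objects;

/** Hypothetical binary value built only from package data; it never contacts the blobstore. */
final class ReferenceOnlyBinary {
    private final String reference;
    private final long size;

    ReferenceOnlyBinary(String reference, long size) {
        this.reference = Objects.requireNonNull(reference, "reference");
        if (size < 0) {
            throw new IllegalArgumentException("size must be >= 0");
        }
        this.size = size;
    }

    /** The blob reference as written in the package. */
    String getReference() {
        return reference;
    }

    /** Answered from package metadata (an assumption), so no remote lookup is needed. */
    long getSize() {
        return size;
    }

    /** Reading the content would require the blobstore, which this object deliberately avoids. */
    InputStream getStream() {
        throw new UnsupportedOperationException(
                "content of " + reference + " is not available without the blobstore");
    }
}
```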
> In my opinion, the ValueFactory should be able to create a binary property
> without reaching out to the blobstore, to avoid the network latency. This
> would speed up the import process dramatically: in this situation we can
> create approx. 20 binary properties per second, while we can create
> thousands of non-binary properties in the same time.
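As a back-of-the-envelope check on these numbers: at about 20 binary properties per second, each binary `createValue` call costs about 1000 / 20 = 50 ms, plausibly dominated by the synchronous `downloadAttributes` HTTPS request visible in the stack trace. The hypothetical helper below turns reported throughputs into import-time estimates; the package size (10,000 binary and 100,000 non-binary properties) and the 2,000-per-second rate standing in for "thousands" are illustrative inputs, not measurements from the issue.

```java
/** Hypothetical helper for rough import-time estimates from observed throughputs. */
final class ImportTimeEstimate {
    private ImportTimeEstimate() {}

    /** Average wall-clock milliseconds per operation at the given operations per second. */
    static double millisPerOperation(double opsPerSecond) {
        if (opsPerSecond <= 0) {
            throw new IllegalArgumentException("throughput must be positive");
        }
        return 1000.0 / opsPerSecond;
    }

    /** Estimated seconds to create binary and non-binary properties one after another. */
    static double estimateSeconds(long binaryProps, double binaryPerSecond,
                                  long otherProps, double otherPerSecond) {
        if (binaryPerSecond <= 0 || otherPerSecond <= 0) {
            throw new IllegalArgumentException("throughput must be positive");
        }
        return binaryProps / binaryPerSecond + otherProps / otherPerSecond;
    }
}
```

`ImportTimeEstimate.millisPerOperation(20)` returns 50.0, and `estimateSeconds(10_000, 20, 100_000, 2_000)` returns 550.0, of which 500 s come from the binary properties alone; with these illustrative inputs the per-binary round trip dominates the total.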
--
This message was sent by Atlassian Jira
(v8.20.10#820010)