Matt Ryan commented on OAK-7083:

{quote}What prevents it from working?{quote}
I'm not having a problem knowing how to encode the data store ID into the blob 
ID.  I'm saying it won't work well with the API constraints we have and with 
the way {{CompositeDataStore}} is intended to work.

When {{CompositeDataStore.addRecord()}} is called, all it receives is an 
{{InputStream}} and possibly a {{BlobOptions}} just like any other data store 
implementation.  It then has to determine which delegate it should use to 
complete the operation.  Since the only supported use case for now has only one 
writable delegate, this is an easy choice.  The composite data store simply 
passes the arguments along to the delegate, which performs the task of adding 
the record to the repository.  Of course, this returns a {{DataRecord}} which 
is then returned by the composite data store to the caller.  We can easily know 
the blob id of the blob that was just added by looking at the return value from 
the delegate {{addRecord()}} call.  But the blob has already been stored in the 
delegate data store by this time.
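To make the flow concrete, here is a minimal sketch of the delegation described above (illustrative only, not the actual {{CompositeDataStore}} code; the single writable-delegate field is a simplification of the delegate selection):

{code:java}
import java.io.InputStream;

import org.apache.jackrabbit.core.data.DataRecord;
import org.apache.jackrabbit.core.data.DataStore;
import org.apache.jackrabbit.core.data.DataStoreException;

// Illustrative sketch only -- not the real CompositeDataStore implementation.
class CompositeDataStoreSketch {
    // Simplification: the only supported use case has exactly one writable delegate.
    private final DataStore writableDelegate;

    CompositeDataStoreSketch(DataStore writableDelegate) {
        this.writableDelegate = writableDelegate;
    }

    DataRecord addRecord(InputStream stream) throws DataStoreException {
        // The delegate computes the blob ID and persists the blob.  The
        // composite only sees the resulting DataRecord after the write has
        // already happened, so there is no point at which it could alter
        // the blob ID before the blob is stored.
        return writableDelegate.addRecord(stream);
    }
}
{code}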

{{CompositeDataStore}} does not store anything on its own; that always happens 
via delegates.  Before the delegate is invoked we only have an {{InputStream}}. 
 After the delegate is invoked we have a resulting {{DataRecord}} and the blob 
is already stored.  So we don't have a blob ID before calling the delegate; 
after calling the delegate we have a blob ID, but the blob has already been 
stored.

In order to modify the blob ID within {{CompositeDataStore}}, we would have to 
wait until the delegate wrote the record, get the blob ID, modify it, and then 
rewrite the blob with the new ID.  How is that to be done if not by calling 
{{addRecord()}} on the delegate in the first place?  And even if we did that, 
how is the modified blob ID to be passed to the delegate data store?  Since 
{{OakFileDataStore}} doesn't implement {{TypedDataStore}} we cannot pass 
anything to {{addRecord()}} other than the input stream.  To change that would 
require modifying {{FileDataStore}} in Jackrabbit.
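For reference, the two signatures involved look roughly like this (quoted from memory, so treat the exact details as approximate):

{code:java}
// What FileDataStore implements (org.apache.jackrabbit.core.data.DataStore):
// only the stream can be passed, so there is nowhere to hand in a custom
// blob ID or a transformer.
DataRecord addRecord(InputStream stream) throws DataStoreException;

// Oak's TypedDataStore adds an options parameter, but OakFileDataStore does
// not implement TypedDataStore, so no options ever reach FileDataStore.
DataRecord addRecord(InputStream stream, BlobOptions options) throws DataStoreException;
{code}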

These are the options for modifying the blob ID as I see them:
 * Delegate writes the file, then the composite updates the blob ID and asks 
the delegate to write it again.  This means the file is written to the 
destination twice which seems like a bad idea.  It also would require extending 
the capabilities of {{BlobOptions}} to support providing the blob ID, and since 
{{OakFileDataStore}} doesn't implement {{TypedDataStore}} it doesn't currently 
take a {{BlobOptions}} so this would also require adding that capability in 
{{FileDataStore}} in Jackrabbit.
 * Delegate writes the file, then the composite updates the blob ID and 
rewrites the file itself.  This duplicates logic from the delegate into the 
composite data store, which is bad design for more than one reason, and still 
writes the file twice, which still seems like a bad idea.
 * Extend {{BlobOptions}} to accept some sort of transformer object or 
function.  This was my original approach.  This allows the delegate to generate 
the correct ID without it having to know anything about the custom encoding 
being done.  The blob ID is generated once and the record is written to the 
destination once.  But it still requires changes to {{FileDataStore}} in 
Jackrabbit before this approach can work with {{OakFileDataStore}}.  (A rough 
sketch of this idea follows the list.)
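To illustrate the transformer idea from the third option, here is a purely hypothetical sketch; none of these types exist in Oak today, and {{BlobIdTransformer}} / {{TransformingBlobOptions}} are names I'm inventing for illustration:

{code:java}
// Hypothetical sketch of the third option above -- NOT existing Oak API.
// The composite attaches a transformer to the write options; a delegate that
// understands it applies the transformer to the blob ID it computes before
// persisting the record, so the encoded ID is produced once and the blob is
// written exactly once.

@FunctionalInterface
interface BlobIdTransformer {
    // e.g. prepend an encoded data store ID to the delegate's content hash
    String transform(String delegateBlobId);
}

// Hypothetical options carrier; the real change would extend Oak's BlobOptions
// and require FileDataStore in Jackrabbit to accept and honor it.
class TransformingBlobOptions {
    private final BlobIdTransformer transformer;

    TransformingBlobOptions(BlobIdTransformer transformer) {
        this.transformer = transformer;
    }

    BlobIdTransformer getTransformer() {
        return transformer;
    }
}
{code}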

All of these approaches feel very heavy for what we are trying to solve, 
especially since any of them would also:
 * Design us into a corner where we will not be able to support some of the 
originally identified possible use cases (namely, storing the same blob in 
multiple data stores).
 * Entail a data migration for any user who wants to move an existing 
installation to the {{CompositeDataStore}}.  In my view that alone should 
end the discussion about encoding the data store ID into the blob ID.

Additionally, IIRC at the Oakathon we agreed to look into the blob ID approach 
to see if we could quickly add it in before accepting the PR, and to include it 
if it turned out to be an easy addition.  Otherwise, we would move forward with 
evaluating the PR for acceptance into Oak so that full-scale performance 
testing can begin on it as soon as possible.

In my view, supporting this feature (which, after further thought, I don't 
think we should do at all) has gone well beyond the "quick addition" level and 
become a rather significant change.

Based on that I propose we move forward with reviewing the PR if possible.

> CompositeDataStore - ReadOnly/ReadWrite Delegate Support
> --------------------------------------------------------
>                 Key: OAK-7083
>                 URL: https://issues.apache.org/jira/browse/OAK-7083
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: blob, blob-cloud, blob-cloud-azure, blob-plugins
>            Reporter: Matt Ryan
>            Assignee: Matt Ryan
>            Priority: Major
> Support a specific composite data store use case, which is the following:
> * One instance uses no composite data store, but instead is using a single 
> standard Oak data store (e.g. FileDataStore)
> * Another instance is created by snapshotting the first instance node store, 
> and then uses a composite data store to refer to the first instance's data 
> store read-only, and refers to a second data store as a writable data store
> One way this can be used is in creating a test or staging instance from a 
> production instance.  At creation, the test instance will look like 
> production, but any changes made to the test instance do not affect 
> production.  The test instance can be quickly created from production by 
> cloning only the node store, and not requiring a copy of all the data in the 
> data store.
