Matt Ryan commented on OAK-7083:

{quote}What prevents it from working?{quote}
I'm not having a problem knowing how to encode the data store ID into the blob 
ID.  I'm saying it won't work well with the API constraints we have and with 
the way {{CompositeDataStore}} is intended to work.

When {{CompositeDataStore.addRecord()}} is called, all it receives is an 
{{InputStream}} and possibly a {{BlobOptions}} just like any other data store 
implementation.  It then has to determine which delegate it should use to 
complete the operation.  Since the only supported use case for now has only one 
writable delegate, this is an easy choice.  The composite data store simply 
passes the arguments along to the delegate, which performs the task of adding 
the record to the repository.  Of course, this returns a {{DataRecord}} which 
is then returned by the composite data store to the caller.  We can easily know 
the blob id of the blob that was just added by looking at the return value from 
the delegate {{addRecord()}} call.  But the blob has already been stored in the 
delegate data store by this time.
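To make the flow concrete, here is a minimal sketch of the delegation described above (illustrative only, not the actual {{CompositeDataStore}} code; the single writable-delegate field is a simplification of the delegate selection):

{code:java}
import java.io.InputStream;

import org.apache.jackrabbit.core.data.DataRecord;
import org.apache.jackrabbit.core.data.DataStore;
import org.apache.jackrabbit.core.data.DataStoreException;

// Illustrative sketch only -- not the real CompositeDataStore implementation.
class CompositeDataStoreSketch {
    // Simplification: the only supported use case has exactly one writable delegate.
    private final DataStore writableDelegate;

    CompositeDataStoreSketch(DataStore writableDelegate) {
        this.writableDelegate = writableDelegate;
    }

    DataRecord addRecord(InputStream stream) throws DataStoreException {
        // The delegate computes the blob ID and persists the blob.  The
        // composite only sees the resulting DataRecord after the write has
        // already happened, so there is no point at which it could alter
        // the blob ID before the blob is stored.
        return writableDelegate.addRecord(stream);
    }
}
{code}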

{{CompositeDataStore}} does not store anything on its own; that always happens 
via delegates.  Before the delegate is invoked we only have an {{InputStream}}. 
 After the delegate is invoked we have a resulting {{DataRecord}} and the blob 
is already stored.  So we don't have a blob ID before calling the delegate; 
after calling the delegate we have a blob ID, but the blob has already been 
stored.

In order to modify the blob ID within {{CompositeDataStore}}, we would have to 
wait until the delegate wrote the record, get the blob ID, modify it, and then 
rewrite the blob with the new ID.  How is that to be done if not by calling 
{{addRecord()}} on the delegate in the first place?  And even if we did that, 
how is the modified blob ID to be passed to the delegate data store?  Since 
{{OakFileDataStore}} doesn't implement {{TypedDataStore}} we cannot pass 
anything to {{addRecord()}} other than the input stream.  To change that would 
require modifying {{FileDataStore}} in Jackrabbit.
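For reference, the two signatures involved look roughly like this (quoted from memory, so treat the exact details as approximate):

{code:java}
// What FileDataStore implements (org.apache.jackrabbit.core.data.DataStore):
// only the stream can be passed, so there is nowhere to hand in a custom
// blob ID or a transformer.
DataRecord addRecord(InputStream stream) throws DataStoreException;

// Oak's TypedDataStore adds an options parameter, but OakFileDataStore does
// not implement TypedDataStore, so no options ever reach FileDataStore.
DataRecord addRecord(InputStream stream, BlobOptions options) throws DataStoreException;
{code}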

These are the options for modifying the blob ID as I see them:
 * Delegate writes the file, then the composite updates the blob ID and asks 
the delegate to write it again.  This means the file is written to the 
destination twice which seems like a bad idea.  It also would require extending 
the capabilities of {{BlobOptions}} to support providing the blob ID, and since 
{{OakFileDataStore}} doesn't implement {{TypedDataStore}} it doesn't currently 
take a {{BlobOptions}} so this would also require adding that capability in 
{{FileDataStore}} in Jackrabbit.
 * Delegate writes the file, then the composite updates the blob ID and 
rewrites the file itself.  This duplicates logic from the delegate into the 
composite data store, which is bad design for more than one reason, and still 
writes the file twice, which still seems like a bad idea.
 * Extend {{BlobOptions}} to accept some sort of transformer object or 
function.  This was my original approach.  This allows the delegate to generate 
the correct ID without it having to know anything about the custom encoding 
being done.  The blob ID is generated once and the record is written to the 
destination once.  But it still requires changes to {{FileDataStore}} in 
Jackrabbit before this approach can work with {{OakFileDataStore}}.  (A rough 
sketch of this idea follows the list.)
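To illustrate the transformer idea from the third option, here is a purely hypothetical sketch; none of these types exist in Oak today, and {{BlobIdTransformer}} / {{TransformingBlobOptions}} are names I'm inventing for illustration:

{code:java}
// Hypothetical sketch of the third option above -- NOT existing Oak API.
// The composite attaches a transformer to the write options; a delegate that
// understands it applies the transformer to the blob ID it computes before
// persisting the record, so the encoded ID is produced once and the blob is
// written exactly once.

@FunctionalInterface
interface BlobIdTransformer {
    // e.g. prepend an encoded data store ID to the delegate's content hash
    String transform(String delegateBlobId);
}

// Hypothetical options carrier; the real change would extend Oak's BlobOptions
// and require FileDataStore in Jackrabbit to accept and honor it.
class TransformingBlobOptions {
    private final BlobIdTransformer transformer;

    TransformingBlobOptions(BlobIdTransformer transformer) {
        this.transformer = transformer;
    }

    BlobIdTransformer getTransformer() {
        return transformer;
    }
}
{code}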

All of these approaches feel very heavy for what we are trying to solve, 
especially since any of them would also:
 * Design us into a corner where we will not be able to support some of the 
originally identified possible use cases (namely, storing the same blob in 
multiple data stores).
 * Entail a data migration for any user who wants to move an existing 
installation to the {{CompositeDataStore}}.  In my view that alone should 
end the discussion about encoding the data store ID into the blob ID.

Additionally, IIRC at the Oakathon we agreed to look into the blob ID approach 
to see if we could quickly add it in before accepting the PR, and to include it 
if it turned out to be an easy addition.  Otherwise, we would move forward with 
evaluating the PR for acceptance into Oak so that full-scale performance 
testing can begin on it as soon as possible.

In my view, supporting this feature (which, after further thought, I don't 
think we should do at all) has gone well beyond the "quick addition" level and 
become a rather significant change.

Based on that I propose we move forward with reviewing the PR if possible.

> CompositeDataStore - ReadOnly/ReadWrite Delegate Support
> --------------------------------------------------------
>                 Key: OAK-7083
>                 URL: https://issues.apache.org/jira/browse/OAK-7083
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: blob, blob-cloud, blob-cloud-azure, blob-plugins
>            Reporter: Matt Ryan
>            Assignee: Matt Ryan
>            Priority: Major
> Support a specific composite data store use case, which is the following:
> * One instance uses no composite data store, but instead is using a single 
> standard Oak data store (e.g. FileDataStore)
> * Another instance is created by snapshotting the first instance node store, 
> and then uses a composite data store to refer to the first instance's data 
> store read-only, and refers to a second data store as a writable data store
> One way this can be used is in creating a test or staging instance from a 
> production instance.  At creation, the test instance will look like 
> production, but any changes made to the test instance do not affect 
> production.  The test instance can be quickly created from production by 
> cloning only the node store, and not requiring a copy of all the data in the 
> data store.
