[ https://issues.apache.org/jira/browse/OAK-7083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437704#comment-16437704 ]

Matt Ryan commented on OAK-7083:
--------------------------------

[~amitjain] ah, thanks for the explanation.  I see what you are saying now.  As 
I understand it, the logic flow for {{addRecord()}} goes like this (roughly 
sketched after the list):
 * The composite data store determines which delegate should add the new record 
and invokes {{addRecord()}} on that delegate.
 * The composite data store captures the resulting {{DataRecord}} and updates 
the blob id to include the encoded data store id.
 * This encoded version is what is stored in the node store.
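
Roughly, I picture that write path like this (just my own sketch, not the actual 
implementation; {{selectWritableDelegate()}}, {{getDataStoreId()}} and 
{{wrapWithIdentifier()}} are made-up helpers):
{code:java}
// Rough sketch only, not the actual patch; the helpers below are hypothetical.
public DataRecord addRecord(InputStream stream) throws DataStoreException {
    DataStore delegate = selectWritableDelegate();           // composite picks the delegate
    DataRecord delegateRecord = delegate.addRecord(stream);  // delegate stores the blob
    // Re-wrap the record so its identifier carries the delegate's id,
    // e.g. "<contentHash>#<dataStoreId>"; this encoded id is what the
    // node store ends up persisting as the blob id.
    String encodedId = delegateRecord.getIdentifier().toString()
            + "#" + getDataStoreId(delegate);
    return wrapWithIdentifier(delegateRecord, new DataIdentifier(encodedId));
}
{code}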

And for reading the record:  When the composite data store is given a 
{{DataIdentifier}}, it extracts the data store id from the identifier if there 
is one, uses that to look up the matching delegate, and passes a modified data 
identifier, with the encoded data store id part stripped off, along to that 
delegate.
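
In other words, something along these lines (again just a sketch on my part; the 
{{'#'}} separator and the lookup helpers are illustrative, not the actual code):
{code:java}
// Rough sketch only; the '#' separator and the lookup helpers are hypothetical.
public DataRecord getRecord(DataIdentifier identifier)
        throws DataStoreException {
    String id = identifier.toString();
    int sep = id.lastIndexOf('#');
    if (sep < 0) {
        // No encoded data store id: fall back to asking every delegate.
        return lookupInAllDelegates(identifier);
    }
    DataStore delegate = lookupDelegateById(id.substring(sep + 1));
    // Pass the identifier along without the encoded data store id part.
    return delegate.getRecord(new DataIdentifier(id.substring(0, sep)));
}
{code}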

I can see now how that would work.

I still have concerns about data portability between systems that don't use the 
composite and those that do, and about limiting our ability to support future 
scenarios.

For data portability, as I see it there are two main cases.
 * The first has to do with going from an Oak instance that doesn't use a 
composite to one that does.  In that case the node store wouldn't have any data 
store info encoded into the stored blob ids.  Once the composite data store is 
being used, requests to read would be done using a blob ID that doesn't include 
the data store information.  So we can look it up from the available delegates, 
which is the fallback approach (sketched after this list).  My question is: if we 
modify the blob ID in 
the {{DataRecord}} being returned, will the node store apply the updated blob 
ID?  If not, any blob IDs in the node store prior to moving to the composite 
data store would never include the encoded data store ID, unless a data 
migration was performed.
 ** Similar to this is the issue that could arise if a data store identifier is 
ever changed or lost.  In that case, even though the blob IDs in the node store 
have an encoded data store identifier, that identifier is now invalid, so we 
would need a way to update the blob IDs stored in the node store.
 * The second has to do with going from an Oak instance that uses a composite 
to one that does not.  In this case, after the migration the Oak instance would 
have a node store full of blob IDs it could not understand, since they would 
have data store identifiers encoded into them but the logic to read the data 
store identifier out of the blob ID would no longer be loaded into Oak.  In this 
case a data migration would be required.
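
For reference, the fallback approach I mentioned in the first case would look 
roughly like this (assuming the composite keeps a {{delegates}} collection; 
again just a sketch):
{code:java}
// Rough sketch of the fallback: the blob id carries no data store id, so ask
// every delegate in turn.  'delegates' is an assumed field on the composite.
private DataRecord lookupInAllDelegates(DataIdentifier identifier)
        throws DataStoreException {
    for (DataStore delegate : delegates) {
        DataRecord record = delegate.getRecordIfStored(identifier);
        if (record != null) {
            return record;
        }
    }
    throw new DataStoreException("Record not found in any delegate: " + identifier);
}
{code}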

As for the possible future use case where the composite stores a blob in more 
than one delegate, I'm not sure how we would support that other than to encode 
_all_ the data store identifiers into the blob ID (illustrated below).
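
Just to make that concrete, I imagine the stored blob ids would end up looking 
something like this (purely illustrative, not an actual format):
{code:java}
// Purely illustrative blob id formats, not the actual encoding.
String singleDelegateBlobId = "<contentHash>#<dataStoreId1>";
String multiDelegateBlobId  = "<contentHash>#<dataStoreId1>#<dataStoreId2>";
{code}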

What it all really boils down to is this:  Encoding the data store identifier 
into the blob ID creates a tight coupling between the data being stored and the 
specific implementation and configuration of data stores at a particular point 
in time, which limits our flexibility to support future scenarios without doing 
a data migration.

So it will be hard for me to be comfortable with the approach of encoding the 
data store ID into the blob ID unless someone can convince me that it is not a 
problem to create this tight coupling.  I will need my specific concerns 
addressed, but I also need to be put at ease about my general concern about the 
tight coupling.

And even if we were to agree that encoding the data store ID into the blob ID 
is not a problem, after going through this in my mind I see that there are a 
number of cases that would need to be tested.  This isn't a quick job that will 
only take a couple of days.  The unit tests that will need to be written to 
verify functionality in the situations I've described above, as well as the 
normal case, will take quite a few days to complete.  So it still exceeds the 
scope of what we had in mind when we discussed it at the Oakathon, in my view.

> CompositeDataStore - ReadOnly/ReadWrite Delegate Support
> --------------------------------------------------------
>
>                 Key: OAK-7083
>                 URL: https://issues.apache.org/jira/browse/OAK-7083
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: blob, blob-cloud, blob-cloud-azure, blob-plugins
>            Reporter: Matt Ryan
>            Assignee: Matt Ryan
>            Priority: Major
>
> Support a specific composite data store use case, which is the following:
> * One instance uses no composite data store, but instead is using a single 
> standard Oak data store (e.g. FileDataStore)
> * Another instance is created by snapshotting the first instance node store, 
> and then uses a composite data store to refer to the first instance's data 
> store read-only, and refers to a second data store as a writable data store
> One way this can be used is in creating a test or staging instance from a 
> production instance.  At creation, the test instance will look like 
> production, but any changes made to the test instance do not affect 
> production.  The test instance can be quickly created from production by 
> cloning only the node store, and not requiring a copy of all the data in the 
> data store.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
