Amit Jain commented on OAK-7083:

The first has to do with going from an Oak instance that doesn't use a 
composite to one that does.  In that case the node store wouldn't have any data 
store info encoded into the stored blob IDs.  Once the composite data store is 
being used, requests to read would be done using a blob ID that doesn't include 
the data store information.  So we can look it up from the available delegates, 
which is the fallback approach.  My question is, if we modify the blob ID in 
the DataRecord being returned, will the node store apply the updated blob ID?  
If not, any blob IDs in the node store prior to moving to the composite data 
store would never include the encoded data store ID, unless a data migration 
was performed.
Similar to this is the issue that could arise if a data store identifier is 
ever changed or lost.  In that case, even though the blob IDs in the node store 
have an encoded data store identifier, that identifier is now invalid, and so 
we would need a way to update the blob IDs stored in the node store.
Well, if we have to support such a case then we could add the ability to 
designate a DataStore as a default (the original one) or provide a migration.
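As a minimal sketch of the fallback approach discussed above (the class, the 
"dsId:" prefix format, and the predicate-per-delegate shape are all 
hypothetical illustrations, not Oak's actual API or encoding):

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical sketch: resolve a blob ID that may or may not carry an
// encoded data store identifier, assumed here to be a "dsId:" prefix.
public class BlobIdResolver {

    // Returns the encoded data store id, or empty if the blob ID
    // predates the composite data store and carries no identifier.
    public static Optional<String> extractDataStoreId(String blobId) {
        int sep = blobId.indexOf(':');
        return sep < 0 ? Optional.empty()
                       : Optional.of(blobId.substring(0, sep));
    }

    // delegates maps data store id -> "does this delegate hold the blob?"
    public static Optional<String> resolveDelegate(String blobId,
            Map<String, Predicate<String>> delegates) {
        Optional<String> encoded = extractDataStoreId(blobId);
        if (encoded.isPresent() && delegates.containsKey(encoded.get())) {
            return encoded;  // fast path: encoded id is present and valid
        }
        // Fallback: the id is missing (pre-composite blob ID) or stale
        // (delegate identifier changed/lost) -- probe each delegate.
        for (Map.Entry<String, Predicate<String>> e : delegates.entrySet()) {
            if (e.getValue().test(blobId)) {
                return Optional.of(e.getKey());
            }
        }
        return Optional.empty();
    }
}
```

Note that the fast path never verifies the record actually exists in the 
chosen delegate; whether a failed read there should also fall through to the 
probe loop is exactly the kind of policy question raised above.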

The second has to do with going from an Oak instance that uses a composite to 
one that does not.  In this case, after the migration the Oak instance would 
have a node store full of blob IDs it could not understand since they would 
include data store identifiers encoded into them but the logic to read the data 
store identifier out of the blob ID is not loaded into Oak anymore.  In these 
cases a data migration would be required.
This one's easy: the extra identifier can be accounted for the same way we do 
today, where we strip out the length information before fetching the blob from 
the backend data store.
So it will be hard for me to be comfortable with the approach of encoding the 
data store ID into the blob ID unless someone can convince me that it is not a 
problem to create this tight coupling.  I will need my specific concerns 
addressed but also need to be put at ease about the general concern I have 
about the tight coupling.
Hmm... Could you please list the use cases that you think would have to be 
supported and where this would be a deal breaker? As discussed in the 
Oakathon, we had come to the conclusion that some use cases are not a 
repository/DataStore concern and would be better served outside of Oak. 
But I think we keep going in circles here because, for some cases with 
larger/remote repositories, performance would be a concern that needs to be 
alleviated even at the cost of flexibility. And even if we say that for a 
particular use case (a combination of DataStore type and repository/DataStore 
size) performance is not a concern, then we've already limited the scope of 
flexibility.

I understand the concern about performance impact.
The performance tests are one aspect which can be taken up going forward. But 
could you add any details accompanying the use case (composite read/write 
DataStores)? Specifically, any available data points regarding:
* Number of nodes & size of the repository
* Number of blobs & size of the DataStore
* Type of DataStore (File, S3, Azure, etc.)
Based on the above we can highlight where the feature makes sense and will 
work, and the cases for which it is not recommended.

> CompositeDataStore - ReadOnly/ReadWrite Delegate Support
> --------------------------------------------------------
>                 Key: OAK-7083
>                 URL: https://issues.apache.org/jira/browse/OAK-7083
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: blob, blob-cloud, blob-cloud-azure, blob-plugins
>            Reporter: Matt Ryan
>            Assignee: Matt Ryan
>            Priority: Major
> Support a specific composite data store use case, which is the following:
> * One instance uses no composite data store, but instead is using a single 
> standard Oak data store (e.g. FileDataStore)
> * Another instance is created by snapshotting the first instance node store, 
> and then uses a composite data store to refer to the first instance's data 
> store read-only, and refers to a second data store as a writable data store
> One way this can be used is in creating a test or staging instance from a 
> production instance.  At creation, the test instance will look like 
> production, but any changes made to the test instance do not affect 
> production.  The test instance can be quickly created from production by 
> cloning only the node store, and not requiring a copy of all the data in the 
> data store.

This message was sent by Atlassian JIRA
