Matt Ryan commented on OAK-7083:


You are right that Proposal 1 and Proposal 2 are not that different.

I think you clarified something I didn't realize before: in a normal 
{{SharedDataStore}} environment, only one of the systems has the "sweep" 
phase scheduled.  In other words, the GC sweep is only ever performed on one 
system.

This scenario CANNOT work with {{CompositeDataStore}} in the use case described 
here, because there will always be blobs that can only be deleted by one system 
or the other.  Once the production and staging systems are both running and 
operational, any binary created on either system from that point onward, and 
later deleted, can only be garbage collected by the system that created it.
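To make the point concrete, here is a minimal sketch (all names are hypothetical, not the actual Oak API): each repository can only delete blobs held in a delegate it can write to, so no single sweeper can ever collect everything once both systems are creating binaries.

```java
import java.util.*;

// Hypothetical sketch: each repository sees the shared (read-only) delegate
// plus its own writable delegate. A sweep running on one repository can only
// delete blobs held in the delegate that repository can write to.
public class SweepOwnershipSketch {
    // blobId -> the repository whose writable delegate contains it
    static final Map<String, String> BLOB_OWNER = new HashMap<>();

    static void write(String repo, String blobId) {
        BLOB_OWNER.put(blobId, repo);
    }

    // Returns the unreferenced ids this repository's sweep is able to delete.
    static Set<String> sweepableBy(String repo, Set<String> unreferenced) {
        Set<String> result = new TreeSet<>();
        for (String id : unreferenced) {
            if (repo.equals(BLOB_OWNER.get(id))) {
                result.add(id); // writable by this repo, so deletable here
            }
        }
        return result;
    }

    public static void main(String[] args) {
        write("production", "blob-p1"); // created on production after the split
        write("staging", "blob-s1");    // created on staging after the split
        Set<String> unreferenced =
                new HashSet<>(Arrays.asList("blob-p1", "blob-s1"));

        // Neither repository alone can sweep both blobs, so each must run
        // its own sweep phase.
        System.out.println(sweepableBy("production", unreferenced));
        System.out.println(sweepableBy("staging", unreferenced));
    }
}
```

This is why, under either proposal, ALL repositories must perform the sweep phase rather than scheduling it on just one system.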

You said:
{quote}Then we don't have to make any change to the MarkSweepGarbageCollector 
and the process will be the same as currently for normal deployments.{quote}
I do not agree with that statement.  I do not think it is possible for GC to 
work in an environment with {{CompositeDataStore}} unless some code changes are 
made to the garbage collector.  Either we make those changes to the 
{{MarkSweepGarbageCollector}}, or we write another garbage collector that is 
designed to work with {{CompositeDataStore}}.
{quote}Essentially, I am Ok with proposal 2 for a start{quote}
In my view, Proposal 2 offers no real advantage over Proposal 1, but adds 
complexity because it requires coordinating between repositories on the 
mark phase, beyond what is already being done.  With both proposals it is 
still necessary to change the garbage collector, and it is still necessary 
that ALL repositories perform the sweep phase.
{quote}then we can enhance with the proposal that I outlined of encoding the 
blob ids with the role/type of the DataStore. Could you please also add a 
response on the clarifications I had above.{quote}
I haven't forgotten this part, Amit.  But currently no encoding is being done; 
that is not the approach that was taken.  I am not yet convinced that it is 
needed, or that we want to tie blobs tightly to the repository that wrote them.
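For reference, the encoding idea under discussion could look roughly like the sketch below.  The format and names are entirely hypothetical; as noted, {{CompositeDataStore}} performs no such encoding today.

```java
// Hypothetical sketch of the blob-id encoding idea under discussion: prefix
// each blob id with the role of the DataStore delegate that wrote it.
// This is NOT what CompositeDataStore currently does; no encoding is
// performed today, and the separator/format here is invented.
public class RoleEncodedBlobId {
    static String encode(String role, String rawId) {
        return role + ":" + rawId;
    }

    // Returns the role portion, or null if the id carries no role prefix.
    static String roleOf(String blobId) {
        int i = blobId.indexOf(':');
        return i < 0 ? null : blobId.substring(0, i);
    }

    // Returns the original id with any role prefix stripped.
    static String rawIdOf(String blobId) {
        int i = blobId.indexOf(':');
        return i < 0 ? blobId : blobId.substring(i + 1);
    }
}
```

The concern raised above is visible even in this toy form: once the role is baked into the id, every blob is permanently tied to the delegate that wrote it, which is exactly the tight coupling I am not yet convinced we want.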

I'd like to first settle on an approach to this GC issue and then see if that 
additional step is actually required.

> CompositeDataStore - ReadOnly/ReadWrite Delegate Support
> --------------------------------------------------------
>                 Key: OAK-7083
>                 URL: https://issues.apache.org/jira/browse/OAK-7083
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: blob, blob-cloud, blob-cloud-azure, blob-plugins
>            Reporter: Matt Ryan
>            Assignee: Matt Ryan
>            Priority: Major
> Support a specific composite data store use case, which is the following:
> * One instance uses no composite data store, but instead is using a single 
> standard Oak data store (e.g. FileDataStore)
> * Another instance is created by snapshotting the first instance node store, 
> and then uses a composite data store to refer to the first instance's data 
> store read-only, and refers to a second data store as a writable data store
> One way this can be used is in creating a test or staging instance from a 
> production instance.  At creation, the test instance will look like 
> production, but any changes made to the test instance do not affect 
> production.  The test instance can be quickly created from production by 
> cloning only the node store, and not requiring a copy of all the data in the 
> data store.

This message was sent by Atlassian JIRA