Amit Jain commented on OAK-7083:

Well, the proposals you have outlined are not entirely different.
Proposal 2 is the same as Proposal 1, except that it avoids even the code change 
and requires independent Mark/Sweep cycles. It still does not solve the problems 
of performance and incorrect log messages.
{quote}In the normal shared data store use case I think the impact is that all 
of the connected repositories will try to run the sweep phase. The same blobs 
will be deleted by the first sweeper as would have been deleted before. It 
doesn't impact the ability to collect garbage.{quote}
It surely does impact normal shared DataStore deployments. The problem here 
is that, since only one repository is sweeping, the state maintenance files 
(used to know whether every repository's sweep phase has run) will not be 
cleaned up. For a normal setup we want only one repository sweeping, which is 
good, but then when do we clean up those files, and when do we remove the 
reference files? That is the problem here. The second run will see these 
reference files again and start from stale state, thus not taking new blobs 
into account.
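
To make the stale-state hazard concrete, here is a minimal sketch (invented names, not Oak's actual GC code) of how the sweep phase works against per-repository reference files: any blob mentioned by no reference file is a deletion candidate, so a repository whose reference file is stale can have a blob it newly references swept out from under it.

```java
import java.util.*;

public class StaleReferenceSketch {

    // Sweep candidates: blobs in the store that appear in none of the
    // per-repository reference files.
    public static Set<String> sweepCandidates(Map<String, Set<String>> referenceFiles,
                                              Set<String> allBlobs) {
        Set<String> referenced = new HashSet<>();
        for (Set<String> refs : referenceFiles.values()) {
            referenced.addAll(refs);
        }
        Set<String> garbage = new HashSet<>(allBlobs);
        garbage.removeAll(referenced);
        return garbage;
    }

    public static void main(String[] args) {
        // Cycle 1: both repositories mark, then the single sweeper runs.
        Map<String, Set<String>> files = new HashMap<>();
        files.put("repo1", Set.of("a", "b"));
        files.put("repo2", Set.of("b", "c"));
        System.out.println(sweepCandidates(files, Set.of("a", "b", "c", "d"))); // [d]

        // Cycle 2: repo1 re-marks, but repo2's old file was never cleaned up.
        // repo2 now also references a new blob "f", yet the stale file does
        // not mention it, so "f" wrongly shows up as a sweep candidate.
        files.put("repo1", Set.of("a", "b"));
        System.out.println(sweepCandidates(files, Set.of("a", "b", "c", "f"))); // [f]
    }
}
```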

If we say that the repositories for CompositeDataStore each have to run this:
 * Mark on all repos
 * Sweep

Then we don't have to make any change to the MarkSweepGarbageCollector, and the 
process will be the same as it currently is for normal deployments.
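
The cycle above could be sketched roughly as follows (illustrative names, not the MarkSweepGarbageCollector API): every repository records its references, the sweep considers all reference files together, and the state files are dropped afterwards so the next cycle cannot start from stale state.

```java
import java.util.*;

public class MarkAllThenSweep {

    public static Set<String> runCycle(Map<String, Set<String>> liveRefsPerRepo,
                                       Set<String> allBlobs) {
        // Mark phase: each repository writes its reference file.
        Map<String, Set<String>> referenceFiles = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : liveRefsPerRepo.entrySet()) {
            referenceFiles.put(e.getKey(), new HashSet<>(e.getValue()));
        }

        // Sweep phase: delete blobs that no repository references.
        Set<String> referenced = new HashSet<>();
        referenceFiles.values().forEach(referenced::addAll);
        Set<String> garbage = new HashSet<>(allBlobs);
        garbage.removeAll(referenced);

        // Cleanup: drop the reference/state files so the next cycle
        // starts from fresh state.
        referenceFiles.clear();
        return garbage;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> refs = Map.of(
                "repo1", Set.of("a", "b"),
                "repo2", Set.of("b", "c"));
        System.out.println(runCycle(refs, Set.of("a", "b", "c", "d"))); // [d]
    }
}
```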

For Proposal 3, yes, that would mean the Primary using the CompositeDataStore 
abstraction as well. But once it does, it does not require any complicated 
setup for the DataStore, GC, etc.

{quote}Any other proposals?{quote}
Essentially, I am OK with Proposal 2 for a start, and then we can enhance it 
with the proposal I outlined of encoding the blob ids with the role/type of 
the DataStore. Could you please also add a response to the clarifications I 
had asked for?
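
As a hypothetical sketch of that enhancement, one could prefix each blob id with the role of the owning delegate so GC can route a blob back to the right DataStore. The "role#id" format and the class below are assumptions for illustration, not an existing Oak encoding.

```java
public class RoleEncodedBlobId {

    // Prepend the DataStore role (e.g. "readonly"/"readwrite") to the blob id.
    public static String encode(String role, String blobId) {
        return role + "#" + blobId;
    }

    // Extract the role, or null if the id carries no role prefix.
    public static String roleOf(String encodedId) {
        int sep = encodedId.indexOf('#');
        return sep < 0 ? null : encodedId.substring(0, sep);
    }

    // Recover the original blob id, tolerating un-encoded ids.
    public static String rawId(String encodedId) {
        int sep = encodedId.indexOf('#');
        return sep < 0 ? encodedId : encodedId.substring(sep + 1);
    }

    public static void main(String[] args) {
        String id = encode("readonly", "cafebabe");
        System.out.println(roleOf(id)); // readonly
        System.out.println(rawId(id));  // cafebabe
    }
}
```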

{quote}but may impact efficiency or give confusing log messages (which might be 
fixable){quote}
How is it fixable with no information?

In addition, I am beginning to think this particular use case might be slightly 
different from the CompositeDataStore where all the delegate DataStores are 
still managed by one repository. Here, the delegate DataStores are under the 
management of two different repositories.

> CompositeDataStore - ReadOnly/ReadWrite Delegate Support
> --------------------------------------------------------
>                 Key: OAK-7083
>                 URL: https://issues.apache.org/jira/browse/OAK-7083
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: blob, blob-cloud, blob-cloud-azure, blob-plugins
>            Reporter: Matt Ryan
>            Assignee: Matt Ryan
>            Priority: Major
> Support a specific composite data store use case, which is the following:
> * One instance uses no composite data store, but instead is using a single 
> standard Oak data store (e.g. FileDataStore)
> * Another instance is created by snapshotting the first instance node store, 
> and then uses a composite data store to refer to the first instance's data 
> store read-only, and refers to a second data store as a writable data store
> One way this can be used is in creating a test or staging instance from a 
> production instance.  At creation, the test instance will look like 
> production, but any changes made to the test instance do not affect 
> production.  The test instance can be quickly created from production by 
> cloning only the node store, and not requiring a copy of all the data in the 
> data store.
