Matt Ryan commented on OAK-7083:

{quote} bq. Now the problem comes when secondary tries to run the sweep phase. 
It will first try to verify that a references file exists for each repository 
file in DS_P - and fail. This fails because primary deleted its references file 
already. Thus secondary will cancel GC and thus blob C never ends up getting 
deleted. Note that secondary must delete C because it is the only repository 
that knows about C.
 bq. This same situation exists also if secondary sweeps first. If record D was 
created by primary after secondary was cloned, then D is deleted by primary, 
secondary never knows about blob D so it cannot delete it during the sweep 
phase - it can only be deleted by primary.

The solution for SharedDataStore currently is to require all repositories to 
run a Mark phase then run the Sweep phase on one of them.

Yes.  Sorry, I didn’t mention that.  I was trying to be brief and ended up 
being unclear.  In the situation I described above it is definitely running the 
mark phase first and then the sweep phase.  The problem is still as I described 
- no matter which one runs sweep first, it cannot delete all the binaries that 
may possibly have been deleted on both systems.
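
To make the failure concrete, here is a minimal sketch of the sweep precondition as it is described in this thread. It is an in-memory model only: {{SharedMetadata}}, {{register}}, {{mark}} and {{sweep}} are hypothetical names made up for illustration, not Oak APIs, and blob deletion itself is left out.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical in-memory stand-in for the metadata area of a shared data store.
class SharedMetadata {
    final Set<String> registeredRepos = new HashSet<>();
    final Set<String> referencesFiles = new HashSet<>();

    void register(String repoId) {
        registeredRepos.add(repoId);
    }

    // Mark phase: the repository publishes its references file.
    void mark(String repoId) {
        referencesFiles.add(repoId);
    }

    // Sweep phase as described in this thread: it is cancelled unless a
    // references file exists for every registered repository, and the
    // sweeping repository removes its own references file when it is done.
    boolean sweep(String repoId) {
        if (!referencesFiles.containsAll(registeredRepos)) {
            return false; // GC cancelled, nothing is deleted
        }
        referencesFiles.remove(repoId);
        return true;
    }
}

class CurrentSweepDemo {
    public static void main(String[] args) {
        SharedMetadata ds = new SharedMetadata();
        ds.register("primary");
        ds.register("secondary");
        ds.mark("primary");
        ds.mark("secondary");

        // Primary sweeps first and removes its references file.
        System.out.println("primary sweep ran: " + ds.sweep("primary"));
        // Secondary no longer sees a references file for primary, so it cancels.
        System.out.println("secondary sweep ran: " + ds.sweep("secondary"));
    }
}
```

In this model the second sweep is always cancelled, whichever repository goes first, so any blob known only to the second repository is never deleted.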

{quote} bq. The change I made to the garbage collector is that when a 
repository finishes the sweep phase, it doesn’t necessarily delete the 
references file. Instead it marks the data store with a “sweepComplete” file 
indicating that this repository finished the sweep phase. When there is a 
“sweepComplete” file for every repository (in other words, the last repository 
to sweep), then all the references files are deleted.

Well currently the problem is that all repositories are not required to run the 
sweep phase. The solution above would have been ok when the GC is to be run 
manually at different times as in your case.
Exactly - in the case I’ve described both have to successfully run a sweep or 
not all binaries will be deleted.
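
A rough sketch of the “sweepComplete” bookkeeping quoted above, again as an in-memory model. {{SweepCompleteMetadata}} and its methods are hypothetical names, not Oak APIs, and clearing the markers together with the references files is an assumption of this sketch rather than something stated in the proposal.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical in-memory model of the proposed "sweepComplete" markers.
class SweepCompleteMetadata {
    final Set<String> registeredRepos = new HashSet<>();
    final Set<String> referencesFiles = new HashSet<>();
    final Set<String> sweepCompleteMarkers = new HashSet<>();

    void register(String repoId) {
        registeredRepos.add(repoId);
    }

    // Mark phase: the repository publishes its references file.
    void mark(String repoId) {
        referencesFiles.add(repoId);
    }

    // Proposed sweep: keep the references files in place, record a
    // sweepComplete marker, and clean up only once every registered
    // repository has swept.
    boolean sweep(String repoId) {
        if (!referencesFiles.containsAll(registeredRepos)) {
            return false; // GC cancelled
        }
        sweepCompleteMarkers.add(repoId);
        if (sweepCompleteMarkers.containsAll(registeredRepos)) {
            referencesFiles.clear();
            sweepCompleteMarkers.clear(); // assumption: markers are reset too
        }
        return true;
    }
}

class SweepCompleteDemo {
    public static void main(String[] args) {
        SweepCompleteMetadata ds = new SweepCompleteMetadata();
        ds.register("primary");
        ds.register("secondary");
        ds.mark("primary");
        ds.mark("secondary");

        System.out.println("primary sweep ran: " + ds.sweep("primary"));
        System.out.println("secondary sweep ran: " + ds.sweep("secondary"));
        System.out.println("references files left: " + ds.referencesFiles.size());
    }
}
```

The same model also shows the weakness: if only one repository ever sweeps, the condition for cleanup is never met and the references files stay in place.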
{quote}But in the real world applications typically there's a cron (e.g. AEM 
maintenance task) which could be setup to execute weekly at a particular time 
on all repositories. In this case in almost all cases the repository which 
finished the Mark phase at the last would only be able to execute the Sweep 
phase as it would be the only repository to see all the reference files for 
other repos (others executing before it would fail). This is still Ok for the 
{{SharedDataStore}} use cases we have. But with the above solution since not 
all repositories would be able to run the sweep phase the reference files won't 
be cleaned up.
A very valid point.  I'll need to think that one through some more.
{quote}Besides there's a problem of the Sweep phase on the primary encountering 
blobs it does not know about (from the secondary) and which it cannot delete 
creating an unpleasant experience. As I understand the Primary could be a 
production system and having these sort of errors crop up would be problematic.
If they are regarded as errors, yes.  Currently this logs a WARN level message 
(not an ERROR) which suggests that sometimes not all the binaries targeted for 
deletion will actually be deleted.

So this might be an issue of setting clear expectations.  But I do see the 
{quote}So, generically the solution would be to use the shared {{DataStore}} GC 
paradigm we currently have which requires Mark phase to be run on all 
repositories before running a Sweep.
Yes - like I said this is being done, it still requires that both repos do a 
{quote}For this specific use case some observations and quick rough sketch of a 
possible solution:
 * The {{DataStore}}s for the 2 repositories - Primary & Secondary can be 
thought of as Shared & Private
 ** Primary does not know about Secondary and could be an existing repository 
and thus does not know about the {{DataStore}} of the Secondary as well. In 
other words it could even function as a normal {{DataStore}} and need not be a 
 ** Secondary does need to know about the Primary and thus registers itself as 
sharing the Primary {{DataStore}}.
 * Encode the blobs ids on the Secondary with the {{DataStore}} location/type 
with which we can distinguish the blob ids belonging to the respective 
That’s a solution that only works in this very specific use case of 
CompositeDataStore.  In the future if we were ever to want to support different 
scenarios we would then have to reconsider how it encodes blobs for each 
delegate.  Would that mean that data written to a data store by the 
CompositeDataStore could not be read by another CompositeDataStore referencing 
the same delegate?
{quote} * Secondary's Mark phase only redirects the Primary owned blobids to 
the references file in the Primary's {{DataStore}} (Primary's {{DataStore}} 
operating as Shared).{quote}
Same issue - GC happens outside of the control of the DataStore.

It’s a good idea, Amit - something I struggled with for quite a while.  I 
considered the same approach as well.  But it tightly binds garbage collection 
to the data store, whereas they are currently very loosely bound.  GC leverages the 
DataStore APIs to do GC tasks (like reading and writing metadata files) but the 
DataStore doesn’t have any knowledge that GC is even happening.

So I don’t see how the CompositeDataStore could control execution of GC only on 
the independent data store.

Furthermore, future uses of CompositeDataStore might not be so clear-cut.  A 
CompositeDataStore might have 5 delegates, some of which are shared, some are 
not, some are read-only, some are not.  How would it know which ones to GC 
independently and which ones to do shared?

I think it is better to leave the GC logic where it is and let the DataStore 
(and the CompositeDataStore) remain unaware of GC logic, if possible.


I’m confident the solution I proposed works correctly, at least in testing.  I 
understand there are undesirable consequences.  I also get the point you made, 
a very good one, which is that this is unlikely to work well in the real world 
due to how production systems function.


What else could we do to address this?

> CompositeDataStore - ReadOnly/ReadWrite Delegate Support
> --------------------------------------------------------
>                 Key: OAK-7083
>                 URL: https://issues.apache.org/jira/browse/OAK-7083
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: blob, blob-cloud, blob-cloud-azure, blob-plugins
>            Reporter: Matt Ryan
>            Assignee: Matt Ryan
>            Priority: Major
> Support a specific composite data store use case, which is the following:
> * One instance uses no composite data store, but instead is using a single 
> standard Oak data store (e.g. FileDataStore)
> * Another instance is created by snapshotting the first instance node store, 
> and then uses a composite data store to refer to the first instance's data 
> store read-only, and refers to a second data store as a writable data store
> One way this can be used is in creating a test or staging instance from a 
> production instance.  At creation, the test instance will look like 
> production, but any changes made to the test instance do not affect 
> production.  The test instance can be quickly created from production by 
> cloning only the node store, and not requiring a copy of all the data in the 
> data store.

This message was sent by Atlassian JIRA