Matt Ryan commented on OAK-7083:
Thanks [~amjain] for your comments and review so far.
Since there are a lot of questions I'm going to try to distill down to what I
think are key issues and then work through the dependent issues as they come.
Let's consider first the proposals to handle garbage collection for composite
data stores. I think there are three currently. For reference, my original
proposal is: Change the MarkSweepGarbageCollector so we don't remove any
"references" files from the metadata area until all repositories connected to a
data store have attempted the sweep phase. I think the three proposals are:
# Move forward with the change I proposed.
# Require that every repository complete the "mark" phase before any
repository can attempt a "sweep" phase.
# Use my proposal but only for repositories using CompositeDataStore.
h3. Proposal 1
I believe the concern with proposal 1 is that production repositories sharing
the same data store may run GC on completely different schedules. We can't be
sure that all repositories complete a mark phase before any repository attempts
a sweep phase. In the context of my proposal, I believe what this means is
that blobs that should be deleted may take longer to delete than expected - for
example, it may require a couple of invocations.
In the normal shared data store use case I think the impact is that all of the
connected repositories will try to run the sweep phase. The same blobs will be
deleted by the first sweeper as would have been deleted before. It doesn't
impact the ability to collect garbage, but may impact efficiency or give
confusing log messages (which might be fixable).
In the composite data store use case, since either repository may be able to
delete blobs that the other repository cannot, collection may take multiple GC
cycles. For example, assume a production and a staging system: if the staging
system deletes a node with a blob reference and then runs mark and sweep, the
sweep may fail because the production system hasn't done its mark phase yet
(no "references" file from the production repo).
Later, the production system would mark and then sweep, deleting the blobs it
can delete but not the blobs only the staging side can delete. However, with my
change the "references" files remain, so the next time the staging system runs
mark and sweep, the sweep can proceed, and it will delete the blob that became
unreferenced earlier.
So eventually I think blobs that should be collected will end up collected
although it may take a while.
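The retention rule behind my proposal can be sketched as a small model. This is illustrative only, not Oak's actual classes: `SweepTracker` is a hypothetical name, and the file names simply mirror the "references" and "sweepComplete" markers discussed here.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical model of the rule: "references" files stay in the metadata
// area until every repository connected to the data store has swept.
public class SweepTracker {
    // repositoryId -> names of the metadata files that repo has written
    private final Map<String, Set<String>> metadataFiles;

    public SweepTracker(Map<String, Set<String>> metadataFiles) {
        this.metadataFiles = metadataFiles;
    }

    // A repository may sweep once every repo has written a "references" file.
    public boolean canSweep() {
        return metadataFiles.values().stream()
                .allMatch(files -> files.contains("references"));
    }

    // "references" files may be removed only after every repo has completed
    // its sweep phase, i.e. written a "sweepComplete" marker.
    public boolean canRemoveReferences() {
        return metadataFiles.values().stream()
                .allMatch(files -> files.contains("sweepComplete"));
    }
}
```

In the staging/production example above, staging can re-run its sweep at any later time because production's "references" file is still present, even though the "references" files cannot yet be cleaned up.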
h3. Proposal 2
If we require that every repository complete the "mark" phase before any
repository can attempt a "sweep" phase, it won't eliminate the need for every
repository to perform the sweep. This is still needed because each repository
has binaries that only can be deleted by that repository.
What it could do, hopefully, is coordinate the sweep phases so that less time
elapses between mark and sweep than in proposal 1.
However, I think you still have to answer the question, what does a repository
do if it is ready to sweep but not all repositories have completed the mark
phase? This is almost what we have now. If not every repository has completed
the mark phase, and one repository wants to sweep, what happens? I assume it
just cancels the sweep until the next scheduled GC time. In which case I don't
see how this is any better than proposal 1.
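The deferral behavior that proposal 2 seems to imply can be made concrete with a tiny sketch. This is illustrative only; `SweepBarrier` and `SweepDecision` are hypothetical names, not Oak API.

```java
// Hypothetical sketch: if any repository has not completed its mark phase,
// a repository that is ready to sweep can only defer until its next scheduled
// GC run, which is the same eventual outcome as proposal 1.
public class SweepBarrier {
    public enum SweepDecision { SWEEP, DEFER }

    public static SweepDecision decide(int reposMarked, int reposTotal) {
        return reposMarked == reposTotal
                ? SweepDecision.SWEEP
                : SweepDecision.DEFER;
    }
}
```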
h3. Proposal 3
This proposal is to only use my GC changes with CompositeDataStore. I'm not
sure exactly what we mean by this.
We could say that it is only used in repositories that are using a
CompositeDataStore. This could be done, although it would probably require
changing the node store code so that it obtains the garbage collector from a
registered reference instead of instantiating it directly, and then having the
different data stores register a garbage collector for use by the node store.
It might complicate the dependency tree and other things depending on how the
garbage collector becomes available to the node store (see the
SegmentNodeStoreService code where the MarkSweepGarbageCollector is
instantiated to see what I mean).
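The indirection described above could look roughly like the following. This is a hypothetical sketch, not existing Oak API: instead of SegmentNodeStoreService instantiating MarkSweepGarbageCollector directly, the node store would look the collector up from a registry that the configured data store populates.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical collector abstraction; the interface and registry names are
// illustrative only.
interface BlobGarbageCollector {
    void collectGarbage(boolean markOnly) throws Exception;
}

class GarbageCollectorRegistry {
    private final AtomicReference<BlobGarbageCollector> registered =
            new AtomicReference<>();

    // A data store (e.g. a CompositeDataStore) registers the collector
    // implementation it requires.
    public void register(BlobGarbageCollector gc) {
        registered.set(gc);
    }

    // The node store asks the registry rather than constructing a collector
    // itself; empty means no data store has registered one yet.
    public Optional<BlobGarbageCollector> lookup() {
        return Optional.ofNullable(registered.get());
    }
}
```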
But it doesn't matter, because this approach won't actually solve the problem,
in my view. The reason is that *both* participating systems have to use the
same garbage collection algorithm. In other words, if staging has the
CompositeDataStore, it is going to rely upon the production system to write the
"sweepComplete" metadata file and leave the "references" files in place so that
the staging system can successfully complete the "sweep" phase. The production
system isn't using a CompositeDataStore, though, so if obtaining the modified
GC algorithm depends on having a running CompositeDataStore, the production
system would never get it.
So the variant that would work would be to simply come up with a new garbage
collector class that must be configured for any system that is using
*or coordinating with* a CompositeDataStore.
In this case we could avoid changing the way GC works on standard shared data
store systems, but it would require that existing systems coordinating with one
using a CompositeDataStore be configured differently to do so, which feels bad
to me. It seems like it would be better if the other systems don't have to be
configured a certain way based on the configuration of another participant in
the system (tight coupling issue).
[~amjain] do you feel I've covered the proposals accurately? Once we think we
are on the same page we can dig into them and figure out how to resolve the
open questions.
Any other proposals?
> CompositeDataStore - ReadOnly/ReadWrite Delegate Support
> Key: OAK-7083
> URL: https://issues.apache.org/jira/browse/OAK-7083
> Project: Jackrabbit Oak
> Issue Type: New Feature
> Components: blob, blob-cloud, blob-cloud-azure, blob-plugins
> Reporter: Matt Ryan
> Assignee: Matt Ryan
> Priority: Major
> Support a specific composite data store use case, which is the following:
> * One instance uses no composite data store, but instead is using a single
> standard Oak data store (e.g. FileDataStore)
> * Another instance is created by snapshotting the first instance node store,
> and then uses a composite data store to refer to the first instance's data
> store read-only, and refers to a second data store as a writable data store
> One way this can be used is in creating a test or staging instance from a
> production instance. At creation, the test instance will look like
> production, but any changes made to the test instance do not affect
> production. The test instance can be quickly created from production by
> cloning only the node store, and not requiring a copy of all the data in the
> data store.