https://issues.apache.org/jira/browse/OAK-7083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16373222#comment-16373222
Matt Ryan commented on OAK-7083:
--------------------------------

Since [https://github.com/apache/jackrabbit-oak/pull/80] entails a change to the {{MarkSweepGarbageCollector}}, we should discuss the change here to see whether there are concerns with it or whether there is a better approach. Let me briefly explain the change I've proposed to the {{MarkSweepGarbageCollector}}.

In the use case I tested there are two Oak repositories, which we will call primary and secondary. Primary is created first; secondary is created by cloning primary's node store and then using a {{CompositeDataStore}} with two delegate data stores. The first delegate is the same data store that primary uses, mounted in read-only mode; the second delegate is accessible only to the secondary repository. Call the data store shared by primary and secondary DS_P, and the data store used only by secondary DS_S. Secondary can read DS_P but not modify it, so all changes made on secondary are saved in DS_S. Primary can still make changes to DS_P.

Suppose that after both repositories are created, records A and B are deleted from the primary repository and records B and C are deleted from the secondary repository. Since DS_P is shared, only blob B should actually be deleted from DS_P via GC (A is still referenced by secondary, and C is known only to secondary).

After both repositories run their "mark" phase, primary has created a "references" file in DS_P excluding A and B, meaning primary believes both A and B can be deleted, and secondary has created a "references" file in DS_P excluding B and C, meaning secondary believes both B and C can be deleted.

Suppose primary then runs the sweep phase first. It first verifies that there is a references file in DS_P for every repository sharing it; since both primary and secondary put one there, this check passes. It then merges the data in all the references files in DS_P with its own local view of the existing blobs and computes a set of blobs to delete.
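To make the merge step concrete, here is a minimal sketch of the idea (illustrative names and simplified logic, not Oak's actual internals): each repository publishes a references file listing the blobs it still needs, and a sweeping repository treats as deletable any blob it can see that appears in no references file.

```java
import java.util.*;

// Sketch of the sweep-phase merge: candidates for deletion are the blobs
// visible to the sweeping repository that no repository's references file
// still lists. Names are hypothetical; this is not the Oak API.
class SweepMergeSketch {

    static Set<String> deletionCandidates(Set<String> visibleBlobs,
                                          Collection<Set<String>> referencesFiles) {
        // Union of all references files = blobs some repository still needs.
        Set<String> stillReferenced = new HashSet<>();
        referencesFiles.forEach(stillReferenced::addAll);
        // Deletable: visible to this repository and referenced by nobody.
        Set<String> candidates = new HashSet<>(visibleBlobs);
        candidates.removeAll(stillReferenced);
        return candidates;
    }

    public static void main(String[] args) {
        // After the mark phase in the scenario above:
        // primary's references exclude A and B; secondary's exclude B and C.
        Set<String> primaryRefs = Set.of();      // primary no longer references A or B
        Set<String> secondaryRefs = Set.of("A"); // secondary still references A
        // Primary sweeps DS_P, where it can see A and B: only B is deletable.
        System.out.println(deletionCandidates(Set.of("A", "B"),
                List.of(primaryRefs, secondaryRefs)));
        // Secondary also sees C in its private DS_S: B and C are deletable,
        // and only secondary can actually delete C.
        System.out.println(deletionCandidates(Set.of("A", "B", "C"),
                List.of(primaryRefs, secondaryRefs)));
    }
}
```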
Primary will conclude that blobs B and C should be deleted: B because both primary and secondary said it can be deleted, and C because secondary said it can be deleted and primary, having no knowledge of C, assumes it is okay to delete. At this point primary deletes B and tries to delete C, which fails (which is okay). Primary then deletes its "references" file from DS_P and considers the sweep phase complete.

The problem comes when secondary tries to run the sweep phase. It first tries to verify that a references file exists in DS_P for every repository sharing it, and this check fails because primary has already deleted its references file. Secondary therefore cancels GC, and blob C never gets deleted. Note that secondary must delete C, because it is the only repository that knows about C.

The same situation exists if secondary sweeps first. If a record D was created on primary after secondary was cloned and D is later deleted by primary, secondary never knows about blob D, so it cannot delete D during its sweep phase; only primary can.

The change I made to the garbage collector is that when a repository finishes the sweep phase, it does not necessarily delete its references file. Instead it marks the data store with a "sweepComplete" file indicating that this repository has finished the sweep phase. When there is a "sweepComplete" file for every repository (in other words, when the last repository finishes its sweep), all the references files are deleted.

I wrote an integration test covering DSGC for this specific composite data store use case: [https://github.com/mattvryan/jackrabbit-oak/blob/39b33fe94a055ef588791f238eb85734c34062f3/oak-blob-composite/src/test/java/org/apache/jackrabbit/oak/blob/composite/CompositeDataStoreRORWIT.java]. All the Oak unit tests pass with this change. I would like to hear any concerns others on-list may have about unforeseen consequences of this change.
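The coordination I'm proposing can be sketched as follows (again a simplified, self-contained model with hypothetical file names, not the actual patch): each repository writes a "sweepComplete" marker when its sweep finishes, and the last one to finish cleans up all references files and markers.

```java
import java.util.*;

// Sketch of the proposed end-of-sweep coordination. The shared data store is
// modeled as a set of file names; "references-<id>" and "sweepComplete-<id>"
// are illustrative names, not the actual file naming used by Oak.
class SweepCompletionSketch {
    private final Set<String> sharedFiles = new HashSet<>();
    private final Set<String> repositoryIds;

    SweepCompletionSketch(Set<String> repositoryIds) {
        this.repositoryIds = repositoryIds;
    }

    // Mark phase: each repository publishes its references file.
    void markPhaseDone(String repoId) {
        sharedFiles.add("references-" + repoId);
    }

    // Sweep phase end: write a sweepComplete marker instead of deleting the
    // references file; the last repository to finish removes everything.
    void sweepDone(String repoId) {
        sharedFiles.add("sweepComplete-" + repoId);
        boolean allSwept = repositoryIds.stream()
                .allMatch(id -> sharedFiles.contains("sweepComplete-" + id));
        if (allSwept) {
            for (String id : repositoryIds) {
                sharedFiles.remove("references-" + id);
                sharedFiles.remove("sweepComplete-" + id);
            }
        }
    }

    boolean hasFile(String name) {
        return sharedFiles.contains(name);
    }
}
```

With this model, after primary sweeps, secondary's check "a references file exists for every repository" still passes, so secondary can complete its own sweep (and delete C) before the files are cleaned up.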
Also, there is the issue that sweeping must now be done by every repository sharing the data store, which introduces some inefficiency. I'm open to changes or to a different approach, as long as it still solves the problem described above.

> CompositeDataStore - ReadOnly/ReadWrite Delegate Support
> --------------------------------------------------------
>
> Key: OAK-7083
> URL: https://issues.apache.org/jira/browse/OAK-7083
> Project: Jackrabbit Oak
> Issue Type: New Feature
> Components: blob, blob-cloud, blob-cloud-azure, blob-plugins
> Reporter: Matt Ryan
> Assignee: Matt Ryan
> Priority: Major
>
> Support a specific composite data store use case, which is the following:
> * One instance uses no composite data store, but instead is using a single standard Oak data store (e.g. FileDataStore)
> * Another instance is created by snapshotting the first instance node store, and then uses a composite data store to refer to the first instance's data store read-only, and refers to a second data store as a writable data store
>
> One way this can be used is in creating a test or staging instance from a production instance. At creation, the test instance will look like production, but any changes made to the test instance do not affect production. The test instance can be quickly created from production by cloning only the node store, and not requiring a copy of all the data in the data store.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)