Re: Fwd: [CompositeDataStore] pull request to review GC changes
Hi, On February 22, 2018 at 3:32:29 AM, Amit Jain (am...@ieee.org) wrote: Hi, Could you please move this discussion to the relvant jira issue. Thanks Amit Sure. Moved to https://issues.apache.org/jira/browse/OAK-7083. -MR
Fwd: [CompositeDataStore] pull request to review GC changes
Hi, Could you please move this discussion to the relvant jira issue. Thanks Amit On Thu, Feb 22, 2018 at 1:32 PM, Matt Ryanwrote: > Hi, > > > On February 21, 2018 at 10:32:48 PM, Amit Jain (am...@ieee.org) wrote: > > Hi, > > > > Now the problem comes when secondary tries to run the sweep phase. It > will > > first try to verify that a references file exists for each repository > file > > in DS_P - and fail. This fails because primary deleted its references > file > > already. Thus secondary will cancel GC and thus blob C never ends up > > getting deleted. Note that secondary must delete C because it is the > only > > repository that knows about C. > > > > This same situation exists also if secondary sweeps first. If record D > was > > created by primary after secondary was cloned, then D is deleted by > > primary, secondary never knows about blob D so it cannot delete it > during > > the sweep phase - it can only be deleted by primary. > > > > The solution for Shared DataStore currently is to require all repositories > to run a Mark phase then run the Sweep phase on one of them. > > > Yes. Sorry, I didn’t mention that. I was trying to be brief and ended up > being unclear. In the situation I described above it is definitely running > the mark phase first and then the sweep phase. The problem is still as I > described - no matter which one runs sweep first, it cannot delete all the > binaries that may possibly have been deleted on both systems. > The problem is because that's how the systems are set up. For this particular problem on the Secondary there is no reason to even account for the Primary's datastore as it should not and cannot delete anything in there. > > > > > The change I made to the garbage collector is that when a repository > > finishes the sweep phase, it doesn’t necessarily delete the references > > file. Instead it marks the data store with a “sweepComplete” file > > indicating that this repository finished the sweep phase. When there is a > > > “sweepComplete” file for every repository (in other words, the last > > repository to sweep), then all the references files are deleted. > > > > Well currently the problem is that all repositories are not required to > run the sweep phase. The solution above would have been ok when the GC is > to be run manually at different times as in your case. > > Exactly - in the case I’ve described both have to successfully run a sweep > or not all binaries will be deleted. > > > But in the real world applications typically there's a cron (e.g. AEM > maintenance task) which could be setup to execute weekly at a particular > time on all repositories. In this case in almost all cases the repository > which finished the Mark phase at the last would only be able to execute the > > Sweep phase as it would be the only repository to see all the reference > files for other repos (others executing before it would fail). This is > still Ok for the Shared DataStore use cases we have. > But with the above solution since not all repositories would be able to run > > the sweep phase the reference files won't be cleaned up. > > A very valid point. I’ll need to think that one through some more. > > > > Besides there's a problem of the Sweep phase on the primary encountering > blobs it does not know about (from the secondary) and which it cannot > delete creating an unpleasant experience. As I understand the Primary could > > be a production system and having these sort of errors crop up would be > problematic. > > If they are regarded as errors, yes. Currently this logs a WARN level > message (not an ERROR) which suggests that sometimes not all the binaries > targeted for deletion will actually be deleted. > > So this might be an issue of setting clear expectations. But I do see the > point. > Yes these are logged as WARN as these are not fatal and empirically these are problematic and is questioned by customers. But apart from that there is a performance impact also as each binary is attempted for deletion which incurs a penalty. > > > So, generically the solution would be to use the shared DataStore GC > paradigm we currently have which requires Mark phase to be run on all > repositories before running a Sweep. > > Yes - like I said this is being done, it still requires that both repos do > a sweep. > > > > For this specific use case some observations and quick rough sketch of a > possible solution: > * The DataStores for the 2 repositories - Primary & Secondary can be > thought of as Shared & Private > ** Primary does not know about Secondary and could be an existing > repository and thus does not know about the DataStore of the Secondary as > well. In other words it could even function as a normal DataStore and need > > not be a CompositeDataStore. > > ** Secondary does need to know about the Primary and thus registers itself > > as sharing the Primary DataStore. > * Encode the blobs ids on the Secondary with the DataStore location/type
Re: [CompositeDataStore] pull request to review GC changes
Hi, On February 21, 2018 at 10:32:48 PM, Amit Jain (am...@ieee.org) wrote: Hi, > Now the problem comes when secondary tries to run the sweep phase. It will > first try to verify that a references file exists for each repository file > in DS_P - and fail. This fails because primary deleted its references file > already. Thus secondary will cancel GC and thus blob C never ends up > getting deleted. Note that secondary must delete C because it is the only > repository that knows about C. > > This same situation exists also if secondary sweeps first. If record D was > created by primary after secondary was cloned, then D is deleted by > primary, secondary never knows about blob D so it cannot delete it during > the sweep phase - it can only be deleted by primary. > The solution for Shared DataStore currently is to require all repositories to run a Mark phase then run the Sweep phase on one of them. Yes. Sorry, I didn’t mention that. I was trying to be brief and ended up being unclear. In the situation I described above it is definitely running the mark phase first and then the sweep phase. The problem is still as I described - no matter which one runs sweep first, it cannot delete all the binaries that may possibly have been deleted on both systems. > > The change I made to the garbage collector is that when a repository > finishes the sweep phase, it doesn’t necessarily delete the references > file. Instead it marks the data store with a “sweepComplete” file > indicating that this repository finished the sweep phase. When there is a > “sweepComplete” file for every repository (in other words, the last > repository to sweep), then all the references files are deleted. > > Well currently the problem is that all repositories are not required to run the sweep phase. The solution above would have been ok when the GC is to be run manually at different times as in your case. Exactly - in the case I’ve described both have to successfully run a sweep or not all binaries will be deleted. But in the real world applications typically there's a cron (e.g. AEM maintenance task) which could be setup to execute weekly at a particular time on all repositories. In this case in almost all cases the repository which finished the Mark phase at the last would only be able to execute the Sweep phase as it would be the only repository to see all the reference files for other repos (others executing before it would fail). This is still Ok for the Shared DataStore use cases we have. But with the above solution since not all repositories would be able to run the sweep phase the reference files won't be cleaned up. A very valid point. I’ll need to think that one through some more. Besides there's a problem of the Sweep phase on the primary encountering blobs it does not know about (from the secondary) and which it cannot delete creating an unpleasant experience. As I understand the Primary could be a production system and having these sort of errors crop up would be problematic. If they are regarded as errors, yes. Currently this logs a WARN level message (not an ERROR) which suggests that sometimes not all the binaries targeted for deletion will actually be deleted. So this might be an issue of setting clear expectations. But I do see the point. So, generically the solution would be to use the shared DataStore GC paradigm we currently have which requires Mark phase to be run on all repositories before running a Sweep. Yes - like I said this is being done, it still requires that both repos do a sweep. For this specific use case some observations and quick rough sketch of a possible solution: * The DataStores for the 2 repositories - Primary & Secondary can be thought of as Shared & Private ** Primary does not know about Secondary and could be an existing repository and thus does not know about the DataStore of the Secondary as well. In other words it could even function as a normal DataStore and need not be a CompositeDataStore. ** Secondary does need to know about the Primary and thus registers itself as sharing the Primary DataStore. * Encode the blobs ids on the Secondary with the DataStore location/type with which we can distinguish the blob ids belonging to the respective DataStores. That’s a solution that only works in this very specific use case of CompositeDataStore. In the future if we were ever to want to support different scenarios we would then have to reconsider how it encodes blobs for each delegate. Would that mean that data written to a data store by the CompositeDataStore could not be read by another CompositeDataStore referencing the same delegate? * Secondary's Mark phase only redirects the Primary owned blobids to the references file in the Primary's DataStore (Primary's DataStore operating as Shared). The DataStore has no knowledge of the garbage collection stages. So IIUC this would require creating a new garbage collector that is aware of composite data stores and has the
[CompositeDataStore] pull request to review GC changes
Hi oak-dev, I’ve created a new pull request [0] to review changes I made to get garbage collection to work for the composite data store. I’d love some feedback. Since this entails a change to the MarkSweepGarbageCollector, we should discuss the change here to see if there are concerns with it or if there is a better approach. Let me try to briefly explain the change. In the use case I tested there are two Oak repositories, one which we will call primary and one which we will call secondary. Primary gets created first; secondary is created by cloning the node store of primary, then using a CompositeDataStore to have two delegate data stores. The first delegate is the same as the data store for primary, in read-only mode. The second delegate is only accessible by the secondary repo. Let the data store shared by primary and secondary be called DS_P and the data store being used only by the secondary be called DS_S. DS_P can be read by secondary but not modified, so all changes on secondary are saved in DS_S. Primary can still make changes to DS_P. Suppose after creating both repositories, records A and B are deleted from the primary repo, and records B and C are deleted from the secondary repo. Since DS_P is shared, only blob B should actually be deleted from DS_P via GC. After both repositories run their “mark” phase, the primary repo created a “references” file in DS_P excluding A and B, meaning primary thinks A and B can both be deleted. And the secondary repo created a “references” file in DS_P excluding B and C, meaning secondary thinks B and C can both be deleted. Suppose then primary runs the sweep phase first. It will first verify that it has a references file for each repository file in DS_P. Since both primary and secondary put one there this test passes. It will then merge all the data in all the references files in DS_P with its own local view of the existing blobs, and come up with a set of blobs to delete. Primary will conclude that blobs B and C should be deleted - B because both primary and secondary said it is deleted, and C because secondary said it should be deleted and primary has no knowledge of C so it will assume it is okay to delete. At this point primary will delete B and try to delete C and fail (which is ok). Then primary will delete its “references” file from DS_P and call the sweep phase complete. Now the problem comes when secondary tries to run the sweep phase. It will first try to verify that a references file exists for each repository file in DS_P - and fail. This fails because primary deleted its references file already. Thus secondary will cancel GC and thus blob C never ends up getting deleted. Note that secondary must delete C because it is the only repository that knows about C. This same situation exists also if secondary sweeps first. If record D was created by primary after secondary was cloned, then D is deleted by primary, secondary never knows about blob D so it cannot delete it during the sweep phase - it can only be deleted by primary. The change I made to the garbage collector is that when a repository finishes the sweep phase, it doesn’t necessarily delete the references file. Instead it marks the data store with a “sweepComplete” file indicating that this repository finished the sweep phase. When there is a “sweepComplete” file for every repository (in other words, the last repository to sweep), then all the references files are deleted. I wrote an integration test to test DSGC for this specific composite data store use case at [1]. All the oak unit tests pass with this change. I am concerned about any unforeseen consequences that others on-list may have about this change. Also there’s the issue that sweeping must now be done by every repository sharing the data store, which will have some inefficiencies. I’m open to changes or to a different approach if we can solve the problem described above still. 0 - https://github.com/apache/jackrabbit-oak/pull/80 1 - https://github.com/mattvryan/jackrabbit-oak/blob/39b33fe94a055ef588791f238eb85734c34062f3/oak-blob-composite/src/test/java/org/apache/jackrabbit/oak/blob/composite/CompositeDataStoreRORWIT.java Regards -MR