Hi, Could you please move this discussion to the relvant jira issue.
Thanks Amit On Thu, Feb 22, 2018 at 1:32 PM, Matt Ryan <o...@mvryan.org> wrote: > Hi, > > > On February 21, 2018 at 10:32:48 PM, Amit Jain (am...@ieee.org) wrote: > > Hi, > > > > Now the problem comes when secondary tries to run the sweep phase. It > will > > first try to verify that a references file exists for each repository > file > > in DS_P - and fail. This fails because primary deleted its references > file > > already. Thus secondary will cancel GC and thus blob C never ends up > > getting deleted. Note that secondary must delete C because it is the > only > > repository that knows about C. > > > > This same situation exists also if secondary sweeps first. If record D > was > > created by primary after secondary was cloned, then D is deleted by > > primary, secondary never knows about blob D so it cannot delete it > during > > the sweep phase - it can only be deleted by primary. > > > > The solution for Shared DataStore currently is to require all repositories > to run a Mark phase then run the Sweep phase on one of them. > > > Yes. Sorry, I didn’t mention that. I was trying to be brief and ended up > being unclear. In the situation I described above it is definitely running > the mark phase first and then the sweep phase. The problem is still as I > described - no matter which one runs sweep first, it cannot delete all the > binaries that may possibly have been deleted on both systems. > The problem is because that's how the systems are set up. For this particular problem on the Secondary there is no reason to even account for the Primary's datastore as it should not and cannot delete anything in there. > > > > > The change I made to the garbage collector is that when a repository > > finishes the sweep phase, it doesn’t necessarily delete the references > > file. Instead it marks the data store with a “sweepComplete” file > > indicating that this repository finished the sweep phase. When there is a > > > “sweepComplete” file for every repository (in other words, the last > > repository to sweep), then all the references files are deleted. > > > > Well currently the problem is that all repositories are not required to > run the sweep phase. The solution above would have been ok when the GC is > to be run manually at different times as in your case. > > Exactly - in the case I’ve described both have to successfully run a sweep > or not all binaries will be deleted. > > > But in the real world applications typically there's a cron (e.g. AEM > maintenance task) which could be setup to execute weekly at a particular > time on all repositories. In this case in almost all cases the repository > which finished the Mark phase at the last would only be able to execute the > > Sweep phase as it would be the only repository to see all the reference > files for other repos (others executing before it would fail). This is > still Ok for the Shared DataStore use cases we have. > But with the above solution since not all repositories would be able to run > > the sweep phase the reference files won't be cleaned up. > > A very valid point. I’ll need to think that one through some more. > > > > Besides there's a problem of the Sweep phase on the primary encountering > blobs it does not know about (from the secondary) and which it cannot > delete creating an unpleasant experience. As I understand the Primary could > > be a production system and having these sort of errors crop up would be > problematic. > > If they are regarded as errors, yes. Currently this logs a WARN level > message (not an ERROR) which suggests that sometimes not all the binaries > targeted for deletion will actually be deleted. > > So this might be an issue of setting clear expectations. But I do see the > point. > Yes these are logged as WARN as these are not fatal and empirically these are problematic and is questioned by customers. But apart from that there is a performance impact also as each binary is attempted for deletion which incurs a penalty. > > > So, generically the solution would be to use the shared DataStore GC > paradigm we currently have which requires Mark phase to be run on all > repositories before running a Sweep. > > Yes - like I said this is being done, it still requires that both repos do > a sweep. > > > > For this specific use case some observations and quick rough sketch of a > possible solution: > * The DataStores for the 2 repositories - Primary & Secondary can be > thought of as Shared & Private > ** Primary does not know about Secondary and could be an existing > repository and thus does not know about the DataStore of the Secondary as > well. In other words it could even function as a normal DataStore and need > > not be a CompositeDataStore. > > ** Secondary does need to know about the Primary and thus registers itself > > as sharing the Primary DataStore. > * Encode the blobs ids on the Secondary with the DataStore location/type > with which we can distinguish the blob ids belonging to the respective > DataStores. > > That’s a solution that only works in this very specific use case of > CompositeDataStore. In the future if we were ever to want to support > different scenarios we would then have to reconsider how it encodes blobs > for each delegate. Would that mean that data written to a data store by > the CompositeDataStore could not be read by another CompositeDataStore > referencing the same delegate? > But encoding of blob ids is needed anyways irrespective of the GC no? Otherwise, how does the CompositeDataStore redirect the calls to CRUD on the respective DSs? And did not understand how encoding the blob id with information about the DS preclude it from reading. It has to have the same semantics for the same delegate. But yes it does preclude moving the blobs from one subspace to another. But I don't think that's the use case anyways. > * Secondary's Mark phase only redirects the Primary owned blobids to the > references file in the Primary's DataStore (Primary's DataStore operating > as Shared). > > The DataStore has no knowledge of the garbage collection stages. So IIUC > this would require creating a new garbage collector that is aware of > composite data stores and has the ability to interact with the > CompositeDataStore in a tightly coupled fashion. Either that or we would > have to enhance the data store API (for example, add a new interface or > extend an interface so it can be precisely controlled by the garbage > collector). Or both. > > > DataStore does not have knowledge when GC is taking place. But it does have helper methods which are used by GC. Yes I would think that the methods currently existing for purpose of GC need to be enhanced and the Composite would have some intelligence on execution of some methods for e.g. delete and the metadata methods with some information about the delegates. * Secondary executes GC for its DataStore independently and does not worry > about the Shared blobids (already taken care of above). > > Same issue - GC happens outside of the control of the DataStore. > > It’s a good idea Amit - something I struggled with quite a while. I > considered the same approach as well. But it tightly binds garbage > collection to the data store, whereas now they are currently very loosely > bound. GC leverages the DataStore APIs to do GC tasks (like reading and > writing metadata files) but the DataStore doesn’t have any knowledge that > GC is even happening. > > So i don’t see how the CompositeDataStore could control execution of GC > only on the independent data store. > It does not control execution of the GC but it does control the GC helper methods and uses info already available with it for the delegates. Also, we could simply have GC instances bound to each delegate DataStore. This also would be similar to a case where we use the CompositeDataStore for internally creating a separate lucene DataStore. > Furthermore, future uses of CompositeDataStore might not be so clear-cut. > A CompositeDataStore might have 5 delegates, some of which are shared, some > are not, some are read-only, some are not. How would it know which ones to > GC independently and which ones to do shared? > Shared is sort of automatic setup currently on startup based on a unique clusterid/repositoryid. For Shared once it sees different repository ids registered it only proceeds with Mark phase once all references from all registered repositories are available. > I think it is better to leave the GC logic where it is and let the > DataStore (and the CompositeDataStore) remain unaware of GC logic, if > possible. > > > I’m confident the solution I proposed works correctly, in testing. I > understand there are undesirable consequences. I also get the point you > made, a very good one, which is that this is unlikely to work well in the > real world due to how production systems function. > It might be Ok for this particular problem. But this does not work in the existing DataStore setups as I outlined previously. > What else could we do to address this? > I think we need a better solution on this. But we could do either below to proceed currently: * Add this sweep state change thing only in case of CompositeDataStore Or * For each repository ** Execute Mark on all repositories/ Sweep > -MR >