[ https://issues.apache.org/jira/browse/OAK-4780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15928304#comment-15928304 ]
Marcel Reutegger commented on OAK-4780:
---------------------------------------

This looks very promising. I'd like to include those changes step by step. That is, first the VersionGC part in oak-core and, in a second step, the new run mode for oak-run. I would even prefer that the second part go into a separate issue.

Regarding your GitHub branch: it contains a 'patches' directory with two diffs. What are those changes?

Some more comments:

- VersionGarbageCollector.reset() can be simplified to just the remove() call. It will be a no-op if the document doesn't exist.
- Can you please add tests for TimeInterval?
- Did you consider moving the new set methods on VersionGarbageCollector to a new class (e.g. VersionGCOptions) and passing it as an argument to gc()? I think with the current patch it is possible to influence a running GC by calling one of those set methods.
- What is the TODO about in VersionGCStats.addRun()?
- Usage of LimitExceededException from javax.naming is a bit funky ;) but I guess you didn't want to invent yet another exception class.
- VersionGarbageCollector.delayOnModification() should use Clock.waitUntil(). This allows writing efficient tests with a virtual clock.
- Only minor: the diff for VersionGarbageCollector also contains a couple of indentation changes for anonymous inner classes, which are unrelated to this improvement.
- In MongoVersionGCSupport.getDeletedOnceCount(): {{ReadPreference.nearest().secondaryPreferred()}}. You cannot have both nearest and secondaryPreferred. The call will always give you a secondaryPreferred ReadPreference.
- Minor: some unused imports in VersionGCSupport.

> VersionGarbageCollector should be able to run incrementally
> -----------------------------------------------------------
>
>                 Key: OAK-4780
>                 URL: https://issues.apache.org/jira/browse/OAK-4780
>             Project: Jackrabbit Oak
>          Issue Type: Task
>          Components: core, documentmk
>            Reporter: Julian Reschke
>         Attachments: leafnodes.diff, leafnodes-v2.diff, leafnodes-v3.diff
>
>
> Right now, the documentmk's version garbage collection runs in several phases.
> It first collects the paths of candidate nodes, and only once this has been
> successfully finished, starts actually deleting nodes.
> This can be a problem when the regularly scheduled garbage collection is
> interrupted during the path collection phase, maybe due to other maintenance
> tasks. On the next run, the number of paths to be collected will be even
> bigger, thus making it even more likely to fail.
> We should think about a change in the logic that would allow the GC to run in
> chunks; maybe by partitioning the path space by top level directory.
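
To illustrate the ReadPreference remark in the review comment above: assuming, as the comment implies, that nearest() and secondaryPreferred() are both static factory methods, chaining them through an instance expression does not combine them. The sketch below uses a hypothetical stand-in class named Pref (not the MongoDB driver class) to show that the first call's result is simply discarded.

```java
// Hypothetical stand-in with two static factory methods, shaped like the
// ReadPreference usage quoted in the comment. NOT the MongoDB driver class.
class Pref {
    private final String name;

    private Pref(String name) {
        this.name = name;
    }

    static Pref nearest() {
        return new Pref("nearest");
    }

    static Pref secondaryPreferred() {
        return new Pref("secondaryPreferred");
    }

    String getName() {
        return name;
    }
}

class StaticCallDemo {
    static String chained() {
        // nearest() is evaluated, but its result is discarded: the chained
        // call is bound at compile time to the static Pref.secondaryPreferred().
        return Pref.nearest().secondaryPreferred().getName();
    }

    public static void main(String[] args) {
        System.out.println(chained());
    }
}
```

Calling the intended factory method directly (here, Pref.secondaryPreferred()) makes the choice explicit and avoids the misleading chain.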