[jira] [Comment Edited] (OAK-4780) VersionGarbageCollector should be able to run incrementally

Marcel Reutegger (JIRA) Thu, 16 Mar 2017 08:47:01 -0700

    [ 
https://issues.apache.org/jira/browse/OAK-4780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15928304#comment-15928304
 ]


Marcel Reutegger edited comment on OAK-4780 at 3/16/17 3:46 PM:
----------------------------------------------------------------

This looks very promising. I'd like to include those changes step by step: 
first the VersionGC part in oak-core and then, in a second step, the new run 
mode for oak-run. I would even prefer that the second part go into a separate 
issue.

Your GitHub branch contains a 'patches' directory with two diffs. What are 
those changes?

Some more comments:

- VersionGarbageCollector.reset() can be simplified to a single remove() call. 
It is a no-op if the document doesn't exist.
- Can you please add tests for TimeInterval?
- Have you considered moving the new set methods on VersionGarbageCollector to 
a new class (e.g. VersionGCOptions) and passing it as an argument to gc()? With 
the current patch, I think it is possible to influence a running GC by calling 
one of those set methods.
- What is the TODO about in VersionGCStats.addRun()?
- Using LimitExceededException from javax.naming is a bit funky ;), but I 
guess you didn't want to invent yet another exception class.
- VersionGarbageCollector.delayOnModification() should use Clock.waitUntil(). 
This makes it possible to write efficient tests with a virtual clock.
- Minor: the diff for VersionGarbageCollector also contains a couple of 
indentation changes for anonymous inner classes, which are unrelated to this 
improvement.
- In MongoVersionGCSupport.getDeletedOnceCount(): 
{{ReadPreference.nearest().secondaryPreferred()}}. You cannot have both nearest 
and secondaryPreferred. secondaryPreferred() is a static factory method, so 
calling it on the instance returned by nearest() always gives you a 
secondaryPreferred ReadPreference.
- Minor: some unused imports in VersionGCSupport
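
A minimal sketch of the VersionGCOptions and Clock.waitUntil() suggestions, in 
plain Java so it runs without Oak on the classpath. VersionGCOptions, its 
fields, GcClock, VirtualGcClock and SketchVersionGC are hypothetical stand-ins, 
not Oak's actual classes. The delay is assumed to be the duration of the 
preceding modification multiplied by a delay factor.

```java
import java.util.concurrent.atomic.AtomicLong;

class GcOptionsDemo {
    public static void main(String[] args) throws InterruptedException {
        VirtualGcClock clock = new VirtualGcClock();
        SketchVersionGC gc = new SketchVersionGC(clock);
        // Options are built up front and handed to gc(); there are no setters to call mid-run.
        VersionGCOptions options = new VersionGCOptions(10_000, 0.0).withDelayFactor(2.0);
        // A 500 ms modification with delay factor 2.0 leads to a 1000 ms (virtual) wait.
        System.out.println(gc.gc(options, 500));
    }
}

// Hypothetical clock abstraction, modelled on the idea behind Clock.waitUntil().
interface GcClock {
    long getTime();
    void waitUntil(long timestamp) throws InterruptedException;
}

// Virtual clock for tests: waiting just advances the time, nothing actually sleeps.
final class VirtualGcClock implements GcClock {
    private final AtomicLong now = new AtomicLong();

    public long getTime() {
        return now.get();
    }

    public void waitUntil(long timestamp) {
        now.accumulateAndGet(timestamp, Math::max);
    }
}

// Hypothetical immutable options; the fields are illustrative only.
final class VersionGCOptions {
    final int collectLimit;
    final double delayFactor;

    VersionGCOptions(int collectLimit, double delayFactor) {
        this.collectLimit = collectLimit;
        this.delayFactor = delayFactor;
    }

    // Returns a modified copy instead of mutating shared state.
    VersionGCOptions withDelayFactor(double factor) {
        return new VersionGCOptions(collectLimit, factor);
    }
}

final class SketchVersionGC {
    private final GcClock clock;

    SketchVersionGC(GcClock clock) {
        this.clock = clock;
    }

    // The options are passed per run, so a caller cannot change them while a GC is in progress.
    // Returns the time spent waiting, as measured by the injected clock.
    long gc(VersionGCOptions options, long modificationMillis) throws InterruptedException {
        long start = clock.getTime();
        delayOnModification(options, modificationMillis);
        return clock.getTime() - start;
    }

    private void delayOnModification(VersionGCOptions options, long durationMillis)
            throws InterruptedException {
        long delay = (long) (durationMillis * options.delayFactor);
        if (delay > 0) {
            // Waiting through the clock instead of Thread.sleep() lets tests use a virtual clock.
            clock.waitUntil(clock.getTime() + delay);
        }
    }
}
```

With the virtual clock the wait returns immediately while the measured time 
still advances by 1000 ms; a clock backed by system time would block instead.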



> VersionGarbageCollector should be able to run incrementally
> -----------------------------------------------------------
>
>                 Key: OAK-4780
>                 URL: https://issues.apache.org/jira/browse/OAK-4780
>             Project: Jackrabbit Oak
>          Issue Type: Task
>          Components: core, documentmk
>            Reporter: Julian Reschke
>         Attachments: leafnodes.diff, leafnodes-v2.diff, leafnodes-v3.diff
>
>
> Right now, the documentmk's version garbage collection runs in several phases.
> It first collects the paths of candidate nodes, and only once this has 
> successfully finished does it actually start deleting nodes.
> This can be a problem when the regularly scheduled garbage collection is 
> interrupted during the path collection phase, maybe due to other maintenance 
> tasks. On the next run, the number of paths to be collected will be even 
> bigger, thus making it even more likely to fail.
> We should think about a change in the logic that would allow the GC to run in 
> chunks, for example by partitioning the path space by top-level directory.
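
One possible shape of the chunking idea from the description, as a plain Java 
sketch. It assumes the GC candidates can be grouped by top-level directory and 
that a progress marker can be persisted between runs. IncrementalVersionGC, 
run() and lastCompleted are hypothetical names, and "deleting" is simulated by 
clearing an in-memory list rather than touching a DocumentStore.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch: none of these names are Oak APIs.
final class IncrementalVersionGC {
    // Progress marker a real implementation would persist so that an interrupted
    // run can resume where it stopped; null means "start from the beginning".
    private String lastCompleted;

    String lastCompleted() {
        return lastCompleted;
    }

    // Processes at most maxChunks top-level directories per call. Each chunk is
    // collected and deleted before moving on, so finished work survives an interruption.
    int run(NavigableMap<String, List<String>> garbageByTopLevel, int maxChunks) {
        NavigableMap<String, List<String>> todo = lastCompleted == null
                ? garbageByTopLevel
                : garbageByTopLevel.tailMap(lastCompleted, false);
        int deleted = 0;
        int chunks = 0;
        for (String topLevel : todo.keySet()) {
            if (chunks == maxChunks) {
                return deleted; // budget used up; the next run resumes after lastCompleted
            }
            List<String> candidates = garbageByTopLevel.get(topLevel);
            deleted += candidates.size();
            candidates.clear(); // stand-in for deleting the collected documents
            lastCompleted = topLevel;
            chunks++;
        }
        lastCompleted = null; // full pass done; the next run starts over
        return deleted;
    }

    public static void main(String[] args) {
        NavigableMap<String, List<String>> garbage = new TreeMap<>();
        garbage.put("/a", new ArrayList<>(Arrays.asList("/a/1", "/a/2")));
        garbage.put("/b", new ArrayList<>(Arrays.asList("/b/1")));
        garbage.put("/c", new ArrayList<>(Arrays.asList("/c/1", "/c/2", "/c/3")));

        IncrementalVersionGC gc = new IncrementalVersionGC();
        System.out.println(gc.run(garbage, 2)); // 3: handles "/a" and "/b", then stops
        System.out.println(gc.run(garbage, 2)); // 3: resumes with "/c", pass complete
    }
}
```

The point of the design is that the amount of work per run is bounded by 
maxChunks instead of by the total number of candidates, so a run that gets 
interrupted no longer makes the next one larger.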



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
