[
https://issues.apache.org/jira/browse/OAK-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635802#comment-14635802
]
Vikas Saurabh edited comment on OAK-3070 at 7/21/15 8:57 PM:
-------------------------------------------------------------
Attaching [^OAK-3070.patch].
{{VersionGarbageCollectorTest#testGCDeletedDocument}} already covers fairly well
the cases showing that version GC works correctly.
The test case that I've added just asserts that {{gc()}} forms the correct query to
the underlying storage, so that already processed documents aren't picked up again.
I wanted to keep the lower bound tight, matching the timestamp
used in the last run. But I couldn't quite control the virtual clock to generate a
doc with _modified equal to the last timestamp used, so instead I've given a
margin of 1 minute to the lower bound (i.e. the lower bound is 1 minute less
than the upper bound of the last GC run).
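For illustration only, here is a minimal sketch of how such a bounded range could be computed. It is not the code in the patch: the class {{GcQueryBounds}}, the method {{nextRange}}, the millisecond units and the parameter names are all assumptions made for this example.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical helper (not part of Oak): derives the _modified range a
 * GC run could query, given the upper bound used by the previous run.
 */
final class GcQueryBounds {

    /** Safety margin subtracted from the previous upper bound (1 minute, as described above). */
    static final long MARGIN_MS = TimeUnit.MINUTES.toMillis(1);

    /** Simple value holder for the range [fromMs, toMs). */
    static final class Range {
        final long fromMs;
        final long toMs;

        Range(long fromMs, long toMs) {
            this.fromMs = fromMs;
            this.toMs = toMs;
        }
    }

    /**
     * @param lastRunUpperBoundMs upper bound used by the last successful run,
     *                            or 0 if there was no previous run (assumption)
     * @param nowMs               current time in milliseconds
     * @param maxRevisionAgeMs    documents modified more recently than this are skipped
     */
    static Range nextRange(long lastRunUpperBoundMs, long nowMs, long maxRevisionAgeMs) {
        // Upper bound: only documents old enough to be collected.
        long upper = nowMs - maxRevisionAgeMs;
        // Lower bound: previous upper bound minus the margin, never negative.
        long lower = Math.max(0L, lastRunUpperBoundMs - MARGIN_MS);
        return new Range(lower, upper);
    }

    private GcQueryBounds() {
    }
}
```

With no previous run recorded, the lower bound falls back to 0 and the query degenerates to the unbounded-below behaviour.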
[~chetanm], [~mreutegg], can you please review?
> Use a lower bound in VersionGC query to avoid checking unmodified once
> deleted docs
> -----------------------------------------------------------------------------------
>
> Key: OAK-3070
> URL: https://issues.apache.org/jira/browse/OAK-3070
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: mongomk, rdbmk
> Reporter: Chetan Mehrotra
> Fix For: 1.3.5
>
> Attachments: OAK-3070.patch
>
>
> As part of OAK-3062 [~mreutegg] suggested
> {quote}
> As a further optimization we could also limit the lower bound of the _modified
> range. The revision GC does not need to check documents with a _deletedOnce
> again if they were not modified after the last successful GC run. If they
> didn't change and were considered existing during the last run, then they
> must still exist in the current GC run. To make this work, we'd need to
> track the last successful revision GC run.
> {quote}
> The lowest last-validated _modified value could be saved in the settings collection
> and reused for the next run.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)