[ https://issues.apache.org/jira/browse/OAK-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15899605#comment-15899605 ]
Marcel Reutegger commented on OAK-3070:
---------------------------------------

I think the margin was introduced because of how {{VersionGCSupport.getPossiblyDeletedDocs()}} compares the two timestamps. With the patch, the garbage collector may miss some documents. Consider the following GC runs with the patch:

Initially {{getPossiblyDeletedDocs()}} will return documents where {{0 < getModifiedInSecs(doc) <= t1}}. In the subsequent run it will return documents where {{t1 < getModifiedInSecs(doc) <= t2}}.

There may be documents modified after t1 that still fall into the same 5 second resolution bucket as t1. The second run will not match them.

I'll update the issue with a new patch...

> Use a lower bound in VersionGC query to avoid checking unmodified once deleted docs
> -----------------------------------------------------------------------------------
>
>                 Key: OAK-3070
>                 URL: https://issues.apache.org/jira/browse/OAK-3070
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: mongomk, rdbmk
>            Reporter: Chetan Mehrotra
>            Assignee: Vikas Saurabh
>              Labels: performance
>         Attachments: OAK-3070.patch, OAK-3070-updated.patch, OAK-3070-updated.patch
>
>
> As part of OAK-3062 [~mreutegg] suggested
> {quote}
> As a further optimization we could also limit the lower bound of the _modified
> range. The revision GC does not need to check documents with a _deletedOnce
> again if they were not modified after the last successful GC run. If they
> didn't change and were considered existing during the last run, then they
> must still exist in the current GC run. To make this work, we'd need to
> track the last successful revision GC run.
> {quote}
> Lowest last validated _modified can be possibly saved in settings collection
> and reused for next run
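
The bucket problem described in the comment can be illustrated with a small, self-contained sketch. It is not Oak code: the class {{GcBucketSketch}} and the helpers {{toBucket}}, {{matchesExclusive}} and {{matchesInclusive}} are hypothetical, timestamps are plain seconds, and the only thing taken from the comment is the 5 second resolution and the shape of the range check. The inclusive variant is just one possible way to avoid the miss and is not necessarily what the new patch does.

```java
class GcBucketSketch {
    // Assumption: _modified is stored at 5 second resolution, as stated in the comment.
    static final long RESOLUTION_SECS = 5;

    // Hypothetical helper: round a timestamp (in seconds) down to the start of its bucket.
    static long toBucket(long timeSecs) {
        return timeSecs - timeSecs % RESOLUTION_SECS;
    }

    // Range check as described in the comment: lower < modified <= upper.
    static boolean matchesExclusive(long modifiedBucket, long lower, long upper) {
        return lower < modifiedBucket && modifiedBucket <= upper;
    }

    // Hypothetical alternative: align the lower bound to its bucket and make it inclusive.
    static boolean matchesInclusive(long modifiedBucket, long lower, long upper) {
        return toBucket(lower) <= modifiedBucket && modifiedBucket <= upper;
    }

    public static void main(String[] args) {
        long t1 = 103; // time of the first GC run
        long t2 = 203; // time of the second GC run

        // A document modified at 104, i.e. after t1, lands in the same bucket as t1.
        long modified = toBucket(104);

        System.out.println("stored _modified bucket: " + modified);
        System.out.println("second run, exclusive lower bound matches: "
                + matchesExclusive(modified, t1, t2));
        System.out.println("second run, inclusive bucketed lower bound matches: "
                + matchesInclusive(modified, t1, t2));
    }
}
```

With these values the document is modified after the first run, yet the exclusive lower bound skips it in the second run, because its stored bucket is not greater than t1. The price of an inclusive bound is that some documents from the boundary bucket are checked twice.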