[jira] [Updated] (OAK-3070) Use a lower bound in VersionGC query to avoid checking unmodified once deleted docs
[ https://issues.apache.org/jira/browse/OAK-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcel Reutegger updated OAK-3070:
----------------------------------
    Attachment: OAK-3070-3.patch

Attached an updated patch [^OAK-3070-3.patch]. The MongoVersionGCSupport no longer sets a hint for the query and leaves it up to MongoDB to decide what the best plan is. With the _modified range now passed to getPossiblyDeletedDocs(), the database is IMO in a better position to decide what the best index is.

Regarding the query timeout: by default, the query on the MongoDB driver level does not have a timeout, so we should be fine.

> Use a lower bound in VersionGC query to avoid checking unmodified once deleted docs
> -----------------------------------------------------------------------------------
>
>                 Key: OAK-3070
>                 URL: https://issues.apache.org/jira/browse/OAK-3070
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: mongomk, rdbmk
>            Reporter: Chetan Mehrotra
>            Assignee: Vikas Saurabh
>              Labels: performance
>         Attachments: OAK-3070-2.patch, OAK-3070-3.patch, OAK-3070.patch, OAK-3070-updated.patch, OAK-3070-updated.patch
>
>
> As part of OAK-3062 [~mreutegg] suggested
> {quote}
> As a further optimization we could also limit the lower bound of the _modified
> range. The revision GC does not need to check documents with a _deletedOnce
> again if they were not modified after the last successful GC run. If they
> didn't change and were considered existing during the last run, then they
> must still exist in the current GC run. To make this work, we'd need to
> track the last successful revision GC run.
> {quote}
> The lowest last validated _modified could be saved in the settings collection
> and reused for the next run

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
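The lower-bound idea from the issue description can be illustrated with a small in-memory sketch. This is not the code from any of the attached patches: the class `LowerBoundGcSketch`, its `Doc` type, the settings key `lastGcUpperBound`, and the inclusive-lower/exclusive-upper range semantics are all assumptions made for illustration. Only the `_deletedOnce` and `_modified` fields and the idea of remembering the last successful run come from the thread.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory sketch; these are not Oak's actual classes.
final class LowerBoundGcSketch {

    // Stand-in for a document: only the fields the candidate query inspects.
    static final class Doc {
        final String id;
        final boolean deletedOnce; // mirrors the _deletedOnce flag
        final long modified;       // mirrors the _modified timestamp

        Doc(String id, boolean deletedOnce, long modified) {
            this.id = id;
            this.deletedOnce = deletedOnce;
            this.modified = modified;
        }
    }

    // Hypothetical settings key holding the upper bound of the last run.
    static final String LAST_GC_KEY = "lastGcUpperBound";

    private final Map<String, Long> settings = new HashMap<>();
    private final List<Doc> docs = new ArrayList<>();

    void add(Doc doc) {
        docs.add(doc);
    }

    // Candidates have deletedOnce set and fromModified <= modified < toModified.
    // The inclusive/exclusive choice is an assumption of this sketch.
    List<String> possiblyDeleted(long fromModified, long toModified) {
        List<String> ids = new ArrayList<>();
        for (Doc d : docs) {
            if (d.deletedOnce && d.modified >= fromModified && d.modified < toModified) {
                ids.add(d.id);
            }
        }
        return ids;
    }

    // One GC pass: query the range [lastUpperBound, now - maxAge) and then
    // remember the new upper bound as the lower bound of the next pass.
    List<String> gc(long now, long maxAge) {
        long upper = now - maxAge;
        long lower = settings.getOrDefault(LAST_GC_KEY, 0L);
        List<String> candidates = possiblyDeleted(lower, upper);
        settings.put(LAST_GC_KEY, upper);
        return candidates;
    }

    public static void main(String[] args) {
        LowerBoundGcSketch store = new LowerBoundGcSketch();
        store.add(new Doc("a", true, 100));
        store.add(new Doc("b", false, 100));
        store.add(new Doc("c", true, 250));
        System.out.println(store.gc(300, 100)); // queries the range [0, 200)
        System.out.println(store.gc(400, 100)); // queries the range [200, 300)
    }
}
```

In this sketch the second pass never looks at document "a" again, because its `modified` value lies below the stored bound. A real implementation would presumably update the stored bound only after the run completed successfully.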
[jira] [Updated] (OAK-3070) Use a lower bound in VersionGC query to avoid checking unmodified once deleted docs
Marcel Reutegger updated OAK-3070:
----------------------------------
    Attachment: OAK-3070-2.patch

Attached an updated patch [^OAK-3070-2.patch] with additional tests, unified implementations for VersionGCSupport and no margin for the stored lower bound of the next VersionGC.
[jira] [Updated] (OAK-3070) Use a lower bound in VersionGC query to avoid checking unmodified once deleted docs
Julian Reschke updated OAK-3070:
--------------------------------
    Attachment: OAK-3070-updated.patch

patch update to apply to trunk (also logging slightly extended)
[jira] [Updated] (OAK-3070) Use a lower bound in VersionGC query to avoid checking unmodified once deleted docs
Julian Reschke updated OAK-3070:
--------------------------------
    Attachment: OAK-3070-updated.patch

patch update to apply to trunk
[jira] [Updated] (OAK-3070) Use a lower bound in VersionGC query to avoid checking unmodified once deleted docs
Vikas Saurabh updated OAK-3070:
-------------------------------
    Fix Version/s:     (was: 1.4)
[jira] [Updated] (OAK-3070) Use a lower bound in VersionGC query to avoid checking unmodified once deleted docs
Marcel Reutegger updated OAK-3070:
----------------------------------
    Fix Version/s:     (was: 1.3.9)
                   1.4
[jira] [Updated] (OAK-3070) Use a lower bound in VersionGC query to avoid checking unmodified once deleted docs
Marcel Reutegger updated OAK-3070:
----------------------------------
    Fix Version/s:     (was: 1.3.6)
                   1.3.7
[jira] [Updated] (OAK-3070) Use a lower bound in VersionGC query to avoid checking unmodified once deleted docs
Marcel Reutegger updated OAK-3070:
----------------------------------
    Labels: performance  (was: performance resilience)
[jira] [Updated] (OAK-3070) Use a lower bound in VersionGC query to avoid checking unmodified once deleted docs
Michael Marth updated OAK-3070:
-------------------------------
    Fix Version/s:     (was: 1.3.5)
                   1.3.6
[jira] [Updated] (OAK-3070) Use a lower bound in VersionGC query to avoid checking unmodified once deleted docs
Michael Marth updated OAK-3070:
-------------------------------
    Labels: performance resilience  (was: )
[jira] [Updated] (OAK-3070) Use a lower bound in VersionGC query to avoid checking unmodified once deleted docs
Chetan Mehrotra updated OAK-3070:
---------------------------------
    Assignee: Vikas Saurabh
[jira] [Updated] (OAK-3070) Use a lower bound in VersionGC query to avoid checking unmodified once deleted docs
Vikas Saurabh updated OAK-3070:
-------------------------------
    Attachment: OAK-3070.patch

Attaching [^OAK-3070.patch]. {{VersionGarbageCollectorTest#testGCDeletedDocument}} covers fairly well the cases showing that version GC works correctly. The test case that I've added just asserts that {{gc()}} forms the correct query to the underlying storage, so that already processed documents aren't picked up again.

I wanted to keep the lower bound tight, matching the timestamp used in the last run. But I couldn't quite control the virtual clock to generate a doc with a _modified equal to the last timestamp used, so instead I've given the lower bound a margin of 1 minute (i.e. the lower bound is 1 minute less than the upper bound of the last GC run).

[~chetanm], [~mreutegg], can you please review?