[jira] [Commented] (OAK-3001) Simplify JournalGarbageCollector using a dedicated timestamp property
[ https://issues.apache.org/jira/browse/OAK-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15679339#comment-15679339 ]

Julian Reschke commented on OAK-3001:
-------------------------------------

trunk: [r1770003|http://svn.apache.org/r1770003] [r1769930|http://svn.apache.org/r1769930] [r1769922|http://svn.apache.org/r1769922] [r1692065|http://svn.apache.org/r1692065]
1.4: [r1692065|http://svn.apache.org/r1692065]
1.2: [r1692066|http://svn.apache.org/r1692066]

> Simplify JournalGarbageCollector using a dedicated timestamp property
> ---------------------------------------------------------------------
>
> Key: OAK-3001
> URL: https://issues.apache.org/jira/browse/OAK-3001
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: core, mongomk
> Reporter: Stefan Egli
> Assignee: Vikas Saurabh
> Priority: Critical
> Labels: scalability
> Fix For: 1.6, 1.5.14
>
> Attachments: OAK-3001.take1.patch
>
>
> This subtask is about spawning out a
> [comment|https://issues.apache.org/jira/browse/OAK-2829?focusedCommentId=14585733=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14585733]
> from [~chetanm] re JournalGC:
> {quote}
> Further looking at JournalGarbageCollector ... it would be simpler if you
> record the journal entry timestamp as an attribute in JournalEntry document
> and then you can delete all the entries which are older than some time by a
> simple query. This would avoid fetching all the entries to be deleted on the
> Oak side
> {quote}
> and a corresponding
> [reply|https://issues.apache.org/jira/browse/OAK-2829?focusedCommentId=14585870=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14585870]
> from myself:
> {quote}
> Re querying by timestamp: that would indeed be simpler. With the current set
> of DocumentStore API however, I believe this is not possible. But:
> [DocumentStore.query|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentStore.java#L127]
> comes quite close: it would probably just require the opposite of that
> method too:
> {code}
> <T extends Document> List<T> query(Collection<T> collection,
>                                    String fromKey,
>                                    String toKey,
>                                    String indexedProperty,
>                                    long endValue,
>                                    int limit);
> {code}
> .. or what about generalizing this method to have both a {{startValue}} and
> an {{endValue}} - with {{-1}} indicating when one of them is not used?
> {quote}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
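Below is a minimal sketch of the timestamp-based approach discussed in the quoted description. It is not Oak's DocumentStore or JournalGarbageCollector code.

* The class `JournalTimestampSketch` is hypothetical.
* Its in-memory map stands in for the journal collection.
* The `UNBOUNDED` constant is hypothetical too.

The sketch makes three assumptions:

* Each journal entry records its creation timestamp.
* Timestamps are non-negative milliseconds, so `-1` can serve as the "bound not used" marker proposed above.
* Both bounds are exclusive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of a journal collection (not Oak's API): each entry
// stores its creation timestamp, and GC selects entries with one range
// condition on that property instead of deciding entry by entry.
class JournalTimestampSketch {

    // Sentinel meaning "this bound is not used", as proposed in the quote.
    static final long UNBOUNDED = -1;

    // Journal entry id -> creation timestamp in ms, ordered by id.
    private final Map<String, Long> entries = new TreeMap<>();

    void add(String id, long timestamp) {
        entries.put(id, timestamp);
    }

    // Returns at most `limit` ids with startValue < timestamp < endValue.
    // Either bound may be UNBOUNDED. A real backend would evaluate this
    // condition on an index; this in-memory version simply scans the map.
    List<String> query(long startValue, long endValue, int limit) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> e : entries.entrySet()) {
            if (result.size() >= limit) {
                break;
            }
            long ts = e.getValue();
            boolean afterStart = startValue == UNBOUNDED || ts > startValue;
            boolean beforeEnd = endValue == UNBOUNDED || ts < endValue;
            if (afterStart && beforeEnd) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    // Deletes all entries older than `cutoff` in batches of `batchSize`
    // and returns how many entries were deleted.
    int gc(long cutoff, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int removed = 0;
        List<String> batch;
        do {
            batch = query(UNBOUNDED, cutoff, batchSize);
            for (String id : batch) {
                entries.remove(id);
            }
            removed += batch.size();
        } while (batch.size() == batchSize);
        return removed;
    }
}
```

The GC loop only ever asks for "entries older than the cutoff", so Oak never has to load entries that are going to survive. In a MongoDB or RDB backend, the range condition would presumably be pushed down to the store as a range query on the indexed timestamp property.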
[jira] [Commented] (OAK-3001) Simplify JournalGarbageCollector using a dedicated timestamp property
[ https://issues.apache.org/jira/browse/OAK-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15670778#comment-15670778 ]

Vikas Saurabh commented on OAK-3001:
------------------------------------

Done (OAK-5119).
[jira] [Commented] (OAK-3001) Simplify JournalGarbageCollector using a dedicated timestamp property
[ https://issues.apache.org/jira/browse/OAK-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15670200#comment-15670200 ]

Marcel Reutegger commented on OAK-3001:
---------------------------------------

That's better, yes.
[jira] [Commented] (OAK-3001) Simplify JournalGarbageCollector using a dedicated timestamp property
[ https://issues.apache.org/jira/browse/OAK-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15670192#comment-15670192 ]

Vikas Saurabh commented on OAK-3001:
------------------------------------

[~mreutegg], oh! I meant whether I need to update this code to use the (extended) {{Condition}} class. Your comment clarifies what I was seeing.
[jira] [Commented] (OAK-3001) Simplify JournalGarbageCollector using a dedicated timestamp property
[ https://issues.apache.org/jira/browse/OAK-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15670187#comment-15670187 ]

Vikas Saurabh commented on OAK-3001:
------------------------------------

[~mreutegg], I should have stated that more clearly. What I meant was that the test left room to skip the edge check. I "think" that Travis somehow hit that case, although I can't come up with any reasoning for "why" with VirtualClock in place.

bq. you added more

Yes, I actually just want to do a hard check for the checkpoint head ({{cpHead}}).

bq. I'm also not sure we should add it. It assumes a journal entry is written on DocumentNodeStore init.

Before creating the checkpoint, we can do three things:
# do a fake commit,
# store the head,
# run the background operations.

That should create a journal entry and give us the head at the checkpoint. That should be ok, right?
[jira] [Commented] (OAK-3001) Simplify JournalGarbageCollector using a dedicated timestamp property
[ https://issues.apache.org/jira/browse/OAK-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15670134#comment-15670134 ]

Stefan Egli commented on OAK-3001:
----------------------------------

bq. can you please review if I wavered off from what we planned to do?

lgtm (reviewed API and usage)
[jira] [Commented] (OAK-3001) Simplify JournalGarbageCollector using a dedicated timestamp property
[ https://issues.apache.org/jira/browse/OAK-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15670100#comment-15670100 ]

Stefan Egli commented on OAK-3001:
----------------------------------

bq. noticed OAK-3975 in related issues. I think we should revert that configuration and code for house-keeping/maintenance!?

+1
[jira] [Commented] (OAK-3001) Simplify JournalGarbageCollector using a dedicated timestamp property
[ https://issues.apache.org/jira/browse/OAK-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15669900#comment-15669900 ]

Marcel Reutegger commented on OAK-3001:
---------------------------------------

bq. I didn't extend Condition class for this as afaics usage of condition seemed to be somehow linked with id (I'm not completely sure of the intended contract). Maybe that bit needs correction/refactor.

I think we can keep {{Condition}} as is, but add clarifications to the JavaDoc if needed.

Conditions are tied to an UpdateOp, which in turn is linked with an ID. You can construct an independent Condition, but its usage in the DocumentStore interface is always related to an UpdateOp (or id/key).
[jira] [Commented] (OAK-3001) Simplify JournalGarbageCollector using a dedicated timestamp property
[ https://issues.apache.org/jira/browse/OAK-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15669885#comment-15669885 ]

Marcel Reutegger commented on OAK-3001:
---------------------------------------

AFAICS you didn't change any checks, you added more. The additional checks are OK, but go beyond the core issue this test is checking. I'm also not sure we should add it. It assumes a journal entry is written on DocumentNodeStore init. While that may be the case right now, it is probably not strictly necessary and could change in the future.
[jira] [Commented] (OAK-3001) Simplify JournalGarbageCollector using a dedicated timestamp property
[ https://issues.apache.org/jira/browse/OAK-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15669642#comment-15669642 ]

Vikas Saurabh commented on OAK-3001:
------------------------------------

[~mreutegg], the Travis build failed after my last commit [here|https://travis-ci.org/apache/jackrabbit-oak/builds/176271155]. I think I've fixed the issue. However, it seems that {{JournalGCTest#gcWithCheckpoint}} isn't doing the correct checks. I think we should update that test as follows:
{noformat}
diff --git a/oak-core/src/test/java/org/apache/jackrabbit/oak/plugins/document/JournalGCTest.java b/oak-core/src/test/java/org/apache/jackrabbit/oak/plugins/document/JournalGCTest.java
index f518ab2..7c45cab 100644
--- a/oak-core/src/test/java/org/apache/jackrabbit/oak/plugins/document/JournalGCTest.java
+++ b/oak-core/src/test/java/org/apache/jackrabbit/oak/plugins/document/JournalGCTest.java
@@ -44,6 +44,8 @@ public class JournalGCTest {
         DocumentNodeStore ns = builderProvider.newBuilder()
                 .clock(c).setAsyncDelay(0).getNodeStore();
 
+        Revision cpHead = ns.getHeadRevision().getRevision(ns.getClusterId());
+        assertNotNull(cpHead);
         String cp = ns.checkpoint(TimeUnit.DAYS.toMillis(1));
         // perform some change
         NodeBuilder builder = ns.getRoot().builder();
@@ -55,7 +57,9 @@ public class JournalGCTest {
 
         // trigger creation of journal entry
         ns.runBackgroundOperations();
-        JournalEntry entry = ns.getDocumentStore().find(JOURNAL, JournalEntry.asId(head));
+        JournalEntry entry = ns.getDocumentStore().find(JOURNAL, JournalEntry.asId(cpHead));
+        assertNotNull(entry);
+        entry = ns.getDocumentStore().find(JOURNAL, JournalEntry.asId(head));
         assertNotNull(entry);
 
         // wait two hours
@@ -65,6 +69,8 @@ public class JournalGCTest {
         ns.getJournalGarbageCollector().gc(1, TimeUnit.HOURS, 10);
 
         // must not remove existing entry, because checkpoint is still valid
+        entry = ns.getDocumentStore().find(JOURNAL, JournalEntry.asId(cpHead));
+        assertNotNull(entry);
         entry = ns.getDocumentStore().find(JOURNAL, JournalEntry.asId(head));
         assertNotNull(entry);
 
@@ -72,6 +78,8 @@ public class JournalGCTest {
         ns.getJournalGarbageCollector().gc(1, TimeUnit.HOURS, 10);
 
         // now journal GC can remove the entry
+        entry = ns.getDocumentStore().find(JOURNAL, JournalEntry.asId(cpHead));
+        assertNull(entry);
         entry = ns.getDocumentStore().find(JOURNAL, JournalEntry.asId(head));
         assertNull(entry);
     }
{noformat}
[jira] [Commented] (OAK-3001) Simplify JournalGarbageCollector using a dedicated timestamp property
[ https://issues.apache.org/jira/browse/OAK-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15669636#comment-15669636 ]

Vikas Saurabh commented on OAK-3001:
------------------------------------

The Travis build failed [here|https://travis-ci.org/apache/jackrabbit-oak/builds/176271155], but the test ({{JournalGCTest#gcWithCheckpoint}}) seemed to be working fine for me locally.

While investigating, I realized that I had implemented the range check with inclusive bounds. The earlier GC logic, however, relied on a query with exclusive bound checks. I made this API's bounds exclusive too at [r1769930|https://svn.apache.org/r1769930].
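The inclusive-versus-exclusive distinction described above only matters for entries whose timestamp falls exactly on a bound. The helper below is hypothetical and not part of Oak. It only illustrates that edge case.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Oak) contrasting inclusive and exclusive
// upper bounds when selecting journal entries older than a cutoff.
class BoundSemanticsSketch {

    // Returns the timestamps selected for deletion:
    // ts <= cutoff when inclusive, ts < cutoff when exclusive.
    static List<Long> selectOlderThan(List<Long> timestamps, long cutoff, boolean inclusive) {
        List<Long> selected = new ArrayList<>();
        for (long ts : timestamps) {
            boolean match = inclusive ? ts <= cutoff : ts < cutoff;
            if (match) {
                selected.add(ts);
            }
        }
        return selected;
    }
}
```

Take timestamps 100, 200 and 300 with a cutoff of 200:

* The inclusive check selects 100 and 200.
* The exclusive check selects only 100.

So switching semantics silently changes which boundary entry GC removes. A test driven by a virtual clock may well produce timestamps that coincide exactly, which makes this edge easier to hit than it might seem.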
[jira] [Commented] (OAK-3001) Simplify JournalGarbageCollector using a dedicated timestamp property
[ https://issues.apache.org/jira/browse/OAK-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15669384#comment-15669384 ]

Vikas Saurabh commented on OAK-3001:
------------------------------------

[~egli], just noticed OAK-3975 in related issues. I think we should revert that configuration and code for house-keeping/maintenance!?
[jira] [Commented] (OAK-3001) Simplify JournalGarbageCollector using a dedicated timestamp property
[ https://issues.apache.org/jira/browse/OAK-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15669321#comment-15669321 ]

Vikas Saurabh commented on OAK-3001:
------------------------------------

Committed the patch on trunk at [r1769922|https://svn.apache.org/r1769922]. Not resolving yet though.
[jira] [Commented] (OAK-3001) Simplify JournalGarbageCollector using a dedicated timestamp property
[ https://issues.apache.org/jira/browse/OAK-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15669049#comment-15669049 ] Julian Reschke commented on OAK-3001: - I would say: go ahead, as long as you have tests and have verified at least with H2 and Derby (run mvn with -Prdb-derby). Then just re-assign the issue to me and I can test with the other DBs and maybe do some refactoring inside RDBDocumentStore.
> Simplify JournalGarbageCollector using a dedicated timestamp property > Key: OAK-3001 > URL: https://issues.apache.org/jira/browse/OAK-3001 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: core, mongomk > Reporter: Stefan Egli > Assignee: Vikas Saurabh > Priority: Critical > Labels: scalability > Fix For: 1.6, 1.5.14 > Attachments: OAK-3001.take1.patch
[jira] [Commented] (OAK-3001) Simplify JournalGarbageCollector using a dedicated timestamp property
[ https://issues.apache.org/jira/browse/OAK-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667061#comment-15667061 ] Stefan Egli commented on OAK-3001: -- sounds good to me
> Simplify JournalGarbageCollector using a dedicated timestamp property > Key: OAK-3001 > URL: https://issues.apache.org/jira/browse/OAK-3001 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: core, mongomk > Reporter: Stefan Egli > Assignee: Vikas Saurabh > Priority: Critical > Labels: scalability > Fix For: 1.6, 1.5.14
[jira] [Commented] (OAK-3001) Simplify JournalGarbageCollector using a dedicated timestamp property
[ https://issues.apache.org/jira/browse/OAK-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667045#comment-15667045 ] Vikas Saurabh commented on OAK-3001: [~mreutegg], [~egli], {{JournalGarbageCollector#gc}} currently removes journal entries by looping for-each-cluster-node -> for-each-matching-_id-in-range. Given that we implement {{remove(collection, indexProp /\*_modified for this discussion\*/, startVal /\*current logic batches with 0 and loops; we can probably use some flag to skip that condition in the remove API\*/, endVal)}}, is it ok if we essentially remove the relatively big portion of the method (looping over cluster ids and _ids) and simply remove all entries whose _modified is older than some age?
> Simplify JournalGarbageCollector using a dedicated timestamp property > Key: OAK-3001 > URL: https://issues.apache.org/jira/browse/OAK-3001 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: core, mongomk > Reporter: Stefan Egli > Assignee: Vikas Saurabh > Priority: Critical > Labels: scalability > Fix For: 1.6, 1.5.14
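The following is a minimal sketch of the two loop shapes Vikas contrasts. It assumes a hypothetical in-memory journal. The {{Entry}} class, the {{"<clusterId>_<timestamp>"}} _id layout, and both method names are illustrative assumptions and do not reflect Oak's actual JournalGarbageCollector code. The first method walks one _id range per cluster node. The second applies a single condition on {{_modified}}.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory journal; names and the _id layout are illustrative only.
class JournalGcSketch {

    static final class Entry {
        final int clusterId;
        final long modified; // stands in for the _modified property

        Entry(int clusterId, long modified) {
            this.clusterId = clusterId;
            this.modified = modified;
        }
    }

    // Hypothetical _id layout "<clusterId>_<zero-padded timestamp>", so the ids
    // of one cluster node sort by time.
    static String id(int clusterId, long timestamp) {
        return String.format("%d_%013d", clusterId, timestamp);
    }

    static TreeMap<String, Entry> sample() {
        TreeMap<String, Entry> journal = new TreeMap<>();
        long[][] rows = {{1, 100}, {1, 200}, {1, 900}, {2, 150}, {2, 950}};
        for (long[] r : rows) {
            journal.put(id((int) r[0], r[1]), new Entry((int) r[0], r[1]));
        }
        return journal;
    }

    // Current shape: for each cluster node, take the _id range that sorts
    // before the cut-off and remove it.
    static int gcPerClusterNode(TreeMap<String, Entry> journal, List<Integer> clusterIds, long olderThan) {
        int removed = 0;
        for (int clusterId : clusterIds) {
            Map<String, Entry> range =
                    journal.subMap(clusterId + "_", true, id(clusterId, olderThan), false);
            removed += range.size();
            range.clear(); // clearing the view removes the entries from the journal
        }
        return removed;
    }

    // Proposed shape: one condition on _modified, with no cluster ids or _id ranges.
    // A backend could express this as a single delete statement.
    static int gcByModified(TreeMap<String, Entry> journal, long olderThan) {
        int before = journal.size();
        journal.values().removeIf(e -> e.modified < olderThan);
        return before - journal.size();
    }
}
```

When the caller passes every cluster id, both variants remove the same entries. The second variant does not need to know the cluster ids at all.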
[jira] [Commented] (OAK-3001) Simplify JournalGarbageCollector using a dedicated timestamp property
[ https://issues.apache.org/jira/browse/OAK-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15633215#comment-15633215 ] Marcel Reutegger commented on OAK-3001: --- No, not necessarily. I think we can also implement this improvement with the sub-tasks defined on this issue. That is, modify the existing remove method to support a range condition or add a new variant.
> Simplify JournalGarbageCollector using a dedicated timestamp property > Key: OAK-3001 > URL: https://issues.apache.org/jira/browse/OAK-3001 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: core, mongomk > Reporter: Stefan Egli > Priority: Critical > Labels: scalability > Fix For: 1.6
[jira] [Commented] (OAK-3001) Simplify JournalGarbageCollector using a dedicated timestamp property
[ https://issues.apache.org/jira/browse/OAK-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15633201#comment-15633201 ] Stefan Egli commented on OAK-3001: -- [~mreutegg], I've seen you moved OAK-3213 to 1.8 - that would imply we're moving OAK-3001 to 1.8 as well, wdyt?
> Simplify JournalGarbageCollector using a dedicated timestamp property > Key: OAK-3001 > URL: https://issues.apache.org/jira/browse/OAK-3001 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: core, mongomk > Reporter: Stefan Egli > Priority: Critical > Labels: scalability > Fix For: 1.6
[jira] [Commented] (OAK-3001) Simplify JournalGarbageCollector using a dedicated timestamp property
[ https://issues.apache.org/jira/browse/OAK-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15140957#comment-15140957 ] Stefan Egli commented on OAK-3001: -- As discussed offline, the decision is not to rush this into 1.4 unless necessary, and to implement OAK-3975 instead.
> Simplify JournalGarbageCollector using a dedicated timestamp property > Key: OAK-3001 > URL: https://issues.apache.org/jira/browse/OAK-3001 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: core, mongomk > Reporter: Stefan Egli > Priority: Critical > Labels: scalability > Fix For: 1.6
[jira] [Commented] (OAK-3001) Simplify JournalGarbageCollector using a dedicated timestamp property
[ https://issues.apache.org/jira/browse/OAK-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15130249#comment-15130249 ] Julian Reschke commented on OAK-3001: - [~egli] would supporting a range for the indexed property fix *this* problem? If so, I'm +1 on doing this minor surgery (yet another DS method), provided we actually do the promised API work quickly afterwards and get rid of the old query signatures then. (Note that RDBDocumentStore already has the functionality; it's just not available through a public API. We could special-case with {{instanceof}}, but would then run into issues with code that wraps DocumentStore implementations.)
> Simplify JournalGarbageCollector using a dedicated timestamp property > Key: OAK-3001 > URL: https://issues.apache.org/jira/browse/OAK-3001 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: core, mongomk > Reporter: Stefan Egli > Priority: Critical > Labels: scalability > Fix For: 1.6
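The following is a minimal sketch of why the {{instanceof}} special case breaks once a store is wrapped. All types here ({{Store}}, {{RangeCapableStore}}, {{LoggingStore}}) are hypothetical stand-ins. They are not Oak's DocumentStore, RDBDocumentStore, or any real wrapper class.

```java
// Hypothetical types; they only mimic the situation described above and are
// not Oak's DocumentStore, RDBDocumentStore or any real wrapper class.
class WrapperSketch {

    interface Store {
        int count();
    }

    // A backend that has a range delete internally but not in the interface.
    static final class RangeCapableStore implements Store {
        public int count() {
            return 0;
        }

        int removeRange(String indexedProperty, long startValue, long endValue) {
            return 0; // body irrelevant for this illustration
        }
    }

    // A decorator (logging, timing, ...) that wraps any other store.
    static final class LoggingStore implements Store {
        private final Store delegate;

        LoggingStore(Store delegate) {
            this.delegate = delegate;
        }

        public int count() {
            return delegate.count();
        }
    }

    // The instanceof special case: it only matches the unwrapped backend.
    static boolean canUseRangeDelete(Store store) {
        return store instanceof RangeCapableStore;
    }
}
```

As soon as the backend is wrapped, the check fails even though the underlying store supports the operation. Declaring the range method on the public interface instead lets every wrapper delegate it.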
[jira] [Commented] (OAK-3001) Simplify JournalGarbageCollector using a dedicated timestamp property
[ https://issues.apache.org/jira/browse/OAK-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15130264#comment-15130264 ] Chetan Mehrotra commented on OAK-3001: -- We should aim for simply deleting from the backend without fetching the documents. So something like {code} <T extends Document> int remove(Collection<T> collection, String indexedProperty, long startValue, long endValue); {code} which translates easily into a single DB or Mongo call. For this case we only need {{endValue}}.
> Simplify JournalGarbageCollector using a dedicated timestamp property > Key: OAK-3001 > URL: https://issues.apache.org/jira/browse/OAK-3001 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: core, mongomk > Reporter: Stefan Egli > Priority: Critical > Labels: scalability > Fix For: 1.6
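The following is a minimal in-memory sketch of the semantics of the proposed {{remove(collection, indexedProperty, startValue, endValue)}}. It also shows the kind of single SQL statement it could map to. Several choices here are assumptions and are not taken from the thread: treating both bounds as exclusive, modelling documents as plain maps, and the table and column names passed to {{toSql}}. On MongoDB, the equivalent would be one delete with a filter such as {{{_modified: {$gt: start, $lt: end}}}}.

```java
import java.util.Iterator;
import java.util.Map;

class RangeRemoveSketch {

    // In-memory reference for the proposed range removal. Documents are plain
    // maps keyed by _id; both bounds are treated as exclusive (an assumption).
    static int remove(Map<String, Map<String, Object>> collection,
                      String indexedProperty, long startValue, long endValue) {
        int removed = 0;
        Iterator<Map<String, Object>> it = collection.values().iterator();
        while (it.hasNext()) {
            Object v = it.next().get(indexedProperty);
            // Documents without the property are left alone.
            if (v instanceof Long && (Long) v > startValue && (Long) v < endValue) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    // The same operation as one SQL statement; table and column names are
    // placeholders, and the two bounds would be bound as statement parameters.
    static String toSql(String table, String column) {
        return "DELETE FROM " + table + " WHERE " + column + " > ? AND " + column + " < ?";
    }
}
```

For journal GC, as Chetan notes, only {{endValue}} matters. A caller would pass a low {{startValue}} and the cut-off timestamp as {{endValue}}.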
[jira] [Commented] (OAK-3001) Simplify JournalGarbageCollector using a dedicated timestamp property
[ https://issues.apache.org/jira/browse/OAK-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15130268#comment-15130268 ] Julian Reschke commented on OAK-3001: - +1 - let's do the simplest possible thing here to address the issue, but plan to get rid of the API as soon as we have something better (and make sure that "soon" is really soon and can get backported to most (all?) of the old branches)
> Simplify JournalGarbageCollector using a dedicated timestamp property > Key: OAK-3001 > URL: https://issues.apache.org/jira/browse/OAK-3001 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: core, mongomk > Reporter: Stefan Egli > Priority: Critical > Labels: scalability > Fix For: 1.6
[jira] [Commented] (OAK-3001) Simplify JournalGarbageCollector using a dedicated timestamp property
[ https://issues.apache.org/jira/browse/OAK-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15130260#comment-15130260 ] Stefan Egli commented on OAK-3001: -- bq. would supporting a range for the indexed property fix this problem? Re-reading this ticket, I think what we need is either what [~mreutegg] suggested initially ("query method with a projection") or a fancier {{remove}} method which takes a query. The new problem here is that single journal entries can have a large {{_c}} property, so what you want to achieve is not having to read that {{_c}} at all (unless you avoid writing large {{_c}} values in the first place). So I don't think a range in a query fixes this.
> Simplify JournalGarbageCollector using a dedicated timestamp property > Key: OAK-3001 > URL: https://issues.apache.org/jira/browse/OAK-3001 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: core, mongomk > Reporter: Stefan Egli > Priority: Critical > Labels: scalability > Fix For: 1.6
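The following is a rough sketch of what a "query method with a projection" means here: only the requested fields of each matching document are returned, so a large {{_c}} value is never handed to the caller. The method name {{queryProjected}} and the map-based documents are hypothetical. In a real backend, the projection would have to be pushed down to the database (for example, as a MongoDB projection) for the savings to materialise. This in-memory version only illustrates the contract.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ProjectionSketch {

    // Hypothetical projected query: returns only the requested fields of each
    // document whose indexed property is below endValue.
    static List<Map<String, Object>> queryProjected(List<Map<String, Object>> docs,
                                                    String indexedProperty, long endValue,
                                                    Set<String> fields) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> doc : docs) {
            Object v = doc.get(indexedProperty);
            if (v instanceof Long && (Long) v < endValue) {
                Map<String, Object> projected = new HashMap<>();
                for (String f : fields) {
                    if (doc.containsKey(f)) {
                        projected.put(f, doc.get(f)); // copy only requested fields
                    }
                }
                result.add(projected);
            }
        }
        return result;
    }
}
```

The caller could then delete by the returned {{_id}} values without ever reading {{_c}}. A range {{remove}} avoids even that round trip.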
[jira] [Commented] (OAK-3001) Simplify JournalGarbageCollector using a dedicated timestamp property
[ https://issues.apache.org/jira/browse/OAK-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14733695#comment-14733695 ] Stefan Egli commented on OAK-3001: -- OAK-3001 is blocked by OAK-3213
> Simplify JournalGarbageCollector using a dedicated timestamp property > Key: OAK-3001 > URL: https://issues.apache.org/jira/browse/OAK-3001 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: core, mongomk > Reporter: Stefan Egli > Assignee: Stefan Egli > Priority: Critical > Labels: scalability > Fix For: 1.3.7, 1.2.6
[jira] [Commented] (OAK-3001) Simplify JournalGarbageCollector using a dedicated timestamp property
[ https://issues.apache.org/jira/browse/OAK-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645579#comment-14645579 ] Julian Reschke commented on OAK-3001: - I recommend going back to the API discussion on oak-dev; there are more things to consider, such as Java-based filtering and/or returning sparse results (which, for instance, have only certain system properties set).
Simplify JournalGarbageCollector using a dedicated timestamp property - Key: OAK-3001 URL: https://issues.apache.org/jira/browse/OAK-3001 Project: Jackrabbit Oak Issue Type: Improvement Components: core, mongomk Reporter: Stefan Egli Assignee: Stefan Egli Priority: Critical Labels: scalability Fix For: 1.2.4, 1.3.4
[jira] [Commented] (OAK-3001) Simplify JournalGarbageCollector using a dedicated timestamp property
[ https://issues.apache.org/jira/browse/OAK-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14642871#comment-14642871 ] Stefan Egli commented on OAK-3001: -- [~mreutegg], [~reschke], [~chetanm], how should we proceed here? To me the simplest change would be to change the current {{query}} method to look as follows: {code} public <T extends Document> List<T> query(Collection<T> collection, String fromKey, String toKey, String indexedProperty, long startValue, long endValue, int limit); {code} where {{startValue==Long.MIN_VALUE}} and/or {{endValue==Long.MAX_VALUE}} could be used to disable that part of the condition.
Simplify JournalGarbageCollector using a dedicated timestamp property - Key: OAK-3001 URL: https://issues.apache.org/jira/browse/OAK-3001 Project: Jackrabbit Oak Issue Type: Improvement Components: core, mongomk Reporter: Stefan Egli Assignee: Stefan Egli Priority: Critical Labels: scalability Fix For: 1.2.4, 1.3.4
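The following is a minimal in-memory sketch of the generalized {{query}} Stefan proposes, using {{Long.MIN_VALUE}} / {{Long.MAX_VALUE}} as "no bound" sentinels. The bound semantics are assumptions made for this sketch: the keys are compared exclusively ({{fromKey < _id < toKey}}), and the property range is inclusive on both ends so that the sentinels fully disable it. The collection is simplified to a map from _id to the indexed property value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class RangeQuerySketch {

    // Models query(collection, fromKey, toKey, indexedProperty, startValue, endValue, limit)
    // over a map of _id -> indexed property value. Bound semantics are assumptions.
    static List<String> query(TreeMap<String, Long> collection, String fromKey, String toKey,
                              long startValue, long endValue, int limit) {
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, Long> e : collection.subMap(fromKey, false, toKey, false).entrySet()) {
            if (ids.size() >= limit) {
                break; // honour the limit
            }
            long v = e.getValue();
            // MIN_VALUE / MAX_VALUE make the respective half of the condition a no-op.
            if (v >= startValue && v <= endValue) {
                ids.add(e.getKey());
            }
        }
        return ids;
    }
}
```

With {{startValue == Long.MIN_VALUE}}, the call behaves like the "opposite" method from the original description (an upper bound only). With {{endValue == Long.MAX_VALUE}}, it behaves like the existing lower-bound-only query.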
[jira] [Commented] (OAK-3001) Simplify JournalGarbageCollector using a dedicated timestamp property
[ https://issues.apache.org/jira/browse/OAK-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609836#comment-14609836 ]

Marcel Reutegger commented on OAK-3001:
---

I converted this issue from a sub-task of OAK-2829 into a separate improvement. I think this is an important optimization, but it shouldn't block OAK-2829.

> Simplify JournalGarbageCollector using a dedicated timestamp property
> ---------------------------------------------------------------------
>
> Key: OAK-3001
> URL: https://issues.apache.org/jira/browse/OAK-3001
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: core, mongomk
> Reporter: Stefan Egli
> Priority: Critical
> Labels: scalability
> Fix For: 1.2.3, 1.3.2
[jira] [Commented] (OAK-3001) Simplify JournalGarbageCollector using a dedicated timestamp property
[ https://issues.apache.org/jira/browse/OAK-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595911#comment-14595911 ]

Marcel Reutegger commented on OAK-3001:
---

I'm a bit worried about adding more complex methods to the DocumentStore interface, but in general it makes sense to avoid loading all the data only to remove it shortly afterwards. A while back, [~reschke] and I discussed a query method with a projection. This way we would at least not read all the data, only the {{_id}}s. Back then it was in the context of garbage collection, where we have a similar requirement.

> Simplify JournalGarbageCollector using a dedicated timestamp property
> ---------------------------------------------------------------------
>
> Key: OAK-3001
> URL: https://issues.apache.org/jira/browse/OAK-3001
> Project: Jackrabbit Oak
> Issue Type: Sub-task
> Components: core, mongomk
> Reporter: Stefan Egli
> Priority: Critical
> Labels: scalability
> Fix For: 1.2.3, 1.3.2
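The projection idea can be sketched as a query that returns only the {{_id}}s of matching documents, followed by a batched delete by id. This is a hypothetical illustration, not an existing Oak API: {{ProjectionSketch}}, {{Store}}, {{queryIds}}, {{remove}}, {{collectGarbage}} and the {{_ts}} property are invented for the example. In memory the projection saves nothing. Against a remote store, however, returning only ids avoids transferring document bodies that are about to be deleted anyway. With MongoDB this would presumably correspond to a find with a projection such as {{\{_id: 1\}}}.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ProjectionSketch {

    // Hypothetical store: _id -> full document body. It stands in for a remote
    // collection and is not Oak's DocumentStore API.
    static final class Store {
        final TreeMap<String, Map<String, Object>> docs = new TreeMap<>();

        void put(String id, long ts, String payload) {
            Map<String, Object> body = new HashMap<>();
            body.put("_ts", ts);          // hypothetical timestamp property
            body.put("data", payload);    // the bulk we would rather not fetch
            docs.put(id, body);
        }

        // "Projection" query: returns only the ids (in key order, at most `limit`)
        // of documents whose numeric property is <= maxValue.
        List<String> queryIds(String property, long maxValue, int limit) {
            List<String> ids = new ArrayList<>();
            for (Map.Entry<String, Map<String, Object>> e : docs.entrySet()) {
                if (ids.size() >= limit) {
                    break;
                }
                Object v = e.getValue().get(property);
                if (v instanceof Long && (Long) v <= maxValue) {
                    ids.add(e.getKey());
                }
            }
            return ids;
        }

        // Removes the given ids and returns how many documents were deleted.
        int remove(List<String> ids) {
            int removed = 0;
            for (String id : ids) {
                if (docs.remove(id) != null) {
                    removed++;
                }
            }
            return removed;
        }
    }

    // GC loop: fetch a batch of ids at or below the cutoff, delete them, and
    // repeat until the query returns no more ids.
    static int collectGarbage(Store store, long cutoff, int batchSize) {
        int total = 0;
        while (true) {
            List<String> ids = store.queryIds("_ts", cutoff, batchSize);
            if (ids.isEmpty()) {
                return total;
            }
            total += store.remove(ids);
        }
    }

    public static void main(String[] args) {
        Store store = new Store();
        store.put("r1", 100L, "payload-1");
        store.put("r2", 200L, "payload-2");
        store.put("r3", 300L, "payload-3");
        System.out.println(collectGarbage(store, 200L, 1)); // prints 2
        System.out.println(store.docs.keySet());            // prints [r3]
    }
}
```

Batching via {{limit}} keeps each round trip bounded, which matters when many old entries have accumulated.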
[jira] [Commented] (OAK-3001) Simplify JournalGarbageCollector using a dedicated timestamp property
[ https://issues.apache.org/jira/browse/OAK-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595891#comment-14595891 ]

Stefan Egli commented on OAK-3001:
--

[~chetanm], [~mreutegg], wdyt?

> Simplify JournalGarbageCollector using a dedicated timestamp property
> ---------------------------------------------------------------------
>
> Key: OAK-3001
> URL: https://issues.apache.org/jira/browse/OAK-3001
> Project: Jackrabbit Oak
> Issue Type: Sub-task
> Components: core, mongomk
> Reporter: Stefan Egli
> Priority: Critical
> Labels: scalability
> Fix For: 1.2.3, 1.3.2