[jira] [Commented] (OAK-3001) Simplify JournalGarbageCollector using a dedicated timestamp property

2016-11-19 Thread Julian Reschke (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15679339#comment-15679339
 ] 

Julian Reschke commented on OAK-3001:
-

trunk: [r1770003|http://svn.apache.org/r1770003] 
[r1769930|http://svn.apache.org/r1769930] 
[r1769922|http://svn.apache.org/r1769922] 
[r1692065|http://svn.apache.org/r1692065]
1.4: [r1692065|http://svn.apache.org/r1692065]
1.2: [r1692066|http://svn.apache.org/r1692066]


> Simplify JournalGarbageCollector using a dedicated timestamp property
> -
>
> Key: OAK-3001
> URL: https://issues.apache.org/jira/browse/OAK-3001
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: core, mongomk
>Reporter: Stefan Egli
>Assignee: Vikas Saurabh
>Priority: Critical
>  Labels: scalability
> Fix For: 1.6, 1.5.14
>
> Attachments: OAK-3001.take1.patch
>
>
> This subtask is about spawning out a 
> [comment|https://issues.apache.org/jira/browse/OAK-2829?focusedCommentId=14585733=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14585733]
>  from [~chetanm] re JournalGC:
> {quote}
> Further looking at JournalGarbageCollector ... it would be simpler if you 
> record the journal entry timestamp as an attribute in JournalEntry document 
> and then you can delete all the entries which are older than some time by a 
> simple query. This would avoid fetching all the entries to be deleted on the 
> Oak side
> {quote}
> and a corresponding 
> [reply|https://issues.apache.org/jira/browse/OAK-2829?focusedCommentId=14585870=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14585870]
>  from myself:
> {quote}
> Re querying by timestamp: that would indeed be simpler. With the current set 
> of DocumentStore API however, I believe this is not possible. But: 
> [DocumentStore.query|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentStore.java#L127]
>  comes quite close: it would probably just require the opposite of that 
> method too: 
> {code}
> public <T extends Document> List<T> query(Collection<T> collection,
>                                           String fromKey,
>                                           String toKey,
>                                           String indexedProperty,
>                                           long endValue,
>                                           int limit) {
> {code}
> .. or what about generalizing this method to have both a {{startValue}} and 
> an {{endValue}} - with {{-1}} indicating when one of them is not used?
> {quote}
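
The generalized range query suggested in the quoted reply could look roughly like the sketch below. It is not the DocumentStore API: {{JournalQuerySketch}}, {{Entry}}, {{matches}}, {{query}}, {{remove}} and the {{UNUSED}} constant are hypothetical names, an in-memory list stands in for the journal collection, and the bounds are inclusive only because that keeps the example short.

```java
import java.util.ArrayList;
import java.util.List;

public class JournalQuerySketch {

    /** Sentinel meaning "this bound is not used" (the -1 from the proposal). */
    public static final long UNUSED = -1;

    /** Hypothetical stand-in for a journal document with a timestamp property. */
    public static final class Entry {
        public final String id;
        public final long modified;

        public Entry(String id, long modified) {
            this.id = id;
            this.modified = modified;
        }
    }

    /** True if value lies within [startValue, endValue]; UNUSED bounds are ignored. */
    static boolean matches(long value, long startValue, long endValue) {
        boolean afterStart = startValue == UNUSED || value >= startValue;
        boolean beforeEnd = endValue == UNUSED || value <= endValue;
        return afterStart && beforeEnd;
    }

    /** Returns at most limit entries whose timestamp is within the given bounds. */
    public static List<Entry> query(List<Entry> entries, long startValue, long endValue, int limit) {
        List<Entry> result = new ArrayList<>();
        for (Entry e : entries) {
            if (result.size() >= limit) {
                break;
            }
            if (matches(e.modified, startValue, endValue)) {
                result.add(e);
            }
        }
        return result;
    }

    /** Removes matching entries in place, without returning them to the caller. */
    public static int remove(List<Entry> entries, long startValue, long endValue) {
        int before = entries.size();
        entries.removeIf(e -> matches(e.modified, startValue, endValue));
        return before - entries.size();
    }
}
```

With such a method, garbage collection reduces to a single call like {{remove(entries, UNUSED, cutoff)}} instead of fetching every candidate entry first.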



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-3001) Simplify JournalGarbageCollector using a dedicated timestamp property

2016-11-16 Thread Vikas Saurabh (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15670778#comment-15670778
 ] 

Vikas Saurabh commented on OAK-3001:


Done (OAK-5119).



[jira] [Commented] (OAK-3001) Simplify JournalGarbageCollector using a dedicated timestamp property

2016-11-16 Thread Marcel Reutegger (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15670200#comment-15670200
 ] 

Marcel Reutegger commented on OAK-3001:
---

That's better, yes.



[jira] [Commented] (OAK-3001) Simplify JournalGarbageCollector using a dedicated timestamp property

2016-11-16 Thread Vikas Saurabh (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15670192#comment-15670192
 ] 

Vikas Saurabh commented on OAK-3001:


[~mreutegg], oh! I meant whether I need to update this code to use the (extended) 
{{Condition}} class. Your comment clarifies what I was seeing.



[jira] [Commented] (OAK-3001) Simplify JournalGarbageCollector using a dedicated timestamp property

2016-11-16 Thread Vikas Saurabh (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15670187#comment-15670187
 ] 

Vikas Saurabh commented on OAK-3001:


[~mreutegg], I should've clarified my statement better. What I meant was that 
the test left room to skip the edge check (and I "think" that Travis 
somehow hit that... although I can't formulate any logic for "why" with 
VirtualClock in place).

bq. you added more
Yes, I actually want to just do a hard check for cp-head

bq. I'm also not sure we should add it. It assumes a journal entry is written 
on DocumentNodeStore init.
We can do a fake commit, storeHead, runBkOps before creating a check-point - 
that should create a journal entry and give us head at checkpoint. That should 
be ok, right?



[jira] [Commented] (OAK-3001) Simplify JournalGarbageCollector using a dedicated timestamp property

2016-11-16 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15670134#comment-15670134
 ] 

Stefan Egli commented on OAK-3001:
--

bq. can you please review if I wavered off from what we planned to do?
lgtm (reviewed API and usage)



[jira] [Commented] (OAK-3001) Simplify JournalGarbageCollector using a dedicated timestamp property

2016-11-16 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15670100#comment-15670100
 ] 

Stefan Egli commented on OAK-3001:
--

bq. noticed OAK-3975 in related issues. I think we should revert that 
configuration and code for house-keeping/maintenance!?
+1



[jira] [Commented] (OAK-3001) Simplify JournalGarbageCollector using a dedicated timestamp property

2016-11-16 Thread Marcel Reutegger (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15669900#comment-15669900
 ] 

Marcel Reutegger commented on OAK-3001:
---

bq. I didn't extend Condition class for this as afaics usage of condition 
seemed to be somehow linked with id (I'm not completely sure of the intended 
contract). May be that bit needs correction/refactor.

I think we can keep {{Condition}} as is but add clarifications to JavaDoc if 
needed. Conditions are tied to an UpdateOp, which in turn is linked with an ID. 
You can construct an independent Condition, but its usage in the DocumentStore 
interface is always related to an UpdateOp (or id/key).
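
A rough sketch of that relationship, using simplified stand-ins rather than the real Oak classes: the {{Condition}} and {{UpdateOp}} below are hypothetical miniatures, as are {{require}}, {{set}} and {{apply}}. A condition can be created on its own, but it is only ever evaluated against the one document that the update's id points to.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class ConditionSketch {

    /** Simplified stand-in: the property must equal the expected value. */
    public static final class Condition {
        final Object expected;

        public Condition(Object expected) {
            this.expected = expected;
        }

        boolean holds(Object actual) {
            return Objects.equals(expected, actual);
        }
    }

    /** Simplified stand-in: an update is always addressed to one document id. */
    public static final class UpdateOp {
        final String id;
        final Map<String, Condition> conditions = new HashMap<>();
        final Map<String, Object> changes = new HashMap<>();

        public UpdateOp(String id) {
            this.id = id;
        }

        public UpdateOp require(String property, Condition condition) {
            conditions.put(property, condition);
            return this;
        }

        public UpdateOp set(String property, Object value) {
            changes.put(property, value);
            return this;
        }
    }

    /** Applies the update only if every condition holds on the document with op.id. */
    public static boolean apply(Map<String, Map<String, Object>> store, UpdateOp op) {
        Map<String, Object> doc = store.get(op.id);
        if (doc == null) {
            return false;
        }
        for (Map.Entry<String, Condition> e : op.conditions.entrySet()) {
            if (!e.getValue().holds(doc.get(e.getKey()))) {
                return false;
            }
        }
        doc.putAll(op.changes);
        return true;
    }
}
```

In this shape a condition cannot select documents by itself, which is one way to read why a range-based removal was added separately instead of extending {{Condition}}.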



[jira] [Commented] (OAK-3001) Simplify JournalGarbageCollector using a dedicated timestamp property

2016-11-16 Thread Marcel Reutegger (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15669885#comment-15669885
 ] 

Marcel Reutegger commented on OAK-3001:
---

AFAICS you didn't change any checks, you added more. The additional checks are 
OK, but go beyond the core issue this test is checking. I'm also not sure we 
should add it. It assumes a journal entry is written on DocumentNodeStore init. 
While that may be the case right now, it is probably not strictly necessary and 
could change in the future.



[jira] [Commented] (OAK-3001) Simplify JournalGarbageCollector using a dedicated timestamp property

2016-11-15 Thread Vikas Saurabh (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15669642#comment-15669642
 ] 

Vikas Saurabh commented on OAK-3001:


[~mreutegg], the Travis build failed after my last commit 
[here|https://travis-ci.org/apache/jackrabbit-oak/builds/176271155]. While I 
think I've fixed the issue, it seems that 
{{JournalGCTest#gcWithCheckpoint}} isn't doing the correct checks. I think we 
should update that test as follows:
{noformat}
diff --git a/oak-core/src/test/java/org/apache/jackrabbit/oak/plugins/document/JournalGCTest.java b/oak-core/src/test/java/org/apache/jackrabbit/oak/plugins/document/JournalGCTest.java
index f518ab2..7c45cab 100644
--- a/oak-core/src/test/java/org/apache/jackrabbit/oak/plugins/document/JournalGCTest.java
+++ b/oak-core/src/test/java/org/apache/jackrabbit/oak/plugins/document/JournalGCTest.java
@@ -44,6 +44,8 @@ public class JournalGCTest {
         DocumentNodeStore ns = builderProvider.newBuilder()
                 .clock(c).setAsyncDelay(0).getNodeStore();

+        Revision cpHead = ns.getHeadRevision().getRevision(ns.getClusterId());
+        assertNotNull(cpHead);
         String cp = ns.checkpoint(TimeUnit.DAYS.toMillis(1));
         // perform some change
         NodeBuilder builder = ns.getRoot().builder();
@@ -55,7 +57,9 @@ public class JournalGCTest {
         // trigger creation of journal entry
         ns.runBackgroundOperations();

-        JournalEntry entry = ns.getDocumentStore().find(JOURNAL, JournalEntry.asId(head));
+        JournalEntry entry = ns.getDocumentStore().find(JOURNAL, JournalEntry.asId(cpHead));
+        assertNotNull(entry);
+        entry = ns.getDocumentStore().find(JOURNAL, JournalEntry.asId(head));
         assertNotNull(entry);

         // wait two hours
@@ -65,6 +69,8 @@ public class JournalGCTest {
         ns.getJournalGarbageCollector().gc(1, TimeUnit.HOURS, 10);

         // must not remove existing entry, because checkpoint is still valid
+        entry = ns.getDocumentStore().find(JOURNAL, JournalEntry.asId(cpHead));
+        assertNotNull(entry);
         entry = ns.getDocumentStore().find(JOURNAL, JournalEntry.asId(head));
         assertNotNull(entry);

@@ -72,6 +78,8 @@ public class JournalGCTest {

         ns.getJournalGarbageCollector().gc(1, TimeUnit.HOURS, 10);
         // now journal GC can remove the entry
+        entry = ns.getDocumentStore().find(JOURNAL, JournalEntry.asId(cpHead));
+        assertNull(entry);
         entry = ns.getDocumentStore().find(JOURNAL, JournalEntry.asId(head));
         assertNull(entry);
     }
{noformat}


[jira] [Commented] (OAK-3001) Simplify JournalGarbageCollector using a dedicated timestamp property

2016-11-15 Thread Vikas Saurabh (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15669636#comment-15669636
 ] 

Vikas Saurabh commented on OAK-3001:


The Travis build failed 
[here|https://travis-ci.org/apache/jackrabbit-oak/builds/176271155], but the 
test ({{JournalGCTest#gcWithCheckpoint}}) seemed to be working fine for me locally. 
While investigating, I realized that I had done the range check with inclusive 
bounds, while the earlier GC logic relied on a query with exclusive bound 
checks. Made this API's bounds exclusive too at 
[r1769930|https://svn.apache.org/r1769930].
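
To make the boundary difference concrete, here is a minimal sketch. It is not the code from r1769930; {{BoundsSketch}} and both method names are made up for illustration. It only shows that a value sitting exactly on a bound is matched by an inclusive check and skipped by an exclusive one.

```java
public class BoundsSketch {

    /** Inclusive check: values equal to start or end are matched. */
    public static boolean inRangeInclusive(long value, long start, long end) {
        return value >= start && value <= end;
    }

    /** Exclusive check: values equal to start or end are not matched. */
    public static boolean inRangeExclusive(long value, long start, long end) {
        return value > start && value < end;
    }
}
```

For an entry whose timestamp equals the GC cutoff, {{inRangeInclusive(cutoff, 0, cutoff)}} is true while {{inRangeExclusive(cutoff, 0, cutoff)}} is false, so only the inclusive variant would select that entry for removal.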



[jira] [Commented] (OAK-3001) Simplify JournalGarbageCollector using a dedicated timestamp property

2016-11-15 Thread Vikas Saurabh (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15669384#comment-15669384
 ] 

Vikas Saurabh commented on OAK-3001:


[~egli], just noticed OAK-3975 in related issues. I think we should revert that 
configuration and code for house-keeping/maintenance!?



[jira] [Commented] (OAK-3001) Simplify JournalGarbageCollector using a dedicated timestamp property

2016-11-15 Thread Vikas Saurabh (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15669321#comment-15669321
 ] 

Vikas Saurabh commented on OAK-3001:


Committed the patch on trunk at [r1769922|https://svn.apache.org/r1769922]. Not 
resolving yet though.

> Simplify JournalGarbageCollector using a dedicated timestamp property
> -
>
> Key: OAK-3001
> URL: https://issues.apache.org/jira/browse/OAK-3001
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: core, mongomk
>Reporter: Stefan Egli
>Assignee: Vikas Saurabh
>Priority: Critical
>  Labels: scalability
> Fix For: 1.6, 1.5.14
>
> Attachments: OAK-3001.take1.patch
>
>


[jira] [Commented] (OAK-3001) Simplify JournalGarbageCollector using a dedicated timestamp property

2016-11-15 Thread Julian Reschke (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15669049#comment-15669049
 ] 

Julian Reschke commented on OAK-3001:
-

I would say: go ahead, as long as you have tests, and at least verify with h2 and 
derby (run mvn with -Prdb-derby).

Then, just re-assign to me and I can test with the other DBs and maybe do 
refactoring inside RDBDocumentStore.

> Simplify JournalGarbageCollector using a dedicated timestamp property
> -
>
> Key: OAK-3001
> URL: https://issues.apache.org/jira/browse/OAK-3001
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: core, mongomk
>Reporter: Stefan Egli
>Assignee: Vikas Saurabh
>Priority: Critical
>  Labels: scalability
> Fix For: 1.6, 1.5.14
>
> Attachments: OAK-3001.take1.patch
>
>


[jira] [Commented] (OAK-3001) Simplify JournalGarbageCollector using a dedicated timestamp property

2016-11-15 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15667061#comment-15667061
 ] 

Stefan Egli commented on OAK-3001:
--

sounds good to me

> Simplify JournalGarbageCollector using a dedicated timestamp property
> -
>
> Key: OAK-3001
> URL: https://issues.apache.org/jira/browse/OAK-3001
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: core, mongomk
>Reporter: Stefan Egli
>Assignee: Vikas Saurabh
>Priority: Critical
>  Labels: scalability
> Fix For: 1.6, 1.5.14
>
>


[jira] [Commented] (OAK-3001) Simplify JournalGarbageCollector using a dedicated timestamp property

2016-11-15 Thread Vikas Saurabh (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15667045#comment-15667045
 ] 

Vikas Saurabh commented on OAK-3001:


[~mreutegg], [~egli], {{JournalGarbageCollector#gc}} currently removes journal 
entries by looping for-each-cluster-node -> for-each-matching-_id-in-range. 
Given that we implement {{remove(collection, indexProp /\*_modified for this 
discussion\*/, startVal /\*current logic batches with 0 and loops.. we can 
probably add a flag to skip that condition in the remove API\*/, endVal)}}, is it 
ok to drop the relatively big portion of the method (looping 
over cluster id and _id) and simply remove all entries whose _modified is older 
than some age?
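
To make the two shapes concrete, here is a minimal Java sketch under stated assumptions. The class, its fields and the id format {{<clusterId>-<suffix>}} are hypothetical stand-ins, not Oak's actual JournalGarbageCollector or DocumentStore code. It only contrasts looping per cluster node with a single conditional remove on {{_modified}}, and counts the round trips each shape would make to an imaginary store.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for the journal collection (not Oak code).
final class JournalGcLoopSketch {
    // Key: "<clusterId>-<suffix>" (assumed id format), value: _modified.
    final TreeMap<String, Long> journal = new TreeMap<>();
    // Counts round trips to the imaginary backend.
    int storeCalls = 0;

    // Looping shape: one query per cluster node, then one remove per non-empty batch.
    int gcByLooping(List<Integer> clusterIds, long maxModified) {
        int removed = 0;
        for (int clusterId : clusterIds) {
            String prefix = clusterId + "-";
            List<String> ids = new ArrayList<>();
            storeCalls++; // query for this cluster node
            for (Map.Entry<String, Long> e : journal.entrySet()) {
                if (e.getKey().startsWith(prefix) && e.getValue() < maxModified) {
                    ids.add(e.getKey());
                }
            }
            if (!ids.isEmpty()) {
                storeCalls++; // remove the fetched ids
                for (String id : ids) {
                    journal.remove(id);
                }
                removed += ids.size();
            }
        }
        return removed;
    }

    // Proposed shape: a single conditional remove on _modified.
    int gcByModified(long maxModified) {
        storeCalls++;
        int before = journal.size();
        journal.values().removeIf(m -> m < maxModified);
        return before - journal.size();
    }
}
```

Both methods remove the same entries in this toy model; the difference is only in how many store calls are needed and in whether ids have to be fetched first.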

> Simplify JournalGarbageCollector using a dedicated timestamp property
> -
>
> Key: OAK-3001
> URL: https://issues.apache.org/jira/browse/OAK-3001
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: core, mongomk
>Reporter: Stefan Egli
>Assignee: Vikas Saurabh
>Priority: Critical
>  Labels: scalability
> Fix For: 1.6, 1.5.14
>
>


[jira] [Commented] (OAK-3001) Simplify JournalGarbageCollector using a dedicated timestamp property

2016-11-03 Thread Marcel Reutegger (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15633215#comment-15633215
 ] 

Marcel Reutegger commented on OAK-3001:
---

No, not necessarily. I think we can also implement this improvement with the 
sub-tasks defined on this issue. That is, modify the existing remove method to 
support a range condition or add a new variant.

> Simplify JournalGarbageCollector using a dedicated timestamp property
> -
>
> Key: OAK-3001
> URL: https://issues.apache.org/jira/browse/OAK-3001
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: core, mongomk
>Reporter: Stefan Egli
>Priority: Critical
>  Labels: scalability
> Fix For: 1.6
>
>


[jira] [Commented] (OAK-3001) Simplify JournalGarbageCollector using a dedicated timestamp property

2016-11-03 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15633201#comment-15633201
 ] 

Stefan Egli commented on OAK-3001:
--

[~mreutegg], I've seen you moved OAK-3213 to 1.8 - that would imply we're 
moving OAK-3001 to 1.8 as well, wdyt?

> Simplify JournalGarbageCollector using a dedicated timestamp property
> -
>
> Key: OAK-3001
> URL: https://issues.apache.org/jira/browse/OAK-3001
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: core, mongomk
>Reporter: Stefan Egli
>Priority: Critical
>  Labels: scalability
> Fix For: 1.6
>
>


[jira] [Commented] (OAK-3001) Simplify JournalGarbageCollector using a dedicated timestamp property

2016-02-10 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15140957#comment-15140957
 ] 

Stefan Egli commented on OAK-3001:
--

As discussed offline the decision is to not rush this into 1.4 unless necessary 
and instead implement OAK-3975

> Simplify JournalGarbageCollector using a dedicated timestamp property
> -
>
> Key: OAK-3001
> URL: https://issues.apache.org/jira/browse/OAK-3001
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: core, mongomk
>Reporter: Stefan Egli
>Priority: Critical
>  Labels: scalability
> Fix For: 1.6
>
>


[jira] [Commented] (OAK-3001) Simplify JournalGarbageCollector using a dedicated timestamp property

2016-02-03 Thread Julian Reschke (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15130249#comment-15130249
 ] 

Julian Reschke commented on OAK-3001:
-

[~egli] would supporting a range for the indexed property fix *this* problem? 
If so, I'm +1 on doing this minor surgery (yet another DS method), provided we 
actually do the promised API work quickly afterwards and get rid of the old 
query signatures then.

(Note that RDBDocumentStore already has the functionality in the meantime; it's 
just not available through a public API. We could special-case with 
{{instanceof}}, but then we would run into issues with code that wraps 
DocumentStore implementations.)
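
The wrapping problem can be illustrated with a small Java sketch. All types here ({{Store}}, {{RdbStore}}, {{LoggingStore}}) are hypothetical stand-ins invented for the example, not Oak's DocumentStore or RDBDocumentStore; the point is only that an {{instanceof}} check stops matching as soon as the implementation is hidden behind a delegating wrapper.

```java
// Hypothetical stand-in for the public store interface (not Oak's DocumentStore).
interface Store {
    int size();
}

// Hypothetical implementation that has an extra, non-interface method.
final class RdbStore implements Store {
    public int size() {
        return 0;
    }

    // Extra functionality that is not reachable through Store.
    int removeOlderThan(long endValue) {
        return 0;
    }
}

// A typical delegating wrapper, e.g. for logging or timing.
final class LoggingStore implements Store {
    private final Store delegate;

    LoggingStore(Store delegate) {
        this.delegate = delegate;
    }

    public int size() {
        return delegate.size();
    }
}

final class InstanceofSketch {
    // Special-casing by type: only works when the store is not wrapped.
    static boolean canUseFastPath(Store store) {
        return store instanceof RdbStore;
    }
}
```

Called with a bare {{RdbStore}}, {{canUseFastPath}} returns true; called with {{new LoggingStore(new RdbStore())}}, it returns false even though the underlying store supports the operation.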

> Simplify JournalGarbageCollector using a dedicated timestamp property
> -
>
> Key: OAK-3001
> URL: https://issues.apache.org/jira/browse/OAK-3001
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: core, mongomk
>Reporter: Stefan Egli
>Priority: Critical
>  Labels: scalability
> Fix For: 1.6
>
>


[jira] [Commented] (OAK-3001) Simplify JournalGarbageCollector using a dedicated timestamp property

2016-02-03 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15130264#comment-15130264
 ] 

Chetan Mehrotra commented on OAK-3001:
--

We should aim for simply deleting from the backend without fetching the 
documents. So something like

{code}
<T extends Document> int remove(Collection<T> collection, String indexedProperty,
                                long startValue, long endValue);
{code}

This translates easily to DB and Mongo calls. For this case we only need 
endValue.
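
As a rough illustration of how such a range remove might map onto a relational backend, here is a minimal Java sketch that only builds the parameterized DELETE statement. Everything in it is an assumption made for the example: the class name, the table and column names passed in, the exclusive bounds, and the convention that a {{null}} bound means "no condition on that side". It is not how RDBDocumentStore is implemented.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds a parameterized range DELETE (not Oak code).
final class RangeDeleteSqlSketch {
    final String sql;
    final List<Long> params = new ArrayList<>();

    // table and column are trusted identifiers; the bounds become bind parameters.
    // A null bound means that side of the range is not constrained (sketch convention).
    RangeDeleteSqlSketch(String table, String column, Long startValue, Long endValue) {
        if (startValue == null && endValue == null) {
            // Refuse to build an unconditional DELETE by accident.
            throw new IllegalArgumentException("at least one bound is required");
        }
        StringBuilder sb = new StringBuilder("DELETE FROM ").append(table);
        String sep = " WHERE ";
        if (startValue != null) {
            sb.append(sep).append(column).append(" > ?");
            params.add(startValue);
            sep = " AND ";
        }
        if (endValue != null) {
            sb.append(sep).append(column).append(" < ?");
            params.add(endValue);
        }
        sql = sb.toString();
    }
}
```

With only an end value, as needed for the journal GC case, the statement has a single condition and a single bind parameter, and no documents are fetched to the Oak side.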

> Simplify JournalGarbageCollector using a dedicated timestamp property
> -
>
> Key: OAK-3001
> URL: https://issues.apache.org/jira/browse/OAK-3001
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: core, mongomk
>Reporter: Stefan Egli
>Priority: Critical
>  Labels: scalability
> Fix For: 1.6
>
>


[jira] [Commented] (OAK-3001) Simplify JournalGarbageCollector using a dedicated timestamp property

2016-02-03 Thread Julian Reschke (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15130268#comment-15130268
 ] 

Julian Reschke commented on OAK-3001:
-

+1 - let's do the simplest possible thing here to address the issue, but plan 
to get rid of the API as soon as we have something better (and make sure that 
"soon" is really soon and can get backported to most (all?) of the old branches)

> Simplify JournalGarbageCollector using a dedicated timestamp property
> -
>
> Key: OAK-3001
> URL: https://issues.apache.org/jira/browse/OAK-3001
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: core, mongomk
>Reporter: Stefan Egli
>Priority: Critical
>  Labels: scalability
> Fix For: 1.6
>
>


[jira] [Commented] (OAK-3001) Simplify JournalGarbageCollector using a dedicated timestamp property

2016-02-03 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15130260#comment-15130260
 ] 

Stefan Egli commented on OAK-3001:
--

bq. would supporting a range for the indexed property fix this problem?
Re-reading this ticket, I think what we need is either what [~mreutegg] 
suggested initially ("query method with a projection") or a fancier {{remove}} 
method which takes a query. 
The new problem around the block is that single journal entries can have a 
large {{_c}} property, which means what you want to achieve is not even 
having to read that {{_c}} (unless you avoid writing large {{_c}} values in the 
first place).
So I don't think a range in a query fixes this.
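
The difference between a range query and a query with a projection can be sketched in Java as follows. This is a toy in-memory model under stated assumptions: {{ProjectionSketch}}, {{JournalEntry}} and the {{charsRead}} counter are hypothetical names, and the {{changes}} field merely plays the role of the large {{_c}} property. It is not Oak's DocumentStore API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory journal; "changes" plays the role of the large _c property.
final class ProjectionSketch {
    static final class JournalEntry {
        final long modified;
        final String changes;

        JournalEntry(long modified, String changes) {
            this.modified = modified;
            this.changes = changes;
        }
    }

    private final Map<String, JournalEntry> journal = new LinkedHashMap<>();
    // Characters that would have been transferred in this toy model.
    long charsRead = 0;

    void put(String id, long modified, String changes) {
        journal.put(id, new JournalEntry(modified, changes));
    }

    // Range query without projection: matching entries come back with their payload.
    List<JournalEntry> queryFull(long maxModified) {
        List<JournalEntry> result = new ArrayList<>();
        for (JournalEntry e : journal.values()) {
            if (e.modified < maxModified) {
                charsRead += e.changes.length();
                result.add(e);
            }
        }
        return result;
    }

    // Query with projection: only the ids of matching entries are returned.
    List<String> queryIds(long maxModified) {
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, JournalEntry> e : journal.entrySet()) {
            if (e.getValue().modified < maxModified) {
                charsRead += e.getKey().length();
                ids.add(e.getKey());
            }
        }
        return ids;
    }
}
```

In this model the range condition alone narrows which entries match, but only the projection (or a remove that never returns documents) avoids reading the payload.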

> Simplify JournalGarbageCollector using a dedicated timestamp property
> -
>
> Key: OAK-3001
> URL: https://issues.apache.org/jira/browse/OAK-3001
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: core, mongomk
>Reporter: Stefan Egli
>Priority: Critical
>  Labels: scalability
> Fix For: 1.6
>
>


[jira] [Commented] (OAK-3001) Simplify JournalGarbageCollector using a dedicated timestamp property

2015-09-07 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14733695#comment-14733695
 ] 

Stefan Egli commented on OAK-3001:
--

OAK-3001 is blocked by OAK-3213

> Simplify JournalGarbageCollector using a dedicated timestamp property
> -
>
> Key: OAK-3001
> URL: https://issues.apache.org/jira/browse/OAK-3001
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: core, mongomk
>Reporter: Stefan Egli
>Assignee: Stefan Egli
>Priority: Critical
>  Labels: scalability
> Fix For: 1.3.7, 1.2.6
>
>


[jira] [Commented] (OAK-3001) Simplify JournalGarbageCollector using a dedicated timestamp property

2015-07-29 Thread Julian Reschke (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645579#comment-14645579
 ] 

Julian Reschke commented on OAK-3001:
-

I recommend going back to the API discussion on oak-dev; there are more things 
to consider, such as Java-based filtering and/or returning sparse results 
(which, for instance, have only certain system properties set).

> Simplify JournalGarbageCollector using a dedicated timestamp property
> -
>
> Key: OAK-3001
> URL: https://issues.apache.org/jira/browse/OAK-3001
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: core, mongomk
> Reporter: Stefan Egli
> Assignee: Stefan Egli
> Priority: Critical
> Labels: scalability
> Fix For: 1.2.4, 1.3.4
>
>


[jira] [Commented] (OAK-3001) Simplify JournalGarbageCollector using a dedicated timestamp property

2015-07-27 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642871#comment-14642871
 ] 

Stefan Egli commented on OAK-3001:
--

[~mreutegg], [~reschke], [~chetanm], how should we proceed here? To me, the 
simplest option would be to change the current {{query}} method to look as 
follows:
{code}
public <T extends Document> List<T> query(Collection<T> collection,
                                          String fromKey,
                                          String toKey,
                                          String indexedProperty,
                                          long startValue,
                                          long endValue,
                                          int limit) {
{code}
where {{startValue==Long.MIN_VALUE}} and/or {{endValue==Long.MAX_VALUE}} could 
be used to disable the corresponding part of the condition.
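
A minimal sketch of how such sentinel values could behave, assuming exclusive key bounds and inclusive bounds on the indexed property. `RangeQuerySketch` and its in-memory map are hypothetical stand-ins for illustration only, not the Oak {{DocumentStore}} implementation:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in; not the Oak DocumentStore.
final class RangeQuerySketch {

    // document key -> value of the indexed long property, sorted by key
    private final TreeMap<String, Long> docs = new TreeMap<>();

    void put(String key, long indexedValue) {
        docs.put(key, indexedValue);
    }

    // Returns up to 'limit' keys strictly between fromKey and toKey whose
    // indexed value lies in [startValue, endValue] (inclusive bounds are an
    // assumption of this sketch).
    List<String> query(String fromKey, String toKey,
                       long startValue, long endValue, int limit) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> e
                : docs.subMap(fromKey, false, toKey, false).entrySet()) {
            if (result.size() >= limit) {
                break;
            }
            long value = e.getValue();
            // Long.MIN_VALUE / Long.MAX_VALUE make the respective comparison
            // always true, so no special-casing of "unused" bounds is needed.
            if (value >= startValue && value <= endValue) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

Under these assumptions, a call such as `query(from, to, Long.MIN_VALUE, cutoff, limit)` selects everything up to `cutoff`. A `-1` marker, by contrast, would need an explicit check and would be ambiguous if negative property values were ever allowed.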

> Simplify JournalGarbageCollector using a dedicated timestamp property
> -
>
> Key: OAK-3001
> URL: https://issues.apache.org/jira/browse/OAK-3001
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: core, mongomk
> Reporter: Stefan Egli
> Assignee: Stefan Egli
> Priority: Critical
> Labels: scalability
> Fix For: 1.2.4, 1.3.4



[jira] [Commented] (OAK-3001) Simplify JournalGarbageCollector using a dedicated timestamp property

2015-07-01 Thread Marcel Reutegger (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609836#comment-14609836
 ] 

Marcel Reutegger commented on OAK-3001:
---

I converted this issue from a sub-task of OAK-2829 into a separate improvement. 
I think this is an important optimization, but it shouldn't block OAK-2829.

> Simplify JournalGarbageCollector using a dedicated timestamp property
> -
>
> Key: OAK-3001
> URL: https://issues.apache.org/jira/browse/OAK-3001
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: core, mongomk
> Reporter: Stefan Egli
> Priority: Critical
> Labels: scalability
> Fix For: 1.2.3, 1.3.2




[jira] [Commented] (OAK-3001) Simplify JournalGarbageCollector using a dedicated timestamp property

2015-06-22 Thread Marcel Reutegger (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595911#comment-14595911
 ] 

Marcel Reutegger commented on OAK-3001:
---

I'm a bit worried about adding more complex methods to the DocumentStore 
interface, but in general it makes sense to avoid loading all the data just to 
remove it right afterwards.

A while back [~reschke] and I discussed a query method with a projection. This 
way we would at least not read all the data, but just the {{_id}}s. Back then 
it was in the context of garbage collection where we have a similar 
requirement. 
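
A rough sketch of what a query with a projection to {{_id}} could look like, so that the caller only ever sees identifiers and then removes by id. `ProjectionQuerySketch`, `queryIds` and the `_ts` property used in the usage note are hypothetical names chosen for illustration; none of them is an existing Oak API:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in; not the Oak DocumentStore API.
final class ProjectionQuerySketch {

    // _id -> full document (property name -> value), in insertion order
    private final Map<String, Map<String, Object>> docs = new LinkedHashMap<>();

    void put(String id, Map<String, Object> doc) {
        docs.put(id, doc);
    }

    // Projection to _id only: returns the ids of documents whose long-valued
    // property is strictly below endValue, never the document bodies.
    List<String> queryIds(String property, long endValue, int limit) {
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, Map<String, Object>> e : docs.entrySet()) {
            if (ids.size() >= limit) {
                break;
            }
            Object value = e.getValue().get(property);
            if (value instanceof Long && (Long) value < endValue) {
                ids.add(e.getKey());
            }
        }
        return ids;
    }

    // Removes the given ids and returns how many documents actually existed.
    int remove(List<String> ids) {
        int removed = 0;
        for (String id : ids) {
            if (docs.remove(id) != null) {
                removed++;
            }
        }
        return removed;
    }

    int size() {
        return docs.size();
    }
}
```

A collector built on this would call something like `store.remove(store.queryIds("_ts", cutoff, batchSize))` in a loop. In a real backend, the benefit would come from the store transferring only the ids rather than whole documents.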

> Simplify JournalGarbageCollector using a dedicated timestamp property
> -
>
> Key: OAK-3001
> URL: https://issues.apache.org/jira/browse/OAK-3001
> Project: Jackrabbit Oak
> Issue Type: Sub-task
> Components: core, mongomk
> Reporter: Stefan Egli
> Priority: Critical
> Labels: scalability
> Fix For: 1.2.3, 1.3.2




[jira] [Commented] (OAK-3001) Simplify JournalGarbageCollector using a dedicated timestamp property

2015-06-22 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595891#comment-14595891
 ] 

Stefan Egli commented on OAK-3001:
--

[~chetanm], [~mreutegg], wdyt? 

> Simplify JournalGarbageCollector using a dedicated timestamp property
> -
>
> Key: OAK-3001
> URL: https://issues.apache.org/jira/browse/OAK-3001
> Project: Jackrabbit Oak
> Issue Type: Sub-task
> Components: core, mongomk
> Reporter: Stefan Egli
> Priority: Critical
> Labels: scalability
> Fix For: 1.2.3, 1.3.2

