[ 
https://issues.apache.org/jira/browse/OAK-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15035849#comment-15035849
 ] 

Marcel Reutegger commented on OAK-3710:
---------------------------------------

Had an offline discussion with Chetan and Vikas about how to implement this 
feature. The basic ideas are:

- Remember T' as the lowest revision time of the _lastRev entries on the root 
document.
- Scan through documents with _modified >= T, where T is read from the settings 
collection. Use a value of 0 if T is undefined.
- For each document:
-- remove changes (committed and uncommitted) that are older than 
{{maxRevisionAge}} (see also OAK-3712)
-- rewrite commit entries of remaining committed changes and set local 
_revisions entries accordingly (may collide with split operations!)
- Store T' in the settings collection as the starting point of the next cycle
- Remove split documents with {{_sdMaxRevTime}} < T (see also OAK-3711)
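
The cycle above can be sketched roughly as follows. This is only a minimal in-memory model, not Oak code: the {{Doc}} class, the {{revisionGcStart}} settings key and the {{runCycle}} signature are hypothetical names made up for illustration, all times are assumed to use the same unit, and the rewrite of commit entries is left out.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

/** In-memory model of one GC cycle; none of these types are Oak APIs. */
final class ContinuousGcSketch {

    /** Hypothetical key under which T is kept in the settings collection. */
    static final String START_KEY = "revisionGcStart";

    /** Hypothetical stand-in for a document. */
    static final class Doc {
        final long modified;      // models _modified
        final Long sdMaxRevTime;  // models _sdMaxRevTime; null unless a split document
        final TreeMap<Long, String> changes = new TreeMap<>(); // revision time -> change

        Doc(long modified, Long sdMaxRevTime) {
            this.modified = modified;
            this.sdMaxRevTime = sdMaxRevTime;
        }
    }

    /** Runs one cycle and returns T', the starting point of the next cycle. */
    static long runCycle(Map<String, Doc> docs, Map<String, Long> settings,
                         Collection<Long> rootLastRevTimes, long now, long maxRevisionAge) {
        // Remember T' as the lowest revision time of the root's _lastRev entries
        // (assumes at least one entry exists).
        long tPrime = Collections.min(rootLastRevTimes);
        // Read T from the settings; use 0 if it is undefined.
        long t = settings.getOrDefault(START_KEY, 0L);
        long cutoff = now - maxRevisionAge;

        for (Doc doc : docs.values()) {
            if (doc.modified >= t) {
                // Drop changes (committed or not) older than maxRevisionAge.
                doc.changes.headMap(cutoff).clear();
                // Rewriting the commit entries of the remaining changes is omitted here.
            }
        }

        // Store T' as the starting point of the next cycle.
        settings.put(START_KEY, tPrime);
        // Remove split documents whose newest revision is older than T.
        docs.values().removeIf(d -> d.sdMaxRevTime != null && d.sdMaxRevTime < t);
        return tPrime;
    }
}
```

Because T is 0 on the very first run, that run scans every document and removes no split documents; later runs only look at documents modified since the stored starting point.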

In addition it would also be good to change the way documents are split. 
Currently _commitRoot entries are moved to previous documents. I think it would 
be better to rewrite the change on split and replace _commitRoot with 
_revisions entries with the correct commit value. This reduces dependency on 
the commit root document.
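
A rough sketch of that rewrite on split, again as an in-memory model rather than Oak code: {{rewriteOnSplit}}, {{Result}} and the lookup function are hypothetical, the lookup stands in for resolving a revision's commit value via its commit root document, and keeping unresolved entries as _commitRoot is my assumption, not something we discussed.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

/** In-memory model of rewriting _commitRoot entries on split; not an Oak API. */
final class SplitRewriteSketch {

    /** Entries that would be written to the previous (split) document. */
    static final class Result {
        final Map<String, String> revisions = new TreeMap<>();  // models _revisions
        final Map<String, String> commitRoot = new TreeMap<>(); // models _commitRoot
    }

    /**
     * @param commitRoot        revision -> commit root reference, as on the main document
     * @param commitValueLookup hypothetical lookup: revision -> commit value,
     *                          or null if the change is not (yet) committed
     */
    static Result rewriteOnSplit(Map<String, String> commitRoot,
                                 Function<String, String> commitValueLookup) {
        Result result = new Result();
        for (Map.Entry<String, String> e : commitRoot.entrySet()) {
            String commitValue = commitValueLookup.apply(e.getKey());
            if (commitValue != null) {
                // Committed: store the commit value locally, no commit root lookup needed later.
                result.revisions.put(e.getKey(), commitValue);
            } else {
                // Assumption: unresolved changes keep their _commitRoot entry.
                result.commitRoot.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }
}
```

With this, reading a committed change from a previous document no longer requires loading the commit root document.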

> Continuous revision GC
> ----------------------
>
>                 Key: OAK-3710
>                 URL: https://issues.apache.org/jira/browse/OAK-3710
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: core, documentmk
>            Reporter: Marcel Reutegger
>
> Implement continuous revision GC cleaning up documents older than a given 
> threshold (e.g. one day). This issue is related to OAK-3070 where each GC run 
> starts where the last one finished.
> This will avoid peak load on the system as we see it right now, when GC is 
> triggered once a day.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
