[ https://issues.apache.org/jira/browse/OAK-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15035849#comment-15035849 ]
Marcel Reutegger commented on OAK-3710:
---------------------------------------
Had an offline discussion with Chetan and Vikas about how to implement this
feature. The basic ideas are:
- Remember T' as the lowest revision time of the _lastRev entries on the root
document.
- Scan through documents that have a _modified >= T, where T is read from the
settings collection. Use a value of 0 if T is undefined.
- For each document:
-- remove changes (committed and uncommitted) that are older than
{{maxRevisionAge}} (see also OAK-3712)
-- rewrite commit entries of remaining committed changes and set local
_revisions entries accordingly (may collide with split operations!)
- Store T' in the settings collection as the starting point of the next cycle
- Remove split documents with {{_sdMaxRevTime}} < T (see also OAK-3711)
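The cycle outlined above can be sketched roughly as follows. This is only an illustration in Python against an in-memory model, not Oak code: the {{Doc}} class, the {{gcStart}} settings key, and the {{gc_cycle}} function are hypothetical names, and all times are plain integers in one unit. The sketch leaves out the rewrite of commit entries and the handling of collisions with split operations. It also follows the text literally when removing old changes and does not address which old change, if any, has to be retained as the current value.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Doc:
    """Hypothetical, heavily simplified stand-in for a stored document."""
    id: str
    modified: int                                            # models _modified
    changes: Dict[int, str] = field(default_factory=dict)    # revision time -> value
    last_rev: Dict[int, int] = field(default_factory=dict)   # models _lastRev (root only)
    sd_max_rev_time: Optional[int] = None                    # models _sdMaxRevTime (split docs only)


def gc_cycle(docs, settings, now, max_revision_age, root_id="0:/"):
    """Run one GC cycle over `docs` (id -> Doc) and return T'."""
    # T' is the lowest revision time of the _lastRev entries on the root.
    t_next = min(docs[root_id].last_rev.values())
    # T is the starting point stored by the previous cycle, 0 if undefined.
    t = settings.get("gcStart", 0)
    cutoff = now - max_revision_age

    for doc in docs.values():
        if doc.modified < t:
            continue  # not modified since the last cycle started
        # Remove changes older than maxRevisionAge.
        doc.changes = {rev: v for rev, v in doc.changes.items() if rev >= cutoff}

    # Store T' as the starting point of the next cycle.
    settings["gcStart"] = t_next

    # Remove split documents whose newest revision is older than T.
    stale = [doc_id for doc_id, d in docs.items()
             if d.sd_max_rev_time is not None and d.sd_max_rev_time < t]
    for doc_id in stale:
        del docs[doc_id]
    return t_next
```

Because T' is taken from the root's _lastRev entries before the scan starts, the next cycle picks up every document modified at or after that point.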
In addition, it would be good to change the way documents are split.
Currently _commitRoot entries are moved to previous documents. I think it would
be better to rewrite the change on split and replace _commitRoot with
_revisions entries with the correct commit value. This reduces dependency on
the commit root document.
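A rough sketch of the proposed rewrite on split, again in Python and only as an illustration. Documents are modelled as plain dicts, {{rewrite_commit_root}} and the {{lookup_commit_value}} callback are hypothetical names, and the sketch assumes that a commit value such as "c" can be looked up on the commit root document for each revision. Keeping the _commitRoot entry for changes that have no commit value yet is also an assumption of the sketch.

```python
def rewrite_commit_root(doc, lookup_commit_value):
    """Return a copy of `doc` with resolvable _commitRoot entries
    replaced by local _revisions entries.

    `lookup_commit_value(rev, depth)` is a hypothetical callback that
    returns the commit value recorded on the commit root document,
    or None if the change is not committed.
    """
    revisions = dict(doc.get("_revisions", {}))
    remaining = {}
    for rev, depth in doc.get("_commitRoot", {}).items():
        value = lookup_commit_value(rev, depth)
        if value is None:
            # No commit value yet: keep the reference to the commit root.
            remaining[rev] = depth
        else:
            # Record the commit value locally; the commit root document
            # is no longer needed to resolve this change.
            revisions[rev] = value
    out = dict(doc)
    out["_revisions"] = revisions
    out["_commitRoot"] = remaining
    return out
```

The input document is left untouched, so the result can be written to the previous document while the main document is updated separately.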
> Continuous revision GC
> ----------------------
>
> Key: OAK-3710
> URL: https://issues.apache.org/jira/browse/OAK-3710
> Project: Jackrabbit Oak
> Issue Type: New Feature
> Components: core, documentmk
> Reporter: Marcel Reutegger
>
> Implement continuous revision GC cleaning up documents older than a given
> threshold (e.g. one day). This issue is related to OAK-3070 where each GC run
> starts where the last one finished.
> This will avoid peak load on the system as we see it right now, when GC is
> triggered once a day.