[ https://issues.apache.org/jira/browse/OAK-10688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17829210#comment-17829210 ]
Stefan Egli commented on OAK-10688:
-----------------------------------
* previous branch was discarded
* new branch created instead:
https://github.com/apache/jackrabbit-oak/tree/OAK-10688-rebase
* copied the change from the original branch, with further changes, into the new branch
(in [this commit|https://github.com/apache/jackrabbit-oak/commit/40225c85a6c0784ea120ffe1b4aaa486c50fecc8])
* created a PR with that [here|https://github.com/apache/jackrabbit-oak/pull/1372]
> Keep only traversed state, remove all other revisions
> -----------------------------------------------------
>
> Key: OAK-10688
> URL: https://issues.apache.org/jira/browse/OAK-10688
> Project: Jackrabbit Oak
> Issue Type: Task
> Components: documentmk
> Reporter: Stefan Egli
> Assignee: Stefan Egli
> Priority: Major
>
> As a slightly different algorithm to OAK-10535, this ticket suggests
> calculating the traversedState of a node, then keeping only those revisions
> needed for that traversedState and removing all others. The main difference is
> an inversion of logic: instead of analysing for each revision whether it must
> be kept or not, this first derives the revisions that must be kept from the
> traversedState, then deletes all others.
> This mechanism applies to all (normal and bundled) properties as well as some
> DocumentNodeStore internal ones, such as "_deleted".
> Below is a list of assumptions to back this:
> * DetailedGC runs only up to the older of the oldest checkpoint and
> maxRevisionAge (24h by default). Thus a document analysed by DetailedGC is
> guaranteed to have only 1 revision (per property) that must be kept, as it
> is guaranteed to not have modifications (revisions) younger than any
> checkpoint or maxRevisionAge (24h).
> * To find out which revision(s) must be kept, the node tree is traversed from
> root (based on current head revision) to the target document.
> * Given the first bullet (that we're only looking at nodes that have only 1
> revision each, per property, to keep), this traversed node state thus
> contains the values of those.
> * Hence, for each property key of the traversed state, the corresponding
> "commit revision" in the document-local map must be calculated.
> That local map entry must be kept; all others can be deleted.
> * Note that this also cleans up overwritten branch commits of the same branch
> (as only the last, relevant one is kept)
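
The bullets above can be illustrated with a minimal sketch. It is not Oak's implementation: the data model (int revisions, a `committed` set, string values) and the names `revision_to_keep` and `plan_cleanup` are hypothetical, and branches and cluster ids are ignored. It only shows the inverted logic: derive the one revision to keep per property from the traversed state, then mark everything else for deletion.

```python
from typing import Dict, List, Optional, Set

# Hypothetical, simplified model (not Oak's actual data structures):
# a document maps each property name to a local map of {revision: value}.
# Revisions are plain ints here; real Oak revisions are richer.
LocalMap = Dict[int, str]
Document = Dict[str, LocalMap]


def revision_to_keep(local_map: LocalMap, committed: Set[int]) -> Optional[int]:
    """Return the newest committed revision in the local map, or None."""
    candidates = [rev for rev in local_map if rev in committed]
    return max(candidates) if candidates else None


def plan_cleanup(doc: Document, traversed_state: Dict[str, str],
                 committed: Set[int]) -> Dict[str, List[int]]:
    """Map each property to the revisions that could be deleted."""
    removals: Dict[str, List[int]] = {}
    for prop, local_map in doc.items():
        if prop not in traversed_state:
            # Property is absent from the traversed state: nothing to keep.
            removals[prop] = sorted(local_map)
            continue
        keep = revision_to_keep(local_map, committed)
        if keep is None or local_map[keep] != traversed_state[prop]:
            # Assumption of this sketch: if the derived revision does not
            # reproduce the traversed value, leave the property untouched.
            continue
        removals[prop] = sorted(rev for rev in local_map if rev != keep)
    return removals


doc = {"title": {10: "a", 20: "b", 30: "c"}, "old": {5: "x"}}
plan = plan_cleanup(doc, traversed_state={"title": "b"}, committed={5, 10, 20})
print(plan)  # {'title': [10, 30], 'old': [5]}
```

Skipping a property whose derived revision does not match the traversed value is a conservative choice made for this sketch only; the ticket itself does not describe such a check.
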
> As a result of the above, certain other entries can be deleted, namely:
> * any "_commitRoot" entry no longer referenced by the local document
> * any "_bc" entry no longer referenced by the local document
> Independent of the traversedState and the outcome of the cleanup, the
> following can also be removed:
> * any "_revisions" entry older than the current sweepRev
> However, "_revisions" entries that are not referenced by the local
> document but are younger than the sweepRev must still be kept, as they might
> be referenced by child documents (through their "_commitRoot" pointing to the
> current document). Without checking for children and double-checking the
> actual use, some garbage "_revisions" entries could therefore still be
> left.
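
The follow-up cleanup rules for "_commitRoot", "_bc" and "_revisions" could be sketched as below. This is an illustration under assumptions, not Oak code: revisions are modelled as plain ints, `kept` stands for the revisions the local document still references after cleanup, and the function name `plan_metadata_cleanup` is hypothetical.

```python
from typing import Dict, List, Set


def plan_metadata_cleanup(commit_root: Set[int], bc: Set[int],
                          revisions: Set[int], kept: Set[int],
                          sweep_rev: int) -> Dict[str, List[int]]:
    """Return, per internal map, the revision keys that could be deleted."""
    return {
        # Entries no longer referenced by the local document.
        "_commitRoot": sorted(commit_root - kept),
        "_bc": sorted(bc - kept),
        # Only entries older than sweepRev; younger ones stay even if the
        # local document does not reference them, since child documents
        # might still point at them via their "_commitRoot".
        "_revisions": sorted(rev for rev in revisions if rev < sweep_rev),
    }


result = plan_metadata_cleanup(
    commit_root={10, 20}, bc={20, 30}, revisions={5, 15, 40},
    kept={20}, sweep_rev=25,
)
print(result)  # {'_commitRoot': [10], '_bc': [30], '_revisions': [5, 15]}
```

Revision 40 survives in "_revisions" although nothing local references it, which is exactly the leftover garbage the description accepts.
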