[ 
https://issues.apache.org/jira/browse/OAK-10688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17829210#comment-17829210
 ] 

Stefan Egli commented on OAK-10688:
-----------------------------------

* The previous branch was discarded.
* A new branch was created instead:
https://github.com/apache/jackrabbit-oak/tree/OAK-10688-rebase
* The change from the original branch, with further changes, was copied into
the new branch (in [this commit|https://github.com/apache/jackrabbit-oak/commit/40225c85a6c0784ea120ffe1b4aaa486c50fecc8]).
* A PR was created with that [here|https://github.com/apache/jackrabbit-oak/pull/1372].

> Keep only traversed state, remove all other revisions
> -----------------------------------------------------
>
>                 Key: OAK-10688
>                 URL: https://issues.apache.org/jira/browse/OAK-10688
>             Project: Jackrabbit Oak
>          Issue Type: Task
>          Components: documentmk
>            Reporter: Stefan Egli
>            Assignee: Stefan Egli
>            Priority: Major
>
> As a slightly different algorithm to OAK-10535, this ticket suggests 
> calculating the traversedState of a node, then keeping only those revisions 
> needed for that traversedState and removing all others. The main difference is 
> an inversion of logic: instead of analysing for each revision whether 
> it must be kept or not, this first derives the revisions that must be kept 
> from the traversedState, then deletes all others.
> This mechanism applies to all (normal and bundled) properties as well as some 
> DocumentNodeStore internal ones, such as "_deleted".
> Below is a list of assumptions to back this:
> * DetailedGC runs only up to the older of the oldest checkpoint and 
> maxRevisionAge (24h by default). Thus a document analysed by DetailedGC is 
> guaranteed to have only 1 revision (per property) that must be kept, as it 
> is guaranteed not to have modifications (revisions) younger than any 
> checkpoint or maxRevisionAge (24h).
> * To find out which revision(s) must be kept, the node tree is traversed from 
> root (based on current head revision) to the target document.
> * Given the first bullet (we're only looking at nodes that have only 1 
> revision per property to keep), this traversed node state thus 
> contains the values of those revisions.
> * Hence, based on each property key of the traversed state, the 
> corresponding "commit revision" in the document-local map must be calculated. 
> That local map entry must be kept; all others can be deleted.
> * Note that this also cleans up overwritten branch commits of the same branch 
> (as only the last, relevant one is kept)
> As a result of the above, certain other entries can be deleted, namely:
> * any "_commitRoot" entry no longer referenced by the local document
> * any "_bc" entry no longer referenced by the local document
> Independent of the traversedState and the outcome of the cleanup, what can 
> also be removed is:
> * any "_revisions" entry older than the current sweepRev
> However, "_revisions" entries that might not be referenced by the local 
> document and are younger than the sweepRev must still be kept, as they might 
> be referenced by child documents (through their "_commitRoot" pointing to the 
> current document). Without checking for children and double-checking the 
> actual use, some garbage "_revisions" entries could therefore still be 
> left behind.
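
The inverted logic described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Oak code: the class and method names are hypothetical, revisions are modelled as plain long values (larger means younger), property values are strings, and the "commit revision" for a property is assumed to be the youngest local revision whose value matches the traversed value. Properties missing from the traversed state are left untouched here, since the description does not say how they are handled.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

// Hypothetical sketch, not the Oak implementation: no Oak API is used.
class TraversedStateSketch {

    // Assumption: the revision to keep is the youngest local revision whose
    // value equals the value seen in the traversed state.
    static Long revisionToKeep(TreeMap<Long, String> localRevisions, String traversedValue) {
        for (Map.Entry<Long, String> e : localRevisions.descendingMap().entrySet()) {
            if (Objects.equals(e.getValue(), traversedValue)) {
                return e.getKey();
            }
        }
        return null; // no local entry backs the traversed value
    }

    // Inverted logic: derive the single entry to keep per property from the
    // traversed state, then drop every other revision of that property.
    static void keepOnlyTraversedState(Map<String, TreeMap<Long, String>> document,
                                       Map<String, String> traversedState) {
        for (Map.Entry<String, TreeMap<Long, String>> prop : document.entrySet()) {
            if (!traversedState.containsKey(prop.getKey())) {
                continue; // not covered by this sketch, leave untouched
            }
            Long keep = revisionToKeep(prop.getValue(), traversedState.get(prop.getKey()));
            if (keep != null) {
                prop.getValue().keySet().retainAll(Collections.singleton(keep));
            }
        }
    }

    // "_revisions" entries strictly older than sweepRev are removed; younger
    // ones are kept even if unreferenced locally, since child documents may
    // still point at them through "_commitRoot".
    static void pruneRevisions(TreeMap<Long, String> revisionsMap, long sweepRev) {
        revisionsMap.headMap(sweepRev, false).clear();
    }
}
```

For a property with revisions 1, 2 and 3 whose traversed value matches revision 3, only revision 3 would remain after `keepOnlyTraversedState`. `pruneRevisions` with a sweepRev of 10 would drop an entry at 5 but keep entries at 10 and 15.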



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
