Stefan Egli created OAK-10688:
---------------------------------
Summary: Keep only traversed state, remove all other revisions
Key: OAK-10688
URL: https://issues.apache.org/jira/browse/OAK-10688
Project: Jackrabbit Oak
Issue Type: Task
Components: documentmk
Reporter: Stefan Egli
Assignee: Stefan Egli
As a slightly different algorithm to OAK-10535, this ticket proposes to
calculate the traversedState of a node, keep only those revisions needed
for that traversedState, and remove all others. The main difference is an
inversion of logic: instead of analysing for each revision whether it
must be kept or not, this approach first derives the revisions that must be
"kept" from the traversedState, then deletes all others.
This mechanism applies to all (normal and bundled) properties as well as some
DocumentNodeStore internal ones, such as "_deleted".
Below is a list of assumptions to back this:
* DetailedGC runs only up to the older of the oldest checkpoint and
maxRevisionAge (24h by default). Thus a document analysed by DetailedGC is
guaranteed to have only 1 revision (per property) that must be kept, as it is
guaranteed to not have modifications (revisions) younger than any checkpoint or
maxRevisionAge (24h).
* To find out which revision(s) must be kept, the node tree is traversed from
root (based on current head revision) to the target document.
* Given the first bullet (we are only looking at nodes that have only 1
revision per property to keep), this traversed node state thus contains
the values of those revisions.
* Hence, for each property key of the traversed state, the
corresponding "commit revision" in the document-local map must be calculated.
That local map entry must be kept; all others can be deleted.
* Note that this also cleans up overwritten branch commits of the same branch
(as only the last, relevant one is kept)
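The bullets above can be sketched roughly as follows. This is a minimal
illustration, not Oak code: the class and method names are hypothetical,
revisions are simplified to plain long values (real Oak revisions also carry
a counter and a clusterId), and all local map entries are assumed to be
committed and visible from the head revision. Properties that are absent from
the traversed state, or whose kept revision cannot be resolved, are left
untouched here.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch, not part of Oak.
final class KeepTraversedStateSketch {

    /**
     * localMap: property -> (revision -> value), the document-local map.
     * traversedState: property -> value as read when traversing from head.
     * Returns, per property, the revisions that are not needed for the
     * traversed state and could therefore be removed.
     */
    static Map<String, Set<Long>> revisionsToRemove(
            Map<String, NavigableMap<Long, String>> localMap,
            Map<String, String> traversedState) {
        Map<String, Set<Long>> toRemove = new TreeMap<>();
        for (Map.Entry<String, NavigableMap<Long, String>> e : localMap.entrySet()) {
            String property = e.getKey();
            NavigableMap<Long, String> revisions = e.getValue();
            if (!traversedState.containsKey(property)) {
                continue; // not covered by this sketch: leave untouched
            }
            String traversedValue = traversedState.get(property);
            // Simplified "commit revision": the newest entry whose value
            // matches the value seen in the traversed state.
            Long keep = null;
            for (Map.Entry<Long, String> r : revisions.descendingMap().entrySet()) {
                if (Objects.equals(r.getValue(), traversedValue)) {
                    keep = r.getKey();
                    break;
                }
            }
            if (keep == null) {
                continue; // cannot resolve the kept revision: delete nothing
            }
            Set<Long> remove = new TreeSet<>(revisions.keySet());
            remove.remove(keep);
            if (!remove.isEmpty()) {
                toRemove.put(property, remove);
            }
        }
        return toRemove;
    }
}
```

Note the inversion: the code never decides per revision whether it is garbage;
it resolves the single revision to keep and treats the rest as removable.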
As a result of the above, certain other entries can be deleted, namely:
* any "_commitRoot" entry no longer referenced by the local document
* any "_bc" entry no longer referenced by the local document
Independent of the traversedState and the outcome of the cleanup what can also
be removed is:
* any "_revisions" entry older than the current sweepRev
However: "_revisions" entries that are not referenced by the local document
but are younger than the sweepRev must still be kept, as they might be
referenced by child documents (through their "_commitRoot" pointing to the
current document). Without checking for children and double-checking the actual
use, there could as a result still be some garbage "_revisions" entries left.
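The follow-up cleanups described above could look roughly like this. Again a
hypothetical sketch rather than Oak code: revisions and the sweepRev are
simplified to long values for a single cluster node, and the set of revisions
still referenced by the local document is assumed to be known from the
preceding property cleanup.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch, not part of Oak.
final class FollowUpCleanupSketch {

    /**
     * "_commitRoot" / "_bc" keys that no remaining local entry refers to.
     */
    static Set<Long> unreferencedEntries(Set<Long> entryRevisions,
                                         Set<Long> stillReferenced) {
        Set<Long> removable = new TreeSet<>(entryRevisions);
        removable.removeAll(stillReferenced);
        return removable;
    }

    /**
     * "_revisions" keys strictly older than sweepRev. Younger entries are
     * kept even if unreferenced locally, since child documents might point
     * to them through their "_commitRoot".
     */
    static Set<Long> removableRevisionsEntries(Set<Long> revisionsEntries,
                                               long sweepRev) {
        Set<Long> removable = new TreeSet<>();
        for (long rev : revisionsEntries) {
            if (rev < sweepRev) {
                removable.add(rev);
            }
        }
        return removable;
    }
}
```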