[
https://issues.apache.org/jira/browse/OAK-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15204106#comment-15204106
]
Michael Dürig commented on OAK-3348:
------------------------------------
I just
[pushed|https://github.com/mduerig/jackrabbit-oak/commit/d0fcbd7369f432c0afc3fef914a3feff0b6a17e8]
a new cleanup implementation that leverages the GC generations: instead of
relying on reachability, this strategy assigns a lifetime to records based on
their generation. The lifetime is currently hard coded to two generations, so
anything further back than the previous generation will be removed. Ultimately
this threshold should be configurable, and I left a comment in the code to this
effect.
Note that bulk segments require special treatment in the generation-based
cleanup strategy, as they don't have a generation assigned to them. (They
actually can't, as we share bulk records across generations.) For bulk segments
the approach is to still use reachability.
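As a rough illustration of the rule described above, here is a minimal sketch.
It is not the code from the linked commit: SegmentInfo, GenerationCleanup,
RETAINED_GENERATIONS and the set of reachable bulk segment ids are all
hypothetical names introduced for this example. It assumes that data segments
carry an integer generation and that the set of reachable bulk segments has
already been computed elsewhere.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a segment; not Oak's actual Segment class.
final class SegmentInfo {
    final String id;
    final boolean bulk;    // bulk segments have no generation assigned
    final int generation;  // ignored when bulk is true

    SegmentInfo(String id, boolean bulk, int generation) {
        this.id = id;
        this.bulk = bulk;
        this.generation = generation;
    }
}

final class GenerationCleanup {
    // Assumed threshold: keep the current and the previous generation.
    static final int RETAINED_GENERATIONS = 2;

    // Data segments are judged by generation, bulk segments by reachability.
    static boolean isReclaimable(SegmentInfo segment, int currentGeneration,
                                 Set<String> reachableBulkIds) {
        if (segment.bulk) {
            return !reachableBulkIds.contains(segment.id);
        }
        return segment.generation <= currentGeneration - RETAINED_GENERATIONS;
    }

    // Returns the ids of the segments a cleanup run would remove, in input order.
    static List<String> cleanup(List<SegmentInfo> segments, int currentGeneration,
                                Set<String> reachableBulkIds) {
        List<String> removed = new ArrayList<>();
        for (SegmentInfo segment : segments) {
            if (isReclaimable(segment, currentGeneration, reachableBulkIds)) {
                removed.add(segment.id);
            }
        }
        return removed;
    }
}
```

With the current generation at 2, a data segment of generation 0 would be
removed while those of generations 1 and 2 are kept. A bulk segment is removed
only if it is absent from the reachable set.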
> Cross gc sessions might introduce references to pre-compacted segments
> ----------------------------------------------------------------------
>
> Key: OAK-3348
> URL: https://issues.apache.org/jira/browse/OAK-3348
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: segmentmk
> Reporter: Michael Dürig
> Assignee: Michael Dürig
> Labels: candidate_oak_1_0, candidate_oak_1_2, cleanup,
> compaction, gc
> Fix For: 1.6
>
> Attachments: OAK-3348-1.patch, OAK-3348-2.patch, OAK-3348.patch,
> SCIT.patch, cleanup-time.png, compaction-time.png, cross-gc-refs.pdf,
> image.png, repo-size.png
>
>
> I suspect that certain write operations during compaction can cause
> references from compacted segments to pre-compacted ones. This would
> effectively prevent the pre-compacted segments from getting evicted in
> subsequent cleanup phases.
> The scenario is as follows:
> * A session is opened and a lot of content is written to it such that the
> update limit is exceeded. This causes the changes to be written to disk.
> * Revision gc runs causing a new, compacted root node state to be written to
> disk.
> * The session saves its changes. This causes rebasing of its changes onto the
> current root (the compacted one). At this point any node that has been added
> will be added again in the sub-tree rooted at the current root. Such nodes
> however might have been written to disk *before* revision gc ran and might
> thus be contained in pre-compacted segments. As I suspect the node-add
> operation in the rebasing process *not* to create a deep copy of such nodes
> but to rather create a *reference* to them, a reference to a pre-compacted
> segment is introduced here.
> Going forward we need to validate the above hypothesis, assess its impact and,
> if necessary, come up with a solution.