[
https://issues.apache.org/jira/browse/OAK-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15090206#comment-15090206
]
Michael Dürig commented on OAK-3348:
------------------------------------
At https://github.com/mduerig/jackrabbit-oak/commits/OAK-3348 I started
implementing a POC of the above approach for 2):
* Prevent back references by flushing segment node builders into 2 sets of
segments: free and merged. A segment is free if it has been created by a
builder and only references free segments. Otherwise a segment is merged.
* When rebasing a builder during merge:
** Link to records in free segments and mark those segments as merged.
** Clone all records in cross gc merged segments before linking to them.
(Ideally there would be no such records, i.e. all references would point
into free segments.)
Note: if this builder contains references to records in segments of other
builders, those segments would also become merged, along with all segments
referencing them.
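To make the bookkeeping above more concrete, here is a minimal, self-contained
sketch. It is not taken from the POC branch and does not use Oak's API: the
names Seg, State, Action, markMerged and rebaseReference are hypothetical, and
modelling "cross gc" as a generation number lower than the current one is an
assumption made purely for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FreeMergedSketch {
    enum State { FREE, MERGED }
    enum Action { LINK, CLONE }

    /** Hypothetical stand-in for a segment; not Oak's Segment class. */
    static final class Seg {
        final String id;
        final int generation;                 // assumed gc generation counter
        final Set<Seg> references = new HashSet<>();
        State state;

        Seg(String id, int generation, boolean createdByBuilder, Seg... refs) {
            this.id = id;
            this.generation = generation;
            // Free: created by a builder and referencing only free segments.
            boolean free = createdByBuilder;
            for (Seg r : refs) {
                references.add(r);
                if (r.state != State.FREE) {
                    free = false;
                }
            }
            this.state = free ? State.FREE : State.MERGED;
        }
    }

    /** Marks a segment merged, along with all segments referencing it (transitively). */
    static void markMerged(Seg start, Collection<Seg> all) {
        Deque<Seg> todo = new ArrayDeque<>();
        todo.push(start);
        while (!todo.isEmpty()) {
            Seg s = todo.pop();
            if (s.state == State.MERGED) {
                continue;                     // already handled
            }
            s.state = State.MERGED;
            for (Seg other : all) {
                if (other.references.contains(s)) {
                    todo.push(other);
                }
            }
        }
    }

    /** Decides how a rebase treats a reference into the given segment. */
    static Action rebaseReference(Seg target, int currentGeneration, Collection<Seg> all) {
        if (target.state == State.FREE) {
            markMerged(target, all);          // link, then the segment counts as merged
            return Action.LINK;
        }
        // Assumption: "cross gc" means merged before the current gc generation.
        return target.generation < currentGeneration ? Action.CLONE : Action.LINK;
    }

    public static void main(String[] args) {
        List<Seg> all = new ArrayList<>();
        Seg old = new Seg("old", 0, false);   // merged, pre-compaction
        Seg a = new Seg("a", 1, true);        // free
        Seg b = new Seg("b", 1, true, a);     // free, references a
        all.add(old);
        all.add(a);
        all.add(b);

        System.out.println(rebaseReference(a, 1, all));   // LINK
        System.out.println(b.state);                      // MERGED
        System.out.println(rebaseReference(old, 1, all)); // CLONE
    }
}
```

In this sketch, linking to the free segment a also turns b into a merged
segment because b references a, which mirrors the note above about segments of
other builders.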
I structured the commits such that they should be relatively easy to follow. See
the FIXME tags for what is still missing and what needs cleaning up.
cc [~frm], [~alex.parvulescu]
> Cross gc sessions might introduce references to pre-compacted segments
> ----------------------------------------------------------------------
>
> Key: OAK-3348
> URL: https://issues.apache.org/jira/browse/OAK-3348
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: segmentmk
> Reporter: Michael Dürig
> Assignee: Michael Dürig
> Labels: candidate_oak_1_0, candidate_oak_1_2, cleanup,
> compaction, gc
> Fix For: 1.4
>
> Attachments: OAK-3348-1.patch, OAK-3348-2.patch, OAK-3348.patch,
> cross-gc-refs.pdf, image.png
>
>
> I suspect that certain write operations during compaction can cause
> references from compacted segments to pre-compacted ones. This would
> effectively prevent the pre-compacted segments from getting evicted in
> subsequent cleanup phases.
> The scenario is as follows:
> * A session is opened and a lot of content is written to it such that the
> update limit is exceeded. This causes the changes to be written to disk.
> * Revision gc runs causing a new, compacted root node state to be written to
> disk.
> * The session saves its changes. This causes rebasing of its changes onto the
> current root (the compacted one). At this point any node that has been added
> will be added again in the sub-tree rooted at the current root. Such nodes
> however might have been written to disk *before* revision gc ran and might
> thus be contained in pre-compacted segments. As I suspect the node-add
> operation in the rebasing process *not* to create a deep copy of such nodes
> but to rather create a *reference* to them, a reference to a pre-compacted
> segment is introduced here.
> Going forward we need to validate the above hypothesis, assess its impact and,
> if necessary, come up with a solution.