[
https://issues.apache.org/jira/browse/OAK-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058292#comment-15058292
]
Michael Dürig edited comment on OAK-3348 at 1/8/16 11:22 PM:
-------------------------------------------------------------
AFAICS there are two things we need to fix to get rid of references to
pre-compacted segments:
# Prevent segment node builders acquired before a compaction cycle from flushing
into compacted segments. As such builders reference a base state that lives in
pre-compacted segments, such a flush pollutes compacted segments.
# When merging segment node builders that have been acquired before a
compaction cycle, prevent them from linking back to their pre-compacted base state.
-For 1) we can ensure that such node builders would write into their own
segment. This should be relatively easy by leveraging the borrow mechanism for
{{SegmentBufferWriters}} introduced in OAK-1828.- *Edit*: this is actually not
necessary as OAK-2192 should have taken care of this already.
For 2) we need to "compact" the base state of such builders after the fact. To
make this efficient (wrt. de-duplication) we need to pass the compaction map of
the respective generation to the compactor. (There is a slight chance that a
builder is older than just a single gc generation, in which case this approach
is not correct. For the time being I consider this an edge case and would just
throw a {{CommitFailedException}} at this point).
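To make the generation check concrete, here is a minimal sketch of the idea. It is a toy model, not Oak code: {{StaleBuilderSketch}}, {{CompactionMap}}, {{compactBase}} and {{MergeRejectedException}} are hypothetical names, record ids are plain strings, and the unchecked exception merely stands in for the {{CommitFailedException}} mentioned above.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: none of these types are Oak classes.
final class StaleBuilderSketch {

    /** Stand-in for the commit failure; unchecked to keep the sketch short. */
    static final class MergeRejectedException extends RuntimeException {
        MergeRejectedException(String message) {
            super(message);
        }
    }

    /** Hypothetical compaction map of one gc generation: pre-compacted id -> compacted id. */
    static final class CompactionMap {
        private final Map<String, String> oldToNew = new HashMap<>();

        void put(String oldId, String newId) {
            oldToNew.put(oldId, newId);
        }

        String get(String oldId) {
            return oldToNew.get(oldId);
        }
    }

    /**
     * Returns the id the builder's base state should be replaced with before merging.
     * The map is consulted first so that already compacted records are reused
     * (de-duplication) instead of being copied again.
     */
    static String compactBase(String baseId, int builderGeneration,
                              int currentGeneration, CompactionMap map) {
        int age = currentGeneration - builderGeneration;
        if (age == 0) {
            return baseId; // builder acquired after the last compaction: nothing to do
        }
        if (age != 1) {
            // Only the map of a single generation is available, so give up.
            throw new MergeRejectedException(
                    "builder is " + age + " gc generations old");
        }
        String compacted = map.get(baseId);
        if (compacted == null) {
            // Not compacted yet: "copy" the record and remember the mapping.
            compacted = "compacted:" + baseId;
            map.put(baseId, compacted);
        }
        return compacted;
    }
}
```

In this sketch a builder that is exactly one generation behind gets its base state swapped for the compacted counterpart, while anything older is rejected outright.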
For 2), a more efficient and probably also simpler approach might be to
structure the changes written by a builder in a way to entirely avoid back
links to pre-compacted segments once merged.
The idea is to separate changes
[written|https://github.com/mduerig/jackrabbit-oak/blob/2186df37e1bc73a871e5020261d39b27f1eff925/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/segment/SegmentNodeBuilder.java#L111-L111]
by segment node builders such that there wouldn't be any back links when later
merged. AFAICS this could be done by writing added nodes and properties to a
separate set of segments from the one to which all other changes are written.
Those added items don't have back references but will be the only ones being
referenced once merged: merge is implemented through rebasing changes, which
rewrites all changed items on top of the latest head and creates references to
added items.
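The effect of splitting the writes can be illustrated with a small reachability check. Again this is only a sketch under assumptions: {{SplitWriteSketch}}, {{Rec}} and the segment names are made up for illustration, and a "record" is reduced to the segment it lives in plus its outgoing references.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: none of these types are Oak classes.
final class SplitWriteSketch {

    /** A record lives in exactly one segment and may reference other records. */
    static final class Rec {
        final String segment;
        final List<Rec> refs = new ArrayList<>();

        Rec(String segment, Rec... refs) {
            this.segment = segment;
            this.refs.addAll(Arrays.asList(refs));
        }
    }

    /** Collects the names of all segments reachable from the given record. */
    static Set<String> reachableSegments(Rec root) {
        Set<String> segments = new HashSet<>();
        Set<Rec> seen = new HashSet<>(); // identity based: Rec does not override equals
        Deque<Rec> todo = new ArrayDeque<>();
        todo.push(root);
        while (!todo.isEmpty()) {
            Rec rec = todo.pop();
            if (seen.add(rec)) {
                segments.add(rec.segment);
                rec.refs.forEach(todo::push);
            }
        }
        return segments;
    }

    static Set<String> demo() {
        // Base state of a builder acquired before compaction.
        Rec oldBase = new Rec("pre-compacted");

        // The builder flushes added items and all other changes separately.
        Rec added = new Rec("builder-added");                      // no back references
        Rec changed = new Rec("builder-changes", oldBase, added);  // links back to the base

        // Compaction writes a new head; the merge rebases onto it: the changed
        // item is rewritten on top of the head, the added item is referenced.
        Rec newHead = new Rec("compacted");
        Rec merged = new Rec("compacted", newHead, added);

        return reachableSegments(merged);
    }
}
```

Under these assumptions only the compacted segment and the segment holding the added items are reachable from the merged state. The segment with the remaining changes, which is the only one linking back to the pre-compacted base, is no longer referenced.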
was (Author: mduerig):
AFAICS there are two things we need to fix to get rid of references to
pre-compacted segments:
# Prevent segment node builders acquired before a compaction cycle from flushing
into compacted segments. As such builders reference a base state that lives in
pre-compacted segments, such a flush pollutes compacted segments.
# When merging segment node builders that have been acquired before a
compaction cycle, prevent them from linking back to their pre-compacted base state.
For 1) we can ensure that such node builders would write into their own
segment. This should be relatively easy by leveraging the borrow mechanism for
{{SegmentBufferWriters}} introduced in OAK-1828.
For 2) we need to "compact" the base state of such builders after the fact. To
make this efficient (wrt. de-duplication) we need to pass the compaction map of
the respective generation to the compactor. (There is a slight chance that a
builder is older than just a single gc generation, in which case this approach
is not correct. For the time being I consider this an edge case and would just
throw a {{CommitFailedException}} at this point).
For 2), a more efficient and probably also simpler approach might be to
structure the changes written by a builder in a way to entirely avoid back
links to pre-compacted segments once merged.
The idea is to separate changes
[written|https://github.com/mduerig/jackrabbit-oak/blob/2186df37e1bc73a871e5020261d39b27f1eff925/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/segment/SegmentNodeBuilder.java#L111-L111]
by segment node builders such that there wouldn't be any back links when later
merged. AFAICS this could be done by writing added nodes and properties to a
separate set of segments from the one to which all other changes are written.
Those added items don't have back references but will be the only ones being
referenced once merged: merge is implemented through rebasing changes, which
rewrites all changed items on top of the latest head and creates references to
added items.
> Cross gc sessions might introduce references to pre-compacted segments
> ----------------------------------------------------------------------
>
> Key: OAK-3348
> URL: https://issues.apache.org/jira/browse/OAK-3348
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: segmentmk
> Reporter: Michael Dürig
> Assignee: Michael Dürig
> Labels: candidate_oak_1_0, candidate_oak_1_2, cleanup,
> compaction, gc
> Fix For: 1.4
>
> Attachments: OAK-3348-1.patch, OAK-3348-2.patch, OAK-3348.patch,
> cross-gc-refs.pdf, image.png
>
>
> I suspect that certain write operations during compaction can cause
> references from compacted segments to pre-compacted ones. This would
> effectively prevent the pre-compacted segments from getting evicted in
> subsequent cleanup phases.
> The scenario is as follows:
> * A session is opened and a lot of content is written to it such that the
> update limit is exceeded. This causes the changes to be written to disk.
> * Revision gc runs causing a new, compacted root node state to be written to
> disk.
> * The session saves its changes. This causes rebasing of its changes onto the
> current root (the compacted one). At this point any node that has been added
> will be added again in the sub-tree rooted at the current root. Such nodes
> however might have been written to disk *before* revision gc ran and might
> thus be contained in pre-compacted segments. As I suspect the node-add
> operation in the rebasing process *not* to create a deep copy of such nodes
> but to rather create a *reference* to them, a reference to a pre-compacted
> segment is introduced here.
> Going forward we need to validate the above hypothesis, assess its impact and,
> if necessary, come up with a solution.