[
https://issues.apache.org/jira/browse/OAK-3349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068288#comment-16068288
]
Michael Dürig edited comment on OAK-3349 at 6/29/17 12:50 PM:
--------------------------------------------------------------
h6. Implementation note on tail compaction
In contrast to the existing compaction approach (full compaction), tail
compaction rebases all changes since the last compaction on top of the result
of that last compaction. Cleanup subsequently removes the uncompacted
changes. Each tail compaction cycle creates a new generation, incrementing the
generation number. Cleanup removes all uncompacted segments whose generation
is no greater than the current generation minus the number of retained
generations (2 by default).
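
The cleanup rule above can be written as a simple predicate. This is only a minimal sketch for illustration: the class, method and parameter names are hypothetical and are not taken from the Oak code base.

```java
// Illustrative sketch only; all names here are hypothetical, not Oak API.
final class TailCleanupSketch {

    // Default number of retained generations, as stated in the note above.
    static final int DEFAULT_RETAINED_GENERATIONS = 2;

    /**
     * A segment is reclaimable if it was not written by the compactor and its
     * generation is no greater than the current generation minus the number
     * of retained generations.
     */
    static boolean isReclaimable(int segmentGeneration,
                                 boolean writtenByCompactor,
                                 int currentGeneration,
                                 int retainedGenerations) {
        return !writtenByCompactor
                && segmentGeneration <= currentGeneration - retainedGenerations;
    }
}
```

With a current generation of 5 and the default of 2 retained generations, an uncompacted segment of generation 3 would be reclaimable under this sketch, while one of generation 4 or any compacted segment would be kept.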
To make this work we need to be able to determine the age of a segment (in
number of generations) and whether a segment has been written by the compactor
or by a regular writer (and is thus uncompacted). The
[POC|https://github.com/mduerig/jackrabbit-oak/commits/OAK-3349-POC]
implemented this by assigning even generation numbers to regular segments and
odd ones to segments written by tail compaction while at the same time
completely removing support for full compaction.
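
The even/odd scheme amounts to a parity check on the generation number. The following is a minimal sketch of that idea, not the POC's actual code; the class and method names are made up for illustration.

```java
// Illustrative sketch of the even/odd encoding; names are hypothetical.
final class GenerationParitySketch {

    /** Odd generation numbers mark segments written by tail compaction. */
    static boolean writtenByTailCompaction(int generation) {
        return (generation & 1) == 1;
    }

    /** Even generation numbers mark segments written by a regular writer. */
    static boolean writtenByRegularWriter(int generation) {
        return (generation & 1) == 0;
    }
}
```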
To combine tail compaction with full compaction I suggest introducing a young
generation field in the segment header, which is used by tail compaction as
described. The existing generation field will thus continue to be used for full
compaction without changing its semantics.
The proposed approach has the advantage of tail and full compaction being
completely orthogonal. You can run either one or both without one
affecting or influencing the other.
Both compaction and cleanup methods solely rely on the information in the
segment headers. A predicate for determining which segments to retain can be
inferred from the segment containing the head revision. There is no need to
rely on auxiliary information with the small exception of tail compaction using
the {{gc.log}} file to determine the base revision to compact onto. This is not
problematic with respect to resilience though, as we can always fall back to full
compaction should the base revision be invalid. (A base revision can be invalid
in two ways: either it is not found or it was not written by the compactor.
Both cases can only occur after manual tampering with the {{journal.log}}.)
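
The fallback described above could look roughly as follows. This is a sketch under assumptions: the {{BaseRevision}} type, the {{Mode}} enum and {{chooseMode}} are hypothetical names introduced here for illustration, and how the base revision is actually resolved from {{gc.log}} is left out.

```java
import java.util.Optional;

// Illustrative sketch only; the types and names below are hypothetical.
final class CompactionModeSketch {

    enum Mode { TAIL, FULL }

    /** Hypothetical result of resolving the base revision recorded in gc.log. */
    static final class BaseRevision {
        final boolean writtenByCompactor;

        BaseRevision(boolean writtenByCompactor) {
            this.writtenByCompactor = writtenByCompactor;
        }
    }

    /**
     * Tail compaction is only used when the base revision was found and was
     * written by the compactor; in both invalid cases fall back to full
     * compaction.
     */
    static Mode chooseMode(Optional<BaseRevision> base) {
        return base.filter(b -> b.writtenByCompactor).isPresent()
                ? Mode.TAIL
                : Mode.FULL;
    }
}
```

An empty {{Optional}} stands for a base revision that was not found; a present value with {{writtenByCompactor == false}} stands for one that was not written by the compactor.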
Finally the approach plays well with upgrading: while the additional young
generation field requires us to bump the segment version we can easily maintain
backwards compatibility and do a rolling upgrade segment by segment. Segments
of the previous version will just not be eligible for cleanup under tail
compaction.
> Partial compaction
> ------------------
>
> Key: OAK-3349
> URL: https://issues.apache.org/jira/browse/OAK-3349
> Project: Jackrabbit Oak
> Issue Type: New Feature
> Components: segment-tar
> Reporter: Michael Dürig
> Assignee: Michael Dürig
> Labels: compaction, gc, scalability
> Fix For: 1.8, 1.7.4
>
> Attachments: compaction-time.png, cycle-count.png, post-gc-size.png
>
>
> On big repositories compaction can take quite a while to run as it needs to
> create a full deep copy of the current root node state. For such cases it
> could be beneficial if we could partially compact the repository thus
> splitting full compaction over multiple cycles.
> Partial compaction would run compaction on a sub-tree just like we now run it
> on the full tree. Afterwards it would create a new root node state by
> referencing the previous root node state replacing said sub-tree with the
> compacted one.
> Todo: Assess feasibility and impact, implement prototype.