[jira] [Comment Edited] (OAK-3349) Partial compaction

JIRA Thu, 29 Jun 2017 05:51:19 -0700

    [ 
https://issues.apache.org/jira/browse/OAK-3349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068288#comment-16068288
 ]


Michael Dürig edited comment on OAK-3349 at 6/29/17 12:50 PM:
--------------------------------------------------------------

h6. Implementation note on tail compaction 

In contrast to the existing compaction approach (full compaction), tail 
compaction rebases all changes since the last compaction on top of the result 
of that last compaction. Cleanup subsequently cleans up the uncompacted 
changes. Each tail compaction cycle creates a new generation, incrementing the 
generation number. Cleanup removes all non-compacted segments whose generation 
is no bigger than the current generation minus a certain number of retained 
generations (2 by default). 
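
The cleanup rule above can be read as a simple predicate. The following is only a minimal sketch of that reading: the class, method and parameter names are hypothetical and not taken from Oak.

```java
// Illustrative sketch only; names are hypothetical, not Oak API.
final class TailCleanupSketch {
    // Default number of retained generations as stated in the comment.
    static final int DEFAULT_RETAINED_GENERATIONS = 2;

    /**
     * Returns true if cleanup may remove the segment: it must be
     * non-compacted and its generation must be no bigger than
     * currentGeneration - retainedGenerations.
     */
    static boolean isReclaimable(int segmentGeneration,
                                 boolean writtenByCompactor,
                                 int currentGeneration,
                                 int retainedGenerations) {
        if (writtenByCompactor) {
            return false; // this rule only targets non-compacted segments
        }
        return segmentGeneration <= currentGeneration - retainedGenerations;
    }

    public static void main(String[] args) {
        int current = 5;
        for (int generation = 1; generation <= current; generation++) {
            System.out.println("uncompacted segment of generation " + generation
                    + " reclaimable: "
                    + isReclaimable(generation, false, current,
                            DEFAULT_RETAINED_GENERATIONS));
        }
    }
}
```

With a current generation of 5 and the default of 2 retained generations, uncompacted segments of generations 1 to 3 would qualify for removal while those of generations 4 and 5 would be kept.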

To make this work we need to be able to determine the age of a segment (in 
number of generations) and whether a segment has been written by the compactor 
or by a regular writer (and is thus uncompacted). The 
[POC|https://github.com/mduerig/jackrabbit-oak/commits/OAK-3349-POC] 
implemented this by assigning even generation numbers to regular segments and 
odd ones to segments written by tail compaction, while at the same time 
completely removing support for full compaction.
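
For illustration, the even/odd encoding used by the POC amounts to a parity check on the generation number. This is a sketch of the idea only, not code from the POC branch, and the names are made up.

```java
// Illustrative sketch only; not code from the POC branch.
final class ParityGenerationSketch {
    /** Odd generation numbers mark segments written by tail compaction. */
    static boolean isWrittenByCompactor(int generation) {
        return (generation & 1) == 1;
    }

    /** Even generation numbers mark segments written by a regular writer. */
    static boolean isUncompacted(int generation) {
        return !isWrittenByCompactor(generation);
    }

    public static void main(String[] args) {
        for (int generation = 0; generation < 4; generation++) {
            System.out.println(generation + " -> "
                    + (isWrittenByCompactor(generation)
                            ? "compactor" : "regular writer"));
        }
    }
}
```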

To combine tail compaction with full compaction I suggest introducing a young 
generation field in the segment header, which is used by tail compaction as 
described. The existing generation field will thus keep being used for full 
compaction without changing its semantics. 
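
One way to picture the two fields side by side is the sketch below. It is not the actual segment header layout. The type and method names are hypothetical, and it assumes that each kind of compaction advances only its own counter, which the comment does not spell out.

```java
// Hypothetical value type; names and update rules are assumptions.
final class SegmentGenerationsSketch {
    final int generation;      // existing field, used by full compaction
    final int youngGeneration; // proposed field, used by tail compaction

    SegmentGenerationsSketch(int generation, int youngGeneration) {
        this.generation = generation;
        this.youngGeneration = youngGeneration;
    }

    /** ASSUMPTION: full compaction advances only the existing field. */
    SegmentGenerationsSketch afterFullCompaction() {
        return new SegmentGenerationsSketch(generation + 1, youngGeneration);
    }

    /** ASSUMPTION: tail compaction advances only the young generation. */
    SegmentGenerationsSketch afterTailCompaction() {
        return new SegmentGenerationsSketch(generation, youngGeneration + 1);
    }

    @Override
    public String toString() {
        return "generation=" + generation
                + ", youngGeneration=" + youngGeneration;
    }

    public static void main(String[] args) {
        SegmentGenerationsSketch g = new SegmentGenerationsSketch(0, 0);
        System.out.println(g.afterTailCompaction().afterTailCompaction());
        System.out.println(g.afterFullCompaction());
    }
}
```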

The proposed approach has the advantage of tail and full compaction being 
completely orthogonal. You can run either one or both without one 
affecting or influencing the other. 
Both compaction and cleanup methods rely solely on the information in the 
segment headers. A predicate for determining which segments to retain can be 
inferred from the segment containing the head revision. There is no need to 
rely on auxiliary information, with the small exception of tail compaction using 
the {{gc.log}} file to determine the base revision to compact onto. This is not 
problematic wrt. resilience though, as we can always fall back to full 
compaction should the base revision be invalid. (A base revision can be invalid 
in two ways: either it is not found or it is one not written by the compactor. 
Both cases can only occur after manual tampering with the {{journal.log}}.)
Finally, the approach plays well with upgrading: while the additional young 
generation field requires us to bump the segment version, we can easily maintain 
backwards compatibility and do a rolling upgrade segment by segment. Segments 
of the previous version will just not be eligible for cleanup under tail 
compaction. 
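
The fallback described above (use full compaction whenever the base revision is invalid) could be sketched as follows. The types and names are hypothetical stand-ins. How the base revision is actually read from {{gc.log}} and validated is not shown here.

```java
import java.util.Optional;

// Illustrative sketch only; names are hypothetical, not Oak API.
final class CompactionChoiceSketch {
    enum Mode { TAIL, FULL }

    /** Hypothetical stand-in for a base revision looked up via gc.log. */
    static final class BaseRevision {
        final String id;
        final boolean writtenByCompactor;

        BaseRevision(String id, boolean writtenByCompactor) {
            this.id = id;
            this.writtenByCompactor = writtenByCompactor;
        }
    }

    /**
     * Chooses tail compaction only for a valid base revision. A base is
     * invalid if it is not found or was not written by the compactor.
     */
    static Mode choose(Optional<BaseRevision> base) {
        if (!base.isPresent()) {
            return Mode.FULL; // base revision not found
        }
        if (!base.get().writtenByCompactor) {
            return Mode.FULL; // base revision not written by the compactor
        }
        return Mode.TAIL;
    }

    public static void main(String[] args) {
        System.out.println(choose(Optional.empty()));
        System.out.println(choose(Optional.of(new BaseRevision("r1", false))));
        System.out.println(choose(Optional.of(new BaseRevision("r2", true))));
    }
}
```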




> Partial compaction
> ------------------
>
>                 Key: OAK-3349
>                 URL: https://issues.apache.org/jira/browse/OAK-3349
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: segment-tar
>            Reporter: Michael Dürig
>            Assignee: Michael Dürig
>              Labels: compaction, gc, scalability
>             Fix For: 1.8, 1.7.4
>
>         Attachments: compaction-time.png, cycle-count.png, post-gc-size.png
>
>
> On big repositories compaction can take quite a while to run as it needs to 
> create a full deep copy of the current root node state. For such cases it 
> could be beneficial if we could partially compact the repository thus 
> splitting full compaction over multiple cycles. 
> Partial compaction would run compaction on a sub-tree just like we now run it 
> on the full tree. Afterwards it would create a new root node state by 
> referencing the previous root node state replacing said sub-tree with the 
> compacted one. 
> Todo: Assess feasibility and impact, implement prototype.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
