[
https://issues.apache.org/jira/browse/OAK-4756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Dürig updated OAK-4756:
-------------------------------
Fix Version/s: 1.7.3
> A parallel approach to garbage collection
> -----------------------------------------
>
> Key: OAK-4756
> URL: https://issues.apache.org/jira/browse/OAK-4756
> Project: Jackrabbit Oak
> Issue Type: New Feature
> Components: segment-tar
> Reporter: Francesco Mari
> Assignee: Francesco Mari
> Labels: gc, scalability
> Fix For: 1.8, 1.7.3
>
>
> Assuming that:
> # Logical record IDs are implemented.
> # TAR files are ordered in reverse chronological order.
> # When reading segments, TAR files are consulted in order.
> # Segments in recent TAR files shadow segments in older TAR files with the
> same segment ID.
> A new algorithm for garbage collection can be implemented:
> # Define the input for the garbage collection process. The input consists of
> the current set of TAR files and a set of record IDs representing the GC
> roots.
> # Traverse the GC roots and mark the records that are still in use. The mark
> phase traverses the record graph and produces a list of record IDs. These
> record IDs are referenced directly or indirectly by the given set of GC roots
> and need to be kept. The list of record IDs is ordered by segment ID first
> and record number next. This way, it is possible to process this list in one
> pass and figure out which segment and which record should be saved at the end
> of the garbage collection.
> # Remove unused records from segments and rewrite them in a new set of TAR
> files. The list produced in the previous step is traversed. For each
> segment encountered, a new segment is created containing only the records
> that were marked in the previous phase. This segment is then saved in a new
> set of TAR files. The set of new TAR files is the result of the garbage
> collection process.
> # Add the new TAR files to the system. The system will append the new TAR
> files to the segment store. The segments in these TAR files will shadow the
> ones in older TAR files.
> # Remove TAR files from the old generation. It is safe to do so because the
> new set of TAR files is now shadowing the initial set of TAR files.
> While the garbage collection process is running, the system can work as usual
> by starting a fresh TAR file. The result of the garbage collection is made
> visible atomically only at the end, when the new TAR files are integrated
> into the running system.
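
The steps in the description can be sketched with a small in-memory model.
This is only an illustration under stated assumptions, not Oak's
implementation: a TAR file is modelled as a dict from segment ID to segment, a
segment as a dict from record number to (payload, referenced record IDs), and a
record ID as a (segment ID, record number) tuple. All names (read_segment,
mark, rewrite, integrate) are hypothetical. Record numbers are kept unchanged
when a segment is rewritten, which is what the logical record ID assumption is
taken to provide here.

```python
from itertools import groupby
from operator import itemgetter


def read_segment(tar_files, segment_id):
    """Consult TAR files in order (newest first); the first hit shadows the rest."""
    for tar in tar_files:
        if segment_id in tar:
            return tar[segment_id]
    raise KeyError(segment_id)


def mark(tar_files, roots):
    """Traverse the record graph from the GC roots.

    Returns the reachable record IDs sorted by segment ID first and
    record number next.
    """
    seen = set()
    stack = list(roots)
    while stack:
        record_id = stack.pop()
        if record_id in seen:
            continue
        seen.add(record_id)
        segment_id, number = record_id
        _payload, references = read_segment(tar_files, segment_id)[number]
        stack.extend(references)
    return sorted(seen)


def rewrite(tar_files, marked):
    """Process the sorted list in one pass, keeping only marked records."""
    new_tar = {}
    for segment_id, group in groupby(marked, key=itemgetter(0)):
        old_segment = read_segment(tar_files, segment_id)
        # Record numbers are preserved, so existing references stay valid.
        new_tar[segment_id] = {number: old_segment[number] for _, number in group}
    return new_tar


def integrate(fresh_tar_files, new_tar):
    """Publish the result and drop the old generation.

    Assumption: TAR files written while GC was running stay in front of
    the compacted TAR file; the description does not fix this ordering.
    """
    return list(fresh_tar_files) + [new_tar]


# Hypothetical store with one old TAR file and a single GC root.
old_tar = {
    "s1": {0: ("root", [("s1", 1), ("s2", 0)]), 1: ("child", []), 2: ("unused", [])},
    "s2": {0: ("leaf", []), 1: ("unused", [])},
    "s3": {0: ("unused", [])},
}
snapshot = [old_tar]
marked = mark(snapshot, [("s1", 0)])
new_tar = rewrite(snapshot, marked)

# Writes that arrived during GC went to a fresh TAR file.
fresh_tar = {"s4": {0: ("written during gc", [("s1", 0)])}}
store = integrate([fresh_tar], new_tar)
```

In this model segment s3 disappears entirely and the unused records in s1 and
s2 are dropped, while the reference from the fresh TAR file to record
("s1", 0) still resolves after the old TAR file has been removed.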