[ 
https://issues.apache.org/jira/browse/OAK-4756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig updated OAK-4756:
-------------------------------
    Fix Version/s: 1.7.3

> A parallel approach to garbage collection
> -----------------------------------------
>
>                 Key: OAK-4756
>                 URL: https://issues.apache.org/jira/browse/OAK-4756
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: segment-tar
>            Reporter: Francesco Mari
>            Assignee: Francesco Mari
>              Labels: gc, scalability
>             Fix For: 1.8, 1.7.3
>
>
> Assuming that:
> # Logic record IDs are implemented.
> # TAR files are ordered in reverse chronological order.
> # When reading segments, TAR files are consulted in order.
> # Segments in recent TAR files shadow segments in older TAR files with the 
> same segment ID.
> A new algorithm for garbage collection can be implemented:
> # Define the input for the garbage collection process. The input consists of 
> the current set of TAR files and a set of record IDs representing the GC 
> roots.
> # Traverse the GC roots and mark the records that are still in use. The mark 
> phase traverses the record graph and produces a list of record IDs. These 
> record IDs are referenced directly or indirectly by the given set of GC roots 
> and need to be kept. The list of record IDs is ordered by segment ID first 
> and record number next. This way, it is possible to process this list in one 
> pass and figure out which segment and which record should be saved at the end 
> of the garbage collection.
> # Remove unused records from segments and rewrite them in a new set of TAR 
> files. The list is produced in the previous step is traversed. For each 
> segment encountered, a new segment is created containing only the records 
> that were marked in the previous phase. This segment is then saved in a new 
> set of TAR files. The set of new TAR files is the result of the garbage 
> collection process. 
> # Add the new TAR files to the system. The system will append the new TAR 
> files to the segment store. The segments in these TAR files will shadow the 
> ones in older TAR files.
> # Remove TAR files from the old generation. It is safe to do so because the 
> new set of TAR files are currently shadowing the initial set of TAR files.
> While the garbage collection process is running, the system can work as usual 
> by starting a fresh TAR file. The result of the garbage collection is made 
> visible atomically only at the end, when the new TAR files are integrated 
> into the running system.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to