[
https://issues.apache.org/jira/browse/OAK-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383780#comment-15383780
]
Michael Dürig commented on OAK-4014:
------------------------------------
[~frm], is this still a problem with the new generation-based cleanup introduced
with OAK-3348?
> The segment store should merge small TAR files into bigger ones
> ---------------------------------------------------------------
>
> Key: OAK-4014
> URL: https://issues.apache.org/jira/browse/OAK-4014
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: segment-tar
> Reporter: Francesco Mari
> Assignee: Francesco Mari
> Fix For: 1.6, Segment Tar 0.0.6
>
>
> The cleanup process removes unused segments from TAR files and writes new
> generations of those TAR files without the removed segments.
> In the long run, some TAR files might end up smaller than the maximum
> size allowed for a TAR file. At the time this issue was created, the default
> maximum size of a TAR file was 256 MiB.
> If there are many small TAR files, it should be possible to merge them into
> bigger files. This way, we can reduce the total number of TAR files in the
> segment store, and thus the number of open file descriptors that Oak has to
> maintain.
> A possible implementation for the merge operation is the following:
> # Sort the list of TAR files by size, ascending.
> # Pick TAR files from the sorted list as long as the sum of their sizes after
> the merge stays below 256 MiB.
> # Merge the picked files into a new TAR file and mark the picked files
> for deletion.
> # Continue picking up TAR files from the sorted list until the list is
> exhausted or until it's only possible to pick a single TAR file.
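The selection steps above can be sketched as a greedy grouping pass. This is only an illustration, not Oak code: `TarMergePlanner`, `TarFile` and `plan` are hypothetical names, and the merged size is approximated by the sum of the input sizes, ignoring any index or padding overhead a real TAR writer would add.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of Oak: plans which TAR files to merge together.
final class TarMergePlanner {

    /** Illustrative stand-in for a TAR file: a name and a size in bytes. */
    record TarFile(String name, long size) {}

    /**
     * Returns groups of files to merge. Each group has at least two files and
     * a combined size strictly below maxSize (assumption: merged size equals
     * the sum of the input sizes).
     */
    static List<List<TarFile>> plan(List<TarFile> files, long maxSize) {
        // Step 1: sort by size, ascending.
        List<TarFile> sorted = new ArrayList<>(files);
        sorted.sort(Comparator.comparingLong(TarFile::size));

        List<List<TarFile>> groups = new ArrayList<>();
        int i = 0;
        while (i < sorted.size()) {
            // Step 2: pick files while the running total stays below the limit.
            List<TarFile> group = new ArrayList<>();
            long total = 0;
            while (i < sorted.size() && total + sorted.get(i).size() < maxSize) {
                total += sorted.get(i).size();
                group.add(sorted.get(i));
                i++;
            }
            // Step 4: stop once at most one file can be picked. The list is
            // sorted, so no later pair of files can fit either.
            if (group.size() < 2) {
                break;
            }
            // Step 3: this group would be merged and its members marked for deletion.
            groups.add(group);
        }
        return groups;
    }
}
```

Calling `TarMergePlanner.plan(files, 256L * 1024 * 1024)` would apply the 256 MiB default mentioned in the description. The actual merge and the deletion of the source files are left out of the sketch.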
> The merge process can run in a background thread but it is important that it
> doesn't conflict with the cleanup operation, since merge and cleanup both
> change the representation of TAR files on the file system. Two possible
> solutions to avoid conflicts are:
> # Use a global lock for the whole set of TAR files.
> # Use a lock per TAR file. The cleanup and merge processes have to agree on
> the order in which they acquire the locks.
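For the second option, one way to agree on an acquisition order is to always lock TAR files in sorted name order. The following is a minimal sketch under that assumption, not Oak's implementation: `TarFileLocks` and `withLocks` are hypothetical names, and the file names in the usage note are purely illustrative.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical helper, not part of Oak: one lock per TAR file name.
final class TarFileLocks {

    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    /**
     * Runs the action while holding the locks of all given files. Locks are
     * always acquired in sorted name order, so two callers (for example merge
     * and cleanup) that need overlapping sets of files cannot deadlock.
     */
    void withLocks(Collection<String> fileNames, Runnable action) {
        List<ReentrantLock> held = new ArrayList<>();
        try {
            // TreeSet removes duplicates and yields the names in sorted order.
            for (String name : new TreeSet<>(fileNames)) {
                ReentrantLock lock = locks.computeIfAbsent(name, n -> new ReentrantLock());
                lock.lock();
                held.add(lock);
            }
            action.run();
        } finally {
            // Release in reverse acquisition order.
            for (int i = held.size() - 1; i >= 0; i--) {
                held.get(i).unlock();
            }
        }
    }
}
```

A merge task could wrap its work as `locks.withLocks(List.of("data00002a.tar", "data00001a.tar"), this::merge)`, where `merge` is a placeholder, while cleanup wraps its own work the same way. A single global lock, the first option, is simpler but serializes merge and cleanup completely.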