[ 
https://issues.apache.org/jira/browse/OAK-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig updated OAK-4014:
-------------------------------
    Attachment: tar_sizes.png

Histogram of the tar file sizes: !tar_sizes.png!

> The segment store should merge small TAR files into bigger ones
> ---------------------------------------------------------------
>
>                 Key: OAK-4014
>                 URL: https://issues.apache.org/jira/browse/OAK-4014
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: segment-tar
>            Reporter: Francesco Mari
>            Assignee: Francesco Mari
>             Fix For: Segment Tar 0.0.16
>
>         Attachments: tar_sizes.m, tar_sizes.png
>
>
> The cleanup process removes unused segments from TAR files and writes new 
> generations of those TAR files without the removed segments.
> In the long run, the size of some TAR files might be smaller than the maximum 
> size allowed for a TAR file. At the time this issue was created, the default 
> maximum size of a TAR file was 256 MiB.
> If there are many small TAR files, it should be possible to merge them into 
> bigger files. This way, we can reduce the total number of TAR files in the 
> segment store, and thus the number of open file descriptors that Oak has to 
> maintain.
> A possible implementation for the merge operation is the following:
> # Sort the list of TAR files by size, ascending.
> # Pick TAR files from the sorted list for as long as the sum of their sizes 
> after the merge stays below 256 MiB.
> # Merge the picked files into a new TAR file and mark the picked files for 
> deletion.
> # Continue picking up TAR files from the sorted list until the list is 
> exhausted or until it's only possible to pick a single TAR file.
> The merge process can run in a background thread but it is important that it 
> doesn't conflict with the cleanup operation, since merge and cleanup both 
> change the representation of TAR files on the file system. Two possible 
> solutions to avoid conflicts are:
> # Use a global lock for the whole set of TAR files.
> # Use a lock per TAR file. The cleanup and merge processes have to agree on 
> the order to use when acquiring the locks.
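
The merge procedure proposed in the description could be planned roughly as in the sketch below. It is only an illustration, not Oak code: the `TarFile` record, the `plan_merges` function and the idea of planning batches before touching the file system are assumptions, and the sum of the input sizes is used as an approximation of the merged file size.

```python
from dataclasses import dataclass
from typing import List

# 256 MiB, the default maximum TAR size mentioned in the issue.
MAX_TAR_SIZE = 256 * 1024 * 1024


@dataclass(frozen=True)
class TarFile:
    """Hypothetical record describing one TAR file in the segment store."""
    name: str
    size: int  # size in bytes


def plan_merges(files: List[TarFile],
                max_size: int = MAX_TAR_SIZE) -> List[List[TarFile]]:
    """Return batches of files; each batch would be merged into one new file.

    Assumption: the merged size is approximated by the sum of the input sizes.
    """
    # Step 1: sort by size, ascending.
    remaining = sorted(files, key=lambda f: f.size)
    batches: List[List[TarFile]] = []
    i = 0
    while i < len(remaining):
        batch: List[TarFile] = []
        total = 0
        # Step 2: keep picking while the merged size stays below the maximum.
        while i < len(remaining) and total + remaining[i].size < max_size:
            batch.append(remaining[i])
            total += remaining[i].size
            i += 1
        # Step 4: stop once only a single file (or none) can be picked.
        # The list is sorted, so no later pair could fit either.
        if len(batch) < 2:
            break
        # Step 3: this batch is merged; its inputs are marked for deletion.
        batches.append(batch)
    return batches
```

Because the list is sorted in ascending order, the first batch that contains fewer than two files ends the planning: every file that follows is at least as large, so no further merge can fit under the limit.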

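For the second locking option, one common way for cleanup and merge to agree on an order is to always acquire the per-file locks sorted by file name. The sketch below assumes this convention; the `TarLocks` class and the file names used with it are hypothetical and not part of Oak.

```python
import threading
from contextlib import ExitStack, contextmanager
from typing import Dict, Iterable, Iterator


class TarLocks:
    """Hypothetical registry holding one lock per TAR file name."""

    def __init__(self, names: Iterable[str]) -> None:
        self._locks: Dict[str, threading.Lock] = {
            name: threading.Lock() for name in names
        }

    @contextmanager
    def acquire(self, names: Iterable[str]) -> Iterator[None]:
        """Hold the locks of all given files for the duration of the block.

        Locks are always taken in sorted name order, whatever order the
        caller passes, so two callers can never wait on each other in a cycle.
        """
        with ExitStack() as stack:
            for name in sorted(set(names)):
                stack.enter_context(self._locks[name])
            yield
        # Leaving the ExitStack releases the locks in reverse order.
```

A merge that touches several files and a cleanup that touches an overlapping set would both wrap their file system changes in `with locks.acquire(names):`. The global lock of the first option is simpler but serializes all work on the segment store.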


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
