Francesco Mari created OAK-4014:
-----------------------------------

             Summary: The segment store should merge small TAR files into 
bigger ones
                 Key: OAK-4014
                 URL: https://issues.apache.org/jira/browse/OAK-4014
             Project: Jackrabbit Oak
          Issue Type: Improvement
          Components: segmentmk
            Reporter: Francesco Mari
            Assignee: Francesco Mari
             Fix For: 1.6


The cleanup process removes unused segments from TAR files and writes new 
generations of those TAR files without the removed segments.

In the long run, some TAR files might end up smaller than the maximum 
size allowed for a TAR file. At the time this issue was created, the default 
maximum size of a TAR file was 256 MiB.

If there are many small TAR files, it should be possible to merge them into 
bigger files. This way, we can reduce the total number of TAR files in the 
segment store, and thus the number of open file descriptors that Oak has to 
maintain.

A possible implementation for the merge operation is the following:

# Sort the list of TAR files by size, ascending.
# Pick TAR files from the sorted list for as long as the sum of their sizes 
after the merge stays below 256 MiB.
# Merge the picked files into a new TAR file and mark the picked files 
for deletion.
# Continue picking TAR files from the sorted list until the list is 
exhausted or until it's only possible to pick a single TAR file.
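
The steps above can be sketched as a small planning function. This is only an 
illustration, not Oak code: the name plan_merges, the dict-based input and the 
max_size parameter are hypothetical, and the sketch assumes that the size of a 
merged TAR file equals the sum of the sizes of its inputs (it ignores any 
per-file header or index overhead).

```python
MAX_TAR_SIZE = 256 * 1024 * 1024  # default maximum TAR size cited in this issue


def plan_merges(sizes, max_size=MAX_TAR_SIZE):
    """Group TAR files into merge batches (hypothetical helper).

    sizes maps a TAR file name to its size in bytes. The result is a list of
    batches; each batch lists the files to merge into one new TAR file.
    """
    # Step 1: sort by size, ascending (the name breaks ties deterministically).
    pending = sorted(sizes, key=lambda name: (sizes[name], name))
    batches = []
    i = 0
    while i < len(pending):
        batch, total = [], 0
        # Step 2: pick files while the merged size stays below the limit.
        while i < len(pending) and total + sizes[pending[i]] < max_size:
            batch.append(pending[i])
            total += sizes[pending[i]]
            i += 1
        # Step 4: stop once only a single file (or none) can be picked. The
        # remaining files are at least as large, so no later batch can fit.
        if len(batch) < 2:
            break
        # Step 3: this batch would be merged and its inputs marked for deletion.
        batches.append(batch)
    return batches
```

The function only decides which files belong together. Writing the merged TAR 
file and marking the inputs for deletion are left to the caller.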

The merge process can run in a background thread, but it is important that it 
doesn't conflict with the cleanup operation, since merge and cleanup both 
change the representation of TAR files on the file system. Two possible 
solutions to avoid conflicts are:

# Use a global lock for the whole set of TAR files.
# Use a lock per TAR file. The cleanup and merge processes have to agree on the 
order in which they acquire the locks, so that they cannot deadlock.
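
A minimal sketch of the second option follows, assuming that the agreed order 
is simply the lexicographic order of the TAR file names. The TarLocks class 
and its acquire method are hypothetical names used for illustration; they are 
not part of Oak.

```python
import threading
from contextlib import ExitStack, contextmanager


class TarLocks:
    """One lock per TAR file, always acquired in sorted name order."""

    def __init__(self):
        self._locks = {}
        self._guard = threading.Lock()  # protects the lock table itself

    def _lock_for(self, name):
        # Create the per-file lock lazily, exactly once per file name.
        with self._guard:
            return self._locks.setdefault(name, threading.Lock())

    @contextmanager
    def acquire(self, names):
        # Sorting gives merge and cleanup the same acquisition order, so two
        # threads can never each hold a lock that the other is waiting for.
        with ExitStack() as stack:
            for name in sorted(set(names)):
                stack.enter_context(self._lock_for(name))
            yield  # all requested locks are held here
        # ExitStack releases the locks in reverse order on exit.
```

Both processes would wrap their file system changes in the same call, for 
example "with locks.acquire(files_to_merge): ..." in the merge thread and 
"with locks.acquire([file_to_clean]): ..." in the cleanup thread. A global 
lock is simpler, but it serializes merge and cleanup even when they touch 
different files.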



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
