[ https://issues.apache.org/jira/browse/OAK-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Francesco Mari resolved OAK-4014.
---------------------------------
    Resolution: Later
    Fix Version/s:     (was: Segment Tar 0.0.16)

> The segment store should merge small TAR files into bigger ones
> ---------------------------------------------------------------
>
>                 Key: OAK-4014
>                 URL: https://issues.apache.org/jira/browse/OAK-4014
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: segment-tar
>            Reporter: Francesco Mari
>            Assignee: Francesco Mari
>         Attachments: tar_sizes.m, tar_sizes.png
>
>
> The cleanup process removes unused segments from TAR files and writes new
> generations of those TAR files without the removed segments.
> In the long run, some TAR files might end up smaller than the maximum
> size allowed for a TAR file. At the time this issue was created, the default
> maximum size of a TAR file was 256 MiB.
> If there are many small TAR files, it should be possible to merge them into
> bigger files. This way, we can reduce the total number of TAR files in the
> segment store, and thus the number of open file descriptors that Oak has to
> maintain.
> A possible implementation for the merge operation is the following:
> # Sort the list of TAR files by size, ascending.
> # Pick TAR files from the sorted list for as long as the sum of their sizes
> after the merge stays below 256 MiB.
> # Merge the picked files into a new TAR file and mark the picked files for
> deletion.
> # Continue picking TAR files from the sorted list until the list is
> exhausted or until it's only possible to pick a single TAR file.
> The merge process can run in a background thread, but it is important that it
> doesn't conflict with the cleanup operation, since merge and cleanup both
> change the representation of TAR files on the file system. Two possible
> solutions to avoid conflicts are:
> # Use a global lock for the whole set of TAR files.
> # Use a lock per TAR file. The cleanup and merge processes have to agree on
> the order to use when acquiring the locks.
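
The selection steps proposed in the issue can be sketched roughly as follows. This is only an illustration of the greedy grouping, not code from Oak: the class `TarMergePlanner`, the method `planMerges`, and the constant `MAX_TAR_SIZE` are hypothetical names, files are represented by their sizes alone, and the merged size is assumed to be the plain sum of the input sizes (TAR headers and index entries are ignored).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TarMergePlanner {

    /** Hypothetical constant: the 256 MiB default quoted in the issue, in bytes. */
    static final long MAX_TAR_SIZE = 256L * 1024 * 1024;

    /**
     * Groups TAR file sizes into merge batches.
     * Every returned batch holds at least two sizes whose sum is below maxSize.
     * Assumption: the size of a merged file equals the sum of its inputs.
     */
    static List<List<Long>> planMerges(List<Long> sizes, long maxSize) {
        List<Long> sorted = new ArrayList<>(sizes);
        sorted.sort(Comparator.naturalOrder()); // step 1: ascending by size

        List<List<Long>> batches = new ArrayList<>();
        int i = 0;
        while (i < sorted.size()) {
            List<Long> batch = new ArrayList<>();
            long total = 0;
            // step 2: keep picking while the merged size stays below the limit
            while (i < sorted.size() && total + sorted.get(i) < maxSize) {
                total += sorted.get(i);
                batch.add(sorted.get(i));
                i++;
            }
            if (batch.size() < 2) {
                // step 4: at most one file could be picked; every remaining
                // file is at least as large, so no further merge is possible
                break;
            }
            batches.add(batch); // step 3: this batch would become one new TAR file
        }
        return batches;
    }
}
```

For example, with sizes given in MiB and a limit of 256, `planMerges(Arrays.asList(200L, 10L, 100L, 20L, 50L), 256L)` yields the single batch `[10, 20, 50, 100]`; the 200 MiB file is left alone because nothing remains to merge it with. The sketch says nothing about locking: whichever of the two locking schemes is chosen would have to wrap the actual merge and deletion of each batch.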
-- This message was sent by Atlassian JIRA (v6.3.4#6332)