[ https://issues.apache.org/jira/browse/OAK-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Francesco Mari updated OAK-4201:
--------------------------------
    Attachment: OAK-4201-01.patch

[^OAK-4201-01.patch] stores all external binary identifiers in a separate entry of the TAR file. As a result, collecting external binary references no longer requires scanning and reading every segment. For this reason, the patch also removes the list of external binary references from the segment header.

[~alex.parvulescu], [~mduerig], can you have a look at this patch? You can also follow it commit by commit [here|https://github.com/francescomari/jackrabbit-oak/commits/bid]; every commit contains a more detailed description of what I changed and why.

> Add an index of binary references in a tar file
> -----------------------------------------------
>
>                 Key: OAK-4201
>                 URL: https://issues.apache.org/jira/browse/OAK-4201
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: segment-tar
>            Reporter: Chetan Mehrotra
>            Assignee: Francesco Mari
>             Fix For: Segment Tar 0.0.4
>
>         Attachments: OAK-4201-01.patch
>
>
> Currently, for Blob GC with the segment store, {{SegmentBlobReferenceRetriever}}
> goes through all tar files and extracts the binary references. This has 2
> issues:
> # The logic has to go through all the segments in all tar files
> # All segments get loaded into memory at once, which affects normal system
> performance
> This process can be optimized if we also write a file entry in each tar file (similar
> to gph, i.e. graph, and idx, i.e. index, files) which lists all binary
> references referred to by any segment present in that tar file. The GC logic
> would then just have to read this file and could avoid scanning all the segments.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
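The idea discussed in this issue (one extra entry per tar file that lists the external binary references, so that blob GC reads that entry instead of loading every segment) can be illustrated with a minimal sketch. Everything in it is hypothetical: the class name `BinaryReferenceIndex`, its methods and the byte layout (segment count, then per segment its UUID, reference count and the identifiers written with `DataOutputStream.writeUTF`) are assumptions for illustration and do not describe the actual format in OAK-4201-01.patch or the `SegmentBlobReferenceRetriever` API. Writing the bytes into the tar file itself is not shown.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

/**
 * Hypothetical per-tar-file index of external binary references.
 * The tar writer collects references here while writing segments and
 * serializes them once as a dedicated entry, instead of repeating them
 * in every segment header.
 */
final class BinaryReferenceIndex {

    // Segment id -> external binary identifiers referenced by that segment.
    private final Map<UUID, Set<String>> references = new LinkedHashMap<>();

    /** Records that the segment identified by segmentId references blobId. */
    void add(UUID segmentId, String blobId) {
        references.computeIfAbsent(segmentId, k -> new LinkedHashSet<>()).add(blobId);
    }

    /** Serializes the index into the bytes of the (hypothetical) tar entry. */
    byte[] toBytes() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(references.size());
            for (Map.Entry<UUID, Set<String>> e : references.entrySet()) {
                out.writeLong(e.getKey().getMostSignificantBits());
                out.writeLong(e.getKey().getLeastSignificantBits());
                out.writeInt(e.getValue().size());
                for (String blobId : e.getValue()) {
                    out.writeUTF(blobId); // assumes identifiers fit writeUTF's 64 KB limit
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Parses an entry previously produced by toBytes(). */
    static BinaryReferenceIndex fromBytes(byte[] data) {
        BinaryReferenceIndex index = new BinaryReferenceIndex();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int segments = in.readInt();
            for (int i = 0; i < segments; i++) {
                // Arguments are evaluated left to right: most, then least significant bits.
                UUID segmentId = new UUID(in.readLong(), in.readLong());
                int count = in.readInt();
                for (int j = 0; j < count; j++) {
                    index.add(segmentId, in.readUTF());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return index;
    }

    /** What blob GC needs: every distinct reference in the tar file, without reading any segment. */
    Set<String> allReferences() {
        Set<String> all = new LinkedHashSet<>();
        for (Set<String> refs : references.values()) {
            all.addAll(refs);
        }
        return all;
    }
}
```

Keying the references by segment id, rather than storing one flat list, keeps track of which segment each reference came from; a flat set of identifiers would be smaller but would lose that information. Either way, GC only has to read one small entry per tar file.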