[ https://issues.apache.org/jira/browse/OAK-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Francesco Mari updated OAK-4201:
--------------------------------
    Attachment: OAK-4201-01.patch

[^OAK-4201-01.patch] saves every external binary identifier in a separate entry
in the TAR file. When collecting external binary references, it is no longer
necessary to scan and read every segment. For this reason, this patch also
removes the list of external binary references from the segment header.
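A minimal sketch of how such an entry could be written and read back. It assumes a hypothetical layout that groups external binary identifiers by segment UUID; the class name {{BinaryReferenceIndex}} and the byte layout are illustrative assumptions, not the format used in the patch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Hypothetical encoder/decoder for a per-TAR "binary references" entry.
// ASSUMPTION: the byte layout below is illustrative, not the format used by the patch.
final class BinaryReferenceIndex {

    private BinaryReferenceIndex() {}

    /** Serializes segment id -> external binary ids into bytes for a TAR entry. */
    static byte[] encode(Map<UUID, Set<String>> refs) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(refs.size());                   // number of segments
            for (Map.Entry<UUID, Set<String>> e : refs.entrySet()) {
                out.writeLong(e.getKey().getMostSignificantBits());
                out.writeLong(e.getKey().getLeastSignificantBits());
                out.writeInt(e.getValue().size());       // binary ids in this segment
                for (String blobId : e.getValue()) {
                    out.writeUTF(blobId);                // modified UTF-8, at most 65535 bytes
                }
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);          // not expected for in-memory streams
        }
        return bytes.toByteArray();
    }

    /** Reads the entry back; no segment data is needed. */
    static Map<UUID, Set<String>> decode(byte[] data) {
        Map<UUID, Set<String>> refs = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int segments = in.readInt();
            for (int i = 0; i < segments; i++) {
                long msb = in.readLong();
                long lsb = in.readLong();
                int count = in.readInt();
                Set<String> blobIds = new LinkedHashSet<>();
                for (int j = 0; j < count; j++) {
                    blobIds.add(in.readUTF());
                }
                refs.put(new UUID(msb, lsb), blobIds);
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);          // e.g. a truncated entry
        }
        return refs;
    }
}
```

Keeping the identifiers in their own entry means a reader only has to decode this small blob per TAR file instead of parsing every segment header.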

[~alex.parvulescu], [~mduerig], can you have a look at this patch? You can also
follow this patch commit by commit
[here|https://github.com/francescomari/jackrabbit-oak/commits/bid]; every
commit contains a more detailed description of what I changed and why.

> Add an index of binary references in a tar file
> -----------------------------------------------
>
>                 Key: OAK-4201
>                 URL: https://issues.apache.org/jira/browse/OAK-4201
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: segment-tar
>            Reporter: Chetan Mehrotra
>            Assignee: Francesco Mari
>             Fix For: Segment Tar 0.0.4
>
>         Attachments: OAK-4201-01.patch
>
>
> Currently, for Blob GC in the case of the segment store, {{SegmentBlobReferenceRetriever}}
> goes through all tar files and extracts the binary references. This has 2
> issues:
> # Logic has to go through all the segments in all tar files
> # All segments get loaded in memory once, which would affect normal system
> performance
> This process can be optimized if we also write a file entry in the tar (similar
> to gph i.e. graph and idx i.e. index files) which has entries for all binary
> references referred to in any segment present in that tar file. Then the GC logic
> would just have to read this file and avoid scanning all the segments.
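A sketch of what the GC side could then look like. It assumes each TAR file exposes its decoded binary-references entry through a hypothetical {{TarBinaryReferences}} interface, and {{BinaryReferenceCollector}} is likewise an illustrative name; the collector walks only those entries and never loads a segment.

```java
import java.util.Collection;
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.function.Consumer;

// Hypothetical view of a TAR file: it only exposes the decoded binary-references entry.
interface TarBinaryReferences {
    Map<UUID, Set<String>> binaryReferences();
}

// Hypothetical collector; not the actual SegmentBlobReferenceRetriever API.
final class BinaryReferenceCollector {

    private BinaryReferenceCollector() {}

    /**
     * Reports every external binary id found in the per-TAR index entries.
     * Segment contents are never read, so they are not loaded into memory.
     */
    static void collect(Collection<? extends TarBinaryReferences> tars, Consumer<String> collector) {
        for (TarBinaryReferences tar : tars) {
            for (Set<String> ids : tar.binaryReferences().values()) {
                ids.forEach(collector);
            }
        }
    }
}
```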



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
