[
https://issues.apache.org/jira/browse/OAK-4598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15393777#comment-15393777
]
Francesco Mari commented on OAK-4598:
-------------------------------------
I looked into it a bit and I think that the lower number of entries can be
explained by how the binary references index and compaction interact with
each other.
When {{org.apache.jackrabbit.oak.segment.file.TarReader#sweep}} is called,
every non-reclaimed entry of the file is written to another file with a higher
generation. The binary references index, though, is left behind in the old file
and is not generated in the new one. When a file is swept the following
situations might occur:
# A new generation is not created because every segment in the file is
reclaimed. The segments are gone and the binary references index too, so this
case doesn't affect the total count of external binary references.
# A new generation is created and some segments are not filtered out. This
means that some binary references that should have been reported by the index
are now lost, since the new file will not have a valid binary references index.
This explains why there are fewer references than expected.
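To make the two cases concrete, here is a minimal Java sketch of the behaviour
described above. It is a toy model under stated assumptions, not Oak's actual
implementation: the names SweepSketch, Entry, TarFile, withIndex and
collectReferences are hypothetical, and a missing index is modelled as null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SweepSketch {

    /** Toy segment entry: an id plus the external binary references it holds. */
    static final class Entry {
        final String segmentId;
        final Set<String> binaryReferences;

        Entry(String segmentId, Set<String> binaryReferences) {
            this.segmentId = segmentId;
            this.binaryReferences = binaryReferences;
        }
    }

    /** Toy TAR file: its entries and an optional binary references index. */
    static final class TarFile {
        final int generation;
        final List<Entry> entries;
        final Set<String> binaryReferencesIndex; // null models a missing index

        TarFile(int generation, List<Entry> entries, Set<String> index) {
            this.generation = generation;
            this.entries = entries;
            this.binaryReferencesIndex = index;
        }

        /** Builds a file whose index covers every reference in its entries. */
        static TarFile withIndex(int generation, List<Entry> entries) {
            Set<String> index = new HashSet<>();
            for (Entry e : entries) {
                index.addAll(e.binaryReferences);
            }
            return new TarFile(generation, entries, index);
        }
    }

    /**
     * Models only the two cases from the list above. Returns null when every
     * segment is reclaimed (case 1); otherwise returns a higher generation
     * that keeps the surviving entries but carries no index (case 2).
     */
    static TarFile sweep(TarFile file, Set<String> reclaimed) {
        List<Entry> kept = new ArrayList<>();
        for (Entry e : file.entries) {
            if (!reclaimed.contains(e.segmentId)) {
                kept.add(e);
            }
        }
        if (kept.isEmpty()) {
            return null;
        }
        return new TarFile(file.generation + 1, kept, null);
    }

    /** Collects references from the index only, as assumed in this sketch. */
    static Set<String> collectReferences(TarFile file) {
        if (file == null || file.binaryReferencesIndex == null) {
            return new HashSet<>();
        }
        return new HashSet<>(file.binaryReferencesIndex);
    }

    public static void main(String[] args) {
        List<Entry> entries = Arrays.asList(
                new Entry("s1", new HashSet<>(Arrays.asList("b1", "b2"))),
                new Entry("s2", new HashSet<>(Arrays.asList("b3"))));
        TarFile original = TarFile.withIndex(0, entries);
        TarFile swept = sweep(original, new HashSet<>(Arrays.asList("s1")));

        System.out.println("before sweep: " + collectReferences(original).size());
        System.out.println("after sweep:  " + collectReferences(swept).size());
    }
}
```

In this model the surviving segment s2 still holds a reference, but the swept
file reports none because its index was never regenerated.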
I still don't have an explanation for the higher number of binary references.
> Collection of references retrieves less when large number of blobs added
> ------------------------------------------------------------------------
>
> Key: OAK-4598
> URL: https://issues.apache.org/jira/browse/OAK-4598
> Project: Jackrabbit Oak
> Issue Type: Bug
> Components: segment-tar
> Reporter: Amit Jain
> Labels: datastore, gc
> Fix For: Segment Tar 0.0.8
>
>
> When a large number of external blobs (50000) are added to the DataStore and
> a cycle of compaction is executed, the reference collection logic returns
> fewer blob references than expected. It reports the correct number of blob
> references when fewer blobs are added, indicating some sort of
> overflow.
> Another related issue observed when testing with a smaller number of blobs is
> that the references returned are double the amount expected, so maybe
> some sort of de-duplication should be added.
> Without compaction the blob references are returned correctly at least up to
> 100000 (ExternalBlobId#testNullBlobId)