[ 
https://issues.apache.org/jira/browse/OAK-4598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15393777#comment-15393777
 ] 

Francesco Mari commented on OAK-4598:
-------------------------------------

I looked into it a bit and I think that the lower number of entries can be 
explained by how the binary references index and compaction interact with 
each other.

When {{org.apache.jackrabbit.oak.segment.file.TarReader#sweep}} is called, 
every non-reclaimed entry of the file is written to another file with a higher 
generation. The binary references index, though, is left behind in the old file 
and is not generated in the new one. When a file is swept the following 
situations might occur:

# A new generation is not created because every segment in the file has been 
reclaimed. The segments are gone and so is the binary references index, so 
this case doesn't affect the total count of external binary references.
# A new generation is created and some segments are not filtered out. This 
means that some binary references that should have been reported by the index 
are now lost, since the new file will not have a valid binary references index. 
This explains why there are fewer references than expected.
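
To make the two cases concrete, here is a minimal, self-contained sketch. It 
is not the actual {{TarReader}} code: {{SweepSketch}}, {{Entry}}, {{TarFile}} 
and all method names are hypothetical stand-ins, and the sweep is modelled only 
as far as the behaviour described above (surviving entries are copied, the 
binary references index is not regenerated).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SweepSketch {

    /** Hypothetical stand-in for a segment entry and the external binaries it references. */
    static final class Entry {
        final String segmentId;
        final Set<String> binaryReferences;

        Entry(String segmentId, Set<String> binaryReferences) {
            this.segmentId = segmentId;
            this.binaryReferences = binaryReferences;
        }
    }

    /** Hypothetical stand-in for a tar file: its entries plus its binary references index. */
    static final class TarFile {
        final List<Entry> entries;
        final Set<String> binaryReferencesIndex;

        TarFile(List<Entry> entries, Set<String> binaryReferencesIndex) {
            this.entries = entries;
            this.binaryReferencesIndex = binaryReferencesIndex;
        }
    }

    /** Convenience factory for the example. */
    static Entry entry(String segmentId, String... refs) {
        return new Entry(segmentId, new HashSet<>(Arrays.asList(refs)));
    }

    /** Normal write path (assumed): the index is built from the entries being written. */
    static TarFile write(List<Entry> entries) {
        Set<String> index = new HashSet<>();
        for (Entry e : entries) {
            index.addAll(e.binaryReferences);
        }
        return new TarFile(new ArrayList<>(entries), index);
    }

    /**
     * Sweep as described in this comment: surviving entries are copied to a
     * new generation, but the binary references index is left behind.
     * Returns null when every segment is reclaimed (case 1).
     */
    static TarFile sweep(TarFile file, Set<String> reclaimed) {
        List<Entry> survivors = new ArrayList<>();
        for (Entry e : file.entries) {
            if (!reclaimed.contains(e.segmentId)) {
                survivors.add(e);
            }
        }
        if (survivors.isEmpty()) {
            return null; // case 1: no new generation, nothing left to report
        }
        // case 2: entries survive, but the new file has no valid index
        return new TarFile(survivors, new HashSet<String>());
    }

    /** Reference collection that relies only on the per-file index. */
    static int countIndexedReferences(List<TarFile> files) {
        Set<String> refs = new HashSet<>();
        for (TarFile f : files) {
            refs.addAll(f.binaryReferencesIndex);
        }
        return refs.size();
    }

    public static void main(String[] args) {
        TarFile original = write(Arrays.asList(entry("s1", "b1", "b2"), entry("s2", "b3")));
        System.out.println(countIndexedReferences(Arrays.asList(original))); // 3

        // Reclaim s1 only: s2 survives and still references b3.
        TarFile swept = sweep(original, new HashSet<>(Arrays.asList("s1")));
        System.out.println(countIndexedReferences(Arrays.asList(swept))); // 0
    }
}
```

In this sketch the swept file still contains {{s2}}, which references {{b3}}, 
yet a collector that reads only the index reports nothing for it. That is the 
undercount from the second case.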

I still don't have an explanation for the higher number of binary references.

> Collection of references retrieves less when large number of blobs added
> ------------------------------------------------------------------------
>
>                 Key: OAK-4598
>                 URL: https://issues.apache.org/jira/browse/OAK-4598
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: segment-tar
>            Reporter: Amit Jain
>              Labels: datastore, gc
>             Fix For: Segment Tar 0.0.8
>
>
> When a large number of external blobs (50000) is added to the DataStore and a 
> compaction cycle is executed, the reference collection logic returns fewer 
> blob references than expected. It reports the correct number of blob 
> references when fewer blobs are added, indicating some sort of overflow.
> Another related issue, observed when testing with a smaller number of blobs, 
> is that the references returned are double the amount expected, so some sort 
> of de-duplication may need to be added.
> Without compaction, the blob references are returned correctly at least up to 
> 100000 (ExternalBlobId#testNullBlobId)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
