[
https://issues.apache.org/jira/browse/OAK-4598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15398940#comment-15398940
]
Michael Dürig commented on OAK-4598:
------------------------------------
Off the top of my head I don't see a cheap, scalable and effective way of
de-duplicating these references. We probably first need to understand the
impact of this in a real-world scenario.
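
For illustration only, a minimal sketch of what naive de-duplication could look
like: a wrapper that filters out duplicates with an in-memory set before
forwarding each reference. The ReferenceCollector name and the
addReference(String) signature are assumptions made for this sketch, not
necessarily Oak's actual collector API; the point is that the set has to hold
every distinct reference, which is why this approach is neither cheap nor
obviously scalable for very large repositories.

{code:java}
import java.util.HashSet;
import java.util.Set;

// Assumed callback interface for this sketch; Oak's real collector API may differ.
interface ReferenceCollector {
    void addReference(String blobId);
}

// Forwards each blob reference to a delegate only the first time it is seen.
class DeduplicatingCollector implements ReferenceCollector {

    private final ReferenceCollector delegate;

    // Holds every distinct reference seen so far; memory grows linearly
    // with the number of distinct blob references.
    private final Set<String> seen = new HashSet<>();

    DeduplicatingCollector(ReferenceCollector delegate) {
        this.delegate = delegate;
    }

    @Override
    public void addReference(String blobId) {
        if (seen.add(blobId)) {
            delegate.addReference(blobId);
        }
    }
}
{code}
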
> Collection of references retrieves less when large number of blobs added
> ------------------------------------------------------------------------
>
> Key: OAK-4598
> URL: https://issues.apache.org/jira/browse/OAK-4598
> Project: Jackrabbit Oak
> Issue Type: Bug
> Components: segment-tar
> Reporter: Amit Jain
> Assignee: Francesco Mari
> Labels: datastore, gc
> Fix For: Segment Tar 0.0.8
>
>
> When a large number of external blobs (50000) is added to the DataStore and a
> cycle of compaction is executed, the reference collection logic returns fewer
> blob references than expected. It reports the correct number of blob
> references when fewer blobs are added, indicating some sort of overflow.
> Another related issue, observed when testing with a smaller number of blobs, is
> that the references returned are double the expected amount, so some sort of
> de-duplication should probably be added.
> Without compaction the blob references are returned correctly, at least up to
> 100000 (ExternalBlobId#testNullBlobId).
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)