[ https://issues.apache.org/jira/browse/OAK-11781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17987885#comment-17987885 ]
Thomas Mueller commented on OAK-11781:
--------------------------------------
Ah, sorry about that. Here is a new PR with the test case fixed:
https://github.com/apache/jackrabbit-oak/pull/2364
> Binary reference statistics are inaccurate for very large repositories
> ----------------------------------------------------------------------
>
> Key: OAK-11781
> URL: https://issues.apache.org/jira/browse/OAK-11781
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Reporter: Thomas Mueller
> Assignee: Thomas Mueller
> Priority: Major
>
> The DistinctBinarySize report is inaccurate if there are more than about 16
> million binary references. The Bloom filter size is currently set to 16 MB,
> which is not enough for some repositories and leads to a very high
> false-positive rate of around 95% (the normal rate is 1%).
> The memory size for the Bloom filter is quite easy to increase.
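
To illustrate why a fixed 16 MB filter degrades as the number of binary references grows, here is a minimal sketch using the standard Bloom filter approximation p = (1 - e^(-k*n/m))^k. It is not Oak's code: the function names are hypothetical, and the number of hash functions (`ASSUMED_HASHES = 7`) is an assumption, so the printed rates only show the trend and will not match Oak's actual figures exactly.

```python
import math


def bloom_false_positive_rate(bits: int, entries: int, hashes: int) -> float:
    """Approximate false-positive rate of a Bloom filter: (1 - e^(-k*n/m))^k."""
    return (1.0 - math.exp(-hashes * entries / bits)) ** hashes


def bits_for_target_rate(entries: int, target_rate: float) -> int:
    """Bits needed to reach target_rate with an optimal hash count:
    m = -n * ln(p) / (ln 2)^2."""
    return math.ceil(-entries * math.log(target_rate) / (math.log(2) ** 2))


FILTER_BITS = 16 * 1024 * 1024 * 8  # 16 MB filter, as stated in the issue
ASSUMED_HASHES = 7                  # assumption, not taken from the Oak source

# Show how the approximate rate grows with the number of entries.
for n in (10_000_000, 16_000_000, 50_000_000, 100_000_000):
    rate = bloom_false_positive_rate(FILTER_BITS, n, ASSUMED_HASHES)
    print(f"{n:>12,} entries: approx. false-positive rate {rate:.2%}")

# Memory needed to hold 16 million entries at a 1% target rate.
needed_bits = bits_for_target_rate(16_000_000, 0.01)
print(f"bits needed for 16M entries at 1%: {needed_bits:,} "
      f"({needed_bits / 8 / 1024 / 1024:.1f} MB)")
```

Under these assumptions the rate stays low while the filter has many bits per entry and approaches 100% once the entry count is several times the filter's design capacity, which is why increasing the filter's memory restores accuracy.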