[ 
https://issues.apache.org/jira/browse/IMPALA-6311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16486661#comment-16486661
 ] 

Jim Apple edited comment on IMPALA-6311 at 5/23/18 3:49 AM:
------------------------------------------------------------

I'm not suggesting increasing the maximum, just decreasing the default FPP. So, 
if we have a maximum BF size of 16MB and we predict that a filter will have a 
cardinality of 100k, we currently allocate 32KB for the filter, well below the 
max limit. If we increase the filter size to 128KB, the false positive rate 
will drop substantially, even though we're still well below the max.
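
To make the size-versus-FPP tradeoff above concrete, here is a rough sketch 
using the classic Bloom filter approximation p = (1 - e^(-k*n/m))^k. It is not 
Impala's actual sizing code: the function name bloom_fpp is hypothetical, the 
default of k = 8 hash bits per key is an assumption, and a block-based filter 
will have a somewhat different false positive rate than this formula predicts.

```python
import math


def bloom_fpp(ndv, size_bytes, num_hashes=8):
    """Approximate false positive probability of a classic Bloom filter.

    ndv        -- number of distinct values inserted (n)
    size_bytes -- filter size in bytes (m = size_bytes * 8 bits)
    num_hashes -- bits set per key (k); 8 is an assumed value, not a
                  setting read from Impala
    """
    bits = size_bytes * 8
    return (1.0 - math.exp(-num_hashes * ndv / bits)) ** num_hashes


# The example from the comment: 100k distinct values, 32KB vs. 128KB.
# KB is taken to mean 1024 bytes here, which is also an assumption.
fpp_32k = bloom_fpp(100_000, 32 * 1024)
fpp_128k = bloom_fpp(100_000, 128 * 1024)

print(f"32KB filter:  FPP ~ {fpp_32k:.2%}")
print(f"128KB filter: FPP ~ {fpp_128k:.2%}")
```

Under these assumptions the 32KB filter lands at roughly a two-in-three false 
positive rate, while the 128KB filter comes in below 1%, and both sizes are a 
small fraction of a 16MB maximum.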


was (Author: jbapple):
I'm not suggesting increasing the maximum, just increasing the default FPP. So, 
if we have a maximum BF size of 16MB and we predict that a filter will have a 
cardinality of 100k, we allocate now 32kb for the filter, well below the max 
limit. If we increase the filter size to 128kb, the false positive rate will 
drop substantially, even though we're well below the max.

> Evaluate smaller FPP for Bloom filters
> --------------------------------------
>
>                 Key: IMPALA-6311
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6311
>             Project: IMPALA
>          Issue Type: Task
>          Components: Perf Investigation
>            Reporter: Jim Apple
>            Priority: Major
>
> The Bloom filters are created by estimating the NDV and then using the FPP of 
> 75% to get the right size for the filter. This may be too high to be very 
> useful - if our filters are currently filtering more than 75% out, then it is 
> only because we are overestimating NDV.


