Github user hvanhovell commented on the issue:

    https://github.com/apache/spark/pull/15990
  
    @wzhfy I did a little digging. I think we are fixing something that is 
not broken. The current implementation does not apply bias correction if you 
set the relative error lower than 0.22%. See this line in the code: 
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/HyperLogLogPlusPlus.scala#L301
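
    To make the 0.22% figure concrete, here is a minimal Python sketch of how a requested relative error could map to the precision `p`. It assumes the implementation derives `p` as `ceil(2 * log2(1.106 / relativeSD))` and skips bias correction once `p` reaches 19. Both the constant and the cutoff are assumptions about the linked code, not something stated in this thread, and the function names are hypothetical.

```python
import math

# Assumed constant: relative error of the estimator is taken as 1.106 / sqrt(m),
# with m = 2**p registers. This is an assumption about the linked Scala code.
ERROR_CONSTANT = 1.106

# Assumed cutoff: bias correction is only attempted for p below this value.
BIAS_CORRECTION_MAX_P = 19


def precision_for(relative_sd: float) -> int:
    """Return the number of address bits p needed for the requested relative error."""
    return math.ceil(2.0 * math.log(ERROR_CONSTANT / relative_sd) / math.log(2.0))


def bias_correction_possible(relative_sd: float) -> bool:
    """True if the derived p is still in the range where bias correction is applied."""
    return precision_for(relative_sd) < BIAS_CORRECTION_MAX_P


if __name__ == "__main__":
    for rsd in (0.05, 0.0022, 0.0021):
        print(rsd, precision_for(rsd), bias_correction_possible(rsd))
```

    Under these assumptions a relative error of 0.05 gives `p = 9`, 0.0022 gives `p = 18`, and 0.0021 gives `p = 19`, so the switch happens at roughly 1.106 / sqrt(2^18), or about 0.216%.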
    
    So I am fine with improving the error message, but not with capping the 'p' 
value.
    
    Also note that bias correction only applies to an intermediate range: all 
HLL++ registers have to be in use and the raw estimate has to be smaller than 
`num_registries * 5`. So this only applies to a subset of use cases.
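
    The intermediate range described above can be written as a small predicate. This is a sketch of the conditions as stated in this comment, not a transcription of the Spark source; the function and parameter names are hypothetical, and the `p < 19` check is the assumed cutoff behind the 0.22% figure.

```python
def bias_correction_applies(p: int, raw_estimate: float, empty_registers: int) -> bool:
    """Check whether a raw HLL++ estimate falls in the bias-corrected range.

    p               -- number of address bits; the sketch has 2**p registers
    raw_estimate    -- the uncorrected cardinality estimate
    empty_registers -- number of registers that are still zero
    """
    num_registers = 1 << p
    return (
        p < 19                                  # assumed cutoff, see lead-in
        and empty_registers == 0                # all registers are in use
        and raw_estimate < 5 * num_registers    # estimate is below 5 * m
    )
```

    For example, with `p = 9` there are 512 registers, so only raw estimates below 2560 with no empty registers would be corrected.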

