GitHub user hvanhovell commented on the pull request:
https://github.com/apache/spark/pull/8362#issuecomment-148209375
@yhuai It doesn't. A 64-bit hash code is recommended though, especially when
you want to approximate a billion or more unique values. I used the
ClearSpring hash code because it let me compare the results of my
HLL++ implementation with theirs.
We could replace it with another, better-performing one; don't we have one
in Spark? We could also scale down to 32 bits...
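
To make the 64-bit recommendation concrete, here is a rough back-of-the-envelope
sketch (not taken from the PR). It assumes an ideal, uniformly distributed hash
and only estimates how many of a billion distinct inputs would collide with an
earlier one before any sketching happens; the function name
`expected_distinct_hashes` is made up for illustration.

```python
import math


def expected_distinct_hashes(n: int, bits: int) -> float:
    """Expected number of distinct hash values after hashing n distinct
    items with an ideal (uniform) hash of the given bit width."""
    space = 2.0 ** bits
    # Closed form: space * (1 - (1 - 1/space)**n).
    # log1p/expm1 keep the result accurate when 1/space is tiny (64-bit case).
    return -space * math.expm1(n * math.log1p(-1.0 / space))


n = 1_000_000_000  # "a billion or more unique values"
for bits in (32, 64):
    distinct = expected_distinct_hashes(n, bits)
    lost = 1.0 - distinct / n  # fraction of inputs hidden by hash collisions
    print(f"{bits}-bit hash: about {lost:.2e} of {n:,} distinct values collide")
```

Under these assumptions the 32-bit case loses roughly a tenth of the distinct
values to collisions at a billion inputs, while the 64-bit loss is negligible.
A 32-bit variant would therefore need some correction for large cardinalities,
which is the trade-off behind "scale down to 32 bits".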