[GitHub] spark pull request: [SPARK-9741][SQL] Approximate Count Distinct u...

hvanhovell Wed, 14 Oct 2015 14:36:21 -0700

Github user hvanhovell commented on the pull request:

    https://github.com/apache/spark/pull/8362#issuecomment-148209375
  
    @yhuai It doesn't. A 64-bit hashcode is recommended though, especially when 
you want to approximate a billion or more unique values. I used the 
ClearSpring hashcode because it let me compare the results of my 
HLL++ implementation with theirs.
    
    We could replace it with another, better-performing one; don't we have one 
in Spark? We could also scale down to 32 bits...
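
    To illustrate why hash width matters at that scale, here is a rough sketch 
(not code from this PR) that estimates how many distinct hash values survive when 
n unique items are hashed uniformly into 2^b values. The function names are 
hypothetical, and the uniform-hash model is an assumption. It only accounts for 
hash collisions, not for the HLL++ estimation error itself.

```python
import math


def expected_distinct_hashes(n_unique: int, hash_bits: int) -> float:
    """Expected number of distinct hash values for n_unique items.

    Assumes an ideal uniform hash over m = 2**hash_bits values, using the
    standard occupancy approximation m * (1 - exp(-n / m)).
    """
    m = 2.0 ** hash_bits
    # expm1 keeps precision when n / m is tiny (the 64-bit case).
    return -m * math.expm1(-n_unique / m)


def relative_loss(n_unique: int, hash_bits: int) -> float:
    """Fraction of unique items lost to hash collisions before any sketching."""
    return 1.0 - expected_distinct_hashes(n_unique, hash_bits) / n_unique


if __name__ == "__main__":
    n = 1_000_000_000  # a billion unique values, as in the comment above
    for bits in (32, 64):
        print(f"{bits}-bit hash: relative loss ~ {relative_loss(n, bits):.3e}")
```

    Under these assumptions a 32-bit hash already loses on the order of ten 
percent of a billion unique values to collisions, while the loss with a 64-bit 
hash is negligible.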



