[GitHub] spark issue #15990: [SPARK-18559] [SQL] Restrict the lower bound of relative...

hvanhovell Wed, 23 Nov 2016 06:29:53 -0800

GitHub user hvanhovell commented on the issue:

    https://github.com/apache/spark/pull/15990
  
    It should fall back to the original HLL estimate and apply the small-range
    correction that algorithm uses. Something like this:
    ```scala
        // We integrate two steps from the paper:
        // val Z = 1.0d / zInverse
        // val E = alphaM2 * Z
        val E = alphaM2 / zInverse
        @inline
        def EBiasCorrected = E match {
          case e if p < 19 && e < 5.0d * m => e - estimateBias(e)
          case e => e
        }
    
        // Estimate the cardinality.
        val estimate = if (V > 0) {
          // Use linear counting for small cardinality estimates.
          val H = m * Math.log(m / V)
          if (p < 19 && H <= THRESHOLDS(p - 4)) {
            H
          } else if (E <= 2.5 * m) {
            H
          } else {
            EBiasCorrected
          }
        } else {
          EBiasCorrected
        }
    ```
    
    I don't think we should start throwing errors for things that used to work.
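
For readers without the surrounding Spark code, the small-range correction of the original HyperLogLog algorithm can be sketched in isolation. This is a minimal illustration, not the code proposed in the pull request: the object name `SmallRangeSketch`, the function names, and the plain `Array[Int]` register layout are all assumptions made for the example, and the HLL++ pieces from the snippet above (`estimateBias`, `THRESHOLDS`) are deliberately left out.

```scala
// Hypothetical, self-contained sketch of the original HLL small-range correction.
// Names and the register representation are illustrative assumptions.
object SmallRangeSketch {
  // Bias constant from the original HLL paper, valid for m >= 128 registers.
  def alpha(m: Int): Double = 0.7213d / (1.0d + 1.079d / m)

  /**
   * Estimate cardinality from the per-register rank values.
   * `registers.length` is m; a value of 0 marks an empty register.
   */
  def estimate(registers: Array[Int]): Double = {
    val m = registers.length.toDouble
    var zInverse = 0.0d // sum of 2^(-register)
    var v = 0           // number of empty registers (V in the comment above)
    registers.foreach { r =>
      zInverse += math.pow(2.0d, -r.toDouble)
      if (r == 0) v += 1
    }
    // Raw HLL estimate, merging Z = 1 / zInverse and E = alpha * m^2 * Z.
    val e = alpha(registers.length) * m * m / zInverse
    if (e <= 2.5d * m && v > 0) {
      // Small range: fall back to linear counting.
      m * math.log(m / v)
    } else {
      e
    }
  }
}
```

With 128 registers of which only one is non-empty, the raw estimate is far below `2.5 * m`, so the function returns the linear-counting value `128 * ln(128 / 127)`. Once no register is empty, the raw estimate is returned unchanged.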



