Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/20913#discussion_r179037665
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala
---
@@ -427,7 +427,11 @@ case class FilterEstimation(plan: Filter) extends
Logging {
// return the filter selectivity. Without advanced statistics such as
histograms,
// we have to assume uniform distribution.
- Some(math.min(newNdv.toDouble / ndv.toDouble, 1.0))
+ if (ndv.toDouble != 0) {
--- End diff --
What's the concrete case when `ndv.toDouble == 0`?
Also, is this the only place where we need this check?
For example, we don't do it here:
https://github.com/apache/spark/blob/5cfd5fabcdbd77a806b98a6dd59b02772d2f6dee/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/EstimationUtils.scala#L166
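For context, a minimal sketch of the guarded computation under discussion. The names `newNdv`/`ndv` mirror the diff, but this is an illustrative stand-alone object, not the actual `FilterEstimation` code; the point is that without the guard, `0.0 / 0.0` yields `NaN`, which `math.min` then propagates into the selectivity:

```scala
// Hypothetical sketch of the selectivity guard discussed in the diff.
// newNdv / ndv are distinct-value counts (BigInt in FilterEstimation).
object SelectivitySketch {
  def selectivity(newNdv: BigInt, ndv: BigInt): Option[Double] = {
    if (ndv.toDouble != 0) {
      // Uniform-distribution assumption: fraction of distinct values kept.
      Some(math.min(newNdv.toDouble / ndv.toDouble, 1.0))
    } else {
      // Unguarded, 0.0 / 0.0 => NaN, and math.min(NaN, 1.0) stays NaN.
      None
    }
  }
}
```

Calling `SelectivitySketch.selectivity(BigInt(2), BigInt(10))` gives `Some(0.2)`, while `selectivity(BigInt(0), BigInt(0))` returns `None` instead of `Some(NaN)`.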
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]