[GitHub] spark pull request #20913: [SPARK-23799] FilterEstimation.evaluateInSet prod...

mshtelma Wed, 04 Apr 2018 01:37:53 -0700

Github user mshtelma commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20913#discussion_r179063928
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala ---
    @@ -427,7 +427,11 @@ case class FilterEstimation(plan: Filter) extends Logging {
     
         // return the filter selectivity.  Without advanced statistics such as histograms,
         // we have to assume uniform distribution.
    -    Some(math.min(newNdv.toDouble / ndv.toDouble, 1.0))
    +    if (ndv.toDouble != 0) {
    --- End diff --
    
    I ran into this problem with a sub-condition containing an IN clause, 
    something like "FLD in ('value')".
    I believe this happens when the table is empty. In that case ndv is 0, so 
    the ratio newNdv / ndv is no longer well defined.
    I think it makes sense to add this check everywhere ndv is used as a 
    divisor in this way.
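
To illustrate the guard discussed above, here is a minimal sketch. It is not the code from the pull request: `SelectivitySketch` and `inSetSelectivity` are hypothetical names, the `BigInt` parameter types are an assumption, and returning `Some(0.0)` for an empty table is one possible choice rather than a confirmed part of the patch. Without the guard, `0.0 / 0.0` evaluates to NaN for doubles, and `math.min` passes that NaN through.

```scala
// Hypothetical helper, not part of Spark: shows one way to guard the
// newNdv / ndv ratio against a zero denominator.
object SelectivitySketch {
  // ndv:    number of distinct values in the column (assumed BigInt here)
  // newNdv: number of distinct values that survive the IN filter
  def inSetSelectivity(newNdv: BigInt, ndv: BigInt): Option[Double] = {
    if (ndv == BigInt(0)) {
      // Assumption: an empty table has no distinct values, so no row can
      // match and the selectivity is taken to be 0.
      Some(0.0)
    } else {
      // Uniform-distribution estimate, capped at 1.0.
      Some(math.min(newNdv.toDouble / ndv.toDouble, 1.0))
    }
  }
}
```

The same pattern could be applied at any other place where ndv appears as a divisor, which is what the comment above suggests.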



---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]