Github user mshtelma commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20913#discussion_r179063928
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala ---
    @@ -427,7 +427,11 @@ case class FilterEstimation(plan: Filter) extends Logging {
     
         // return the filter selectivity.  Without advanced statistics such as histograms,
         // we have to assume uniform distribution.
    -    Some(math.min(newNdv.toDouble / ndv.toDouble, 1.0))
    +    if (ndv.toDouble != 0) {
    --- End diff --
    
    I have run into this problem with a sub-condition that uses an IN clause, something like "FLD in ("value")".
    I believe it happens when the table is empty, because in that case ndv will be 0.
    I think it makes sense to add the same check everywhere ndv is used as a divisor in this way.
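
    To illustrate the guard, here is a minimal standalone sketch. The object and method names (NdvSelectivity, selectivity) are hypothetical and not part of Spark, and returning a selectivity of 0.0 for an empty table is my assumption, since the diff above is cut off before the else branch. Without the guard, 0.0 / 0.0 evaluates to NaN, and math.min(NaN, 1.0) stays NaN.

```scala
// Hypothetical helper for illustration only; not Spark's actual API.
object NdvSelectivity {

  /** Estimates filter selectivity as newNdv / ndv, capped at 1.0.
    *
    * Assumption: when ndv is 0 (for example, an empty table) no row can
    * match, so the selectivity is taken to be 0.0 instead of dividing by zero.
    */
  def selectivity(newNdv: BigInt, ndv: BigInt): Option[Double] = {
    if (ndv == BigInt(0)) {
      // Guard: avoids 0.0 / 0.0, which would produce NaN.
      Some(0.0)
    } else {
      // Uniform-distribution assumption, as in the quoted comment.
      Some(math.min(newNdv.toDouble / ndv.toDouble, 1.0))
    }
  }
}
```

    The same pattern could be applied at the other places where ndv appears as a divisor.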


