Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/21052#discussion_r181381874 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -395,27 +395,28 @@ case class FilterEstimation(plan: Filter) extends Logging { // use [min, max] to filter the original hSet dataType match { case _: NumericType | BooleanType | DateType | TimestampType => - val statsInterval = - ValueInterval(colStat.min, colStat.max, dataType).asInstanceOf[NumericValueInterval] - val validQuerySet = hSet.filter { v => - v != null && statsInterval.contains(Literal(v, dataType)) - } + if (colStat.min.isDefined && colStat.max.isDefined) { --- End diff -- Check `ndv == 0` at the beginning and return `Some(0.0)`? Then we don't have to make all these changes.
--- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org