[GitHub] spark pull request #21147: [SPARK-23799][SQL][FOLLOW-UP] FilterEstimation.ev...
GitHub user mshtelma commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21147#discussion_r184376159

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala ---
    @@ -392,13 +392,13 @@ case class FilterEstimation(plan: Filter) extends Logging {
         val dataType = attr.dataType
         var newNdv = ndv

    -    if (ndv.toDouble == 0 || colStat.min.isEmpty || colStat.max.isEmpty) {
    -      return Some(0.0)
    -    }
    -
         // use [min, max] to filter the original hSet
         dataType match {
           case _: NumericType | BooleanType | DateType | TimestampType =>
    +        if (ndv.toDouble == 0 || colStat.min.isEmpty || colStat.max.isEmpty) {
    --- End diff --

    min/max can be None if the column contains only null values. This is exactly the case for my query.

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21147: [SPARK-23799][SQL][FOLLOW-UP] FilterEstimation.ev...
GitHub user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/21147
[GitHub] spark pull request #21147: [SPARK-23799][SQL][FOLLOW-UP] FilterEstimation.ev...
GitHub user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21147#discussion_r184308017

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala ---
    @@ -392,13 +392,13 @@ case class FilterEstimation(plan: Filter) extends Logging {
         val dataType = attr.dataType
         var newNdv = ndv

    -    if (ndv.toDouble == 0 || colStat.min.isEmpty || colStat.max.isEmpty) {
    -      return Some(0.0)
    -    }
    -
         // use [min, max] to filter the original hSet
         dataType match {
           case _: NumericType | BooleanType | DateType | TimestampType =>
    +        if (ndv.toDouble == 0 || colStat.min.isEmpty || colStat.max.isEmpty) {
    --- End diff --

    min/max could be None when the table is empty
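The two situations raised in this review (an empty table, and a column that contains only nulls) can be reproduced with a small sketch. It is not Spark's statistics code: `MinMax` and `minMax` are hypothetical names, and a column is modelled as a `Seq[Option[Int]]` where `None` stands for SQL NULL.

```scala
// Hypothetical stand-in for per-column statistics; not Spark's ColumnStat.
final case class MinMax(min: Option[Int], max: Option[Int])

object MinMaxSketch {
  // Computes min/max over the non-null values only.
  // With no non-null values there is nothing to reduce, so both are None.
  def minMax(column: Seq[Option[Int]]): MinMax = {
    val nonNull = column.flatten
    MinMax(nonNull.reduceOption(_ min _), nonNull.reduceOption(_ max _))
  }
}
```

Under these assumptions an empty column and an all-null column are indistinguishable from the min/max alone, which is why the guard in the diff has to treat a missing min or max for a numeric column as "no matching rows".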
[GitHub] spark pull request #21147: [SPARK-23799][SQL][FOLLOW-UP] FilterEstimation.ev...
GitHub user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21147#discussion_r183940087

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala ---
    @@ -392,13 +392,13 @@ case class FilterEstimation(plan: Filter) extends Logging {
         val dataType = attr.dataType
         var newNdv = ndv

    -    if (ndv.toDouble == 0 || colStat.min.isEmpty || colStat.max.isEmpty) {
    -      return Some(0.0)
    -    }
    -
         // use [min, max] to filter the original hSet
         dataType match {
           case _: NumericType | BooleanType | DateType | TimestampType =>
    +        if (ndv.toDouble == 0 || colStat.min.isEmpty || colStat.max.isEmpty) {
    --- End diff --

    I think we always have max/min for integral type? cc @wzhfy
[GitHub] spark pull request #21147: [SPARK-23799][SQL][FOLLOW-UP] FilterEstimation.ev...
GitHub user gatorsmile opened a pull request:

    https://github.com/apache/spark/pull/21147

    [SPARK-23799][SQL][FOLLOW-UP] FilterEstimation.evaluateInSet produces wrong stats for STRING

    ## What changes were proposed in this pull request?
    `colStat.min` and `colStat.max` are empty for string type. Thus, `evaluateInSet` should not return zero when either `colStat.min` or `colStat.max` is empty.

    ## How was this patch tested?
    Added a test case.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark cached

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21147.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21147

commit 9672f92dde505eada20d8102dcd845a5418d37c8
Author: gatorsmile
Date:   2018-04-25T03:59:46Z

    fix
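The effect of moving the guard can be illustrated with a simplified, self-contained sketch. This is not the actual `FilterEstimation` code: `ColType`, `ColStat`, and this `evaluateInSet` are hypothetical stand-ins, and the string branch simply assumes the estimate falls back to the size of the IN list.

```scala
// Simplified stand-ins for Spark's types; every name here is hypothetical.
sealed trait ColType
case object IntType extends ColType
case object StrType extends ColType

final case class ColStat(distinctCount: BigInt, min: Option[Any], max: Option[Any])

object InSetSketch {
  // Estimated selectivity of `col IN (hSet)`.
  def evaluateInSet(colType: ColType, stat: ColStat, hSet: Set[Any]): Option[Double] = {
    val ndv = stat.distinctCount
    val newNdv: BigInt = colType match {
      case IntType =>
        // The early return lives inside the range-comparable branch:
        // a missing min/max here means there are no non-null values.
        if (ndv == 0 || stat.min.isEmpty || stat.max.isEmpty) return Some(0.0)
        val lo = BigDecimal(stat.min.get.toString)
        val hi = BigDecimal(stat.max.get.toString)
        // Keep only the IN-list values that fall inside [min, max].
        val valid = hSet.count { v =>
          val d = BigDecimal(v.toString)
          d >= lo && d <= hi
        }
        ndv.min(BigInt(valid))
      case StrType =>
        // Strings carry no min/max, so their absence must not force zero.
        ndv.min(BigInt(hSet.size))
    }
    if (ndv == 0) Some(0.0)
    else Some(math.min(newNdv.toDouble / ndv.toDouble, 1.0))
  }
}
```

With the guard placed before the `match`, as in the removed lines of the diff, the string case in this sketch would always return `Some(0.0)` because its `min` and `max` are `None`; with the guard inside the numeric branch it yields a non-zero estimate.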