Github user ron8hu commented on a diff in the pull request:
https://github.com/apache/spark/pull/19783#discussion_r154252063
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala
---
@@ -513,10 +560,9 @@ case class FilterEstimation(plan: Filter) extends Logging {
op match {
case _: GreaterThan | _: GreaterThanOrEqual =>
- // If new ndv is 1, then new max must be equal to new min.
- newMin = if (newNdv == 1) newMax else newValue
+ newMin = newValue
case _: LessThan | _: LessThanOrEqual =>
- newMax = if (newNdv == 1) newMin else newValue
+ newMax = newValue
--- End diff --
I originally coded it that way because of a corner test case,
test("cbool > false"): since newNdv = 1 there, I set newMin to newMax.
However, that logic does not work well for the skewed-distribution test
case, test("cintHgm < 3"), where newMin = 1 and newMax = 3. I think the
revised code makes more sense.
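
To make the difference concrete, here is a minimal, self-contained sketch of the two update rules. It is a simplified model, not the actual FilterEstimation code: the `Op`, `Bounds`, `oldUpdate`, and `newUpdate` names are hypothetical, the column range of 1 to 10 is made up for illustration, and it assumes the estimated newNdv comes out as 1 in the skewed case, which the comment above implies but does not state.

```scala
object BoundsSketch {
  // Hypothetical stand-ins for the comparison operators matched in the diff.
  sealed trait Op
  case object GreaterThan extends Op
  case object LessThan extends Op

  // Hypothetical container for a column's current min/max statistics.
  final case class Bounds(min: Double, max: Double)

  // Old rule: when the estimated ndv is 1, collapse the range to a single point.
  def oldUpdate(op: Op, b: Bounds, newValue: Double, newNdv: Long): Bounds =
    op match {
      case GreaterThan => b.copy(min = if (newNdv == 1) b.max else newValue)
      case LessThan    => b.copy(max = if (newNdv == 1) b.min else newValue)
    }

  // Revised rule: the literal in the predicate always becomes the new bound.
  def newUpdate(op: Op, b: Bounds, newValue: Double): Bounds =
    op match {
      case GreaterThan => b.copy(min = newValue)
      case LessThan    => b.copy(max = newValue)
    }

  def main(args: Array[String]): Unit = {
    // Assumed inputs for a predicate like "cintHgm < 3" on a column spanning 1..10.
    val column = Bounds(min = 1, max = 10)
    println(oldUpdate(LessThan, column, newValue = 3, newNdv = 1))
    println(newUpdate(LessThan, column, newValue = 3))
  }
}
```

Under these assumptions the old rule collapses the range to min = max = 1, while the revised rule yields min = 1 and max = 3, matching the values quoted for the "cintHgm < 3" test.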
---