Github user mshtelma commented on a diff in the pull request:
https://github.com/apache/spark/pull/20913#discussion_r179063928
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala
---
@@ -427,7 +427,11 @@ case class FilterEstimation(plan: Filter) extends Logging {
// return the filter selectivity. Without advanced statistics such as histograms,
// we have to assume uniform distribution.
- Some(math.min(newNdv.toDouble / ndv.toDouble, 1.0))
+ if (ndv.toDouble != 0) {
--- End diff --
I ran into this problem with a sub-condition containing an IN clause,
something like "FLD in ("value")".
I believe this happens when the table is empty, in which case ndv is 0.
I think it makes sense to add the same check everywhere ndv is used as a
divisor in this way.
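To illustrate the failure mode, here is a minimal standalone sketch of the guarded computation. It is not the actual FilterEstimation code: the object name SelectivitySketch and the helper inSetSelectivity are hypothetical, and returning a selectivity of 0.0 when ndv is 0 is an assumption (an empty table matches no rows), not necessarily what the patch does in its else branch.

```scala
// Hypothetical helper for illustration only; not part of Spark.
object SelectivitySketch {

  /** Selectivity of an IN filter under a uniform-distribution assumption.
    *
    * @param newNdv number of distinct values that survive the filter
    * @param ndv    number of distinct values in the column before filtering
    */
  def inSetSelectivity(newNdv: BigInt, ndv: BigInt): Option[Double] = {
    if (ndv == 0) {
      // Empty table: 0.0 / 0.0 would yield NaN, so skip the division.
      // Falling back to 0.0 is an assumption made for this sketch.
      Some(0.0)
    } else {
      // Cap at 1.0 because a filter cannot select more rows than exist.
      Some(math.min(newNdv.toDouble / ndv.toDouble, 1.0))
    }
  }
}
```

Without the guard, an empty table gives newNdv = 0 and ndv = 0, and the unguarded expression evaluates to NaN, which would then propagate into any row-count estimate derived from it.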
---