Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/20913#discussion_r179037665
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala
---
@@ -427,7 +427,11 @@ case class FilterEstimation(plan: Filter) extends
Logging {
// return the filter selectivity. Without advanced statistics such as
histograms,
// we have to assume uniform distribution.
- Some(math.min(newNdv.toDouble / ndv.toDouble, 1.0))
+ if (ndv.toDouble != 0) {
--- End diff --
What's the concrete case when `ndv.toDouble == 0`?
Also, is this the only place where we need this check?
For example, we don't do it here:
https://github.com/apache/spark/blob/5cfd5fabcdbd77a806b98a6dd59b02772d2f6dee/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/EstimationUtils.scala#L166
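For context, a minimal sketch of the guarded computation under discussion. The names `newNdv`/`ndv` mirror the diff, but this is an illustrative stand-alone object, not the actual `FilterEstimation` code; the point is that without the guard, `0.0 / 0.0` yields `NaN`, which `math.min` then propagates into the selectivity:

```scala
// Hypothetical sketch of the selectivity guard discussed in the diff.
// newNdv / ndv are distinct-value counts (BigInt in FilterEstimation).
object SelectivitySketch {
  def selectivity(newNdv: BigInt, ndv: BigInt): Option[Double] = {
    if (ndv.toDouble != 0) {
      // Uniform-distribution assumption: fraction of distinct values kept.
      Some(math.min(newNdv.toDouble / ndv.toDouble, 1.0))
    } else {
      // Unguarded, 0.0 / 0.0 => NaN, and math.min(NaN, 1.0) stays NaN.
      None
    }
  }
}
```

Calling `SelectivitySketch.selectivity(BigInt(2), BigInt(10))` gives `Some(0.2)`, while `selectivity(BigInt(0), BigInt(0))` returns `None` instead of `Some(NaN)`.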
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]