Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17446#discussion_r108336824

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala ---
    @@ -343,6 +347,26 @@ case class FilterEstimation(plan: Filter, catalystConf: CatalystConf) extends Lo
       }
    
       /**
    +   * Returns a percentage of rows meeting a Literal expression.
    +   * This method evaluates all the possible literal cases in Filter.
    +   *
    +   * FalseLiteral and TrueLiteral should be eliminated by the optimizer, but a null literal
    +   * might be added by the optimizer rule NullPropagation. For safety, we handle all the
    +   * cases here.
    +   *
    +   * @param literal a literal value (or constant)
    +   * @return an optional double value to show the percentage of rows meeting a given condition
    +   */
    +  def evaluateLiteral(literal: Literal): Option[Double] = {
    +    literal match {
    +      case Literal(null, _) => Some(0.0)
    --- End diff --
    
    Handling `null` in filter estimation is not trivial, e.g. `null and false` returns false, but `null and true` returns null. If we estimate `cond && null`, we will report 0 selectivity, which is wrong. I think we should eliminate the null literal in the optimizer when it is involved in a filter condition.
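The subtlety the reviewer points at is SQL's three-valued logic: a predicate can evaluate to true, false, or NULL, and a filter keeps a row only when the predicate is true. A minimal sketch of those semantics, modeling NULL as `None` (the helper names `and3`/`or3` are hypothetical, not part of Spark's codebase):

```scala
// Three-valued (SQL) logic sketch: Some(true)/Some(false) are known values, None is NULL.
object ThreeValuedLogic {
  // NULL AND x: false if either side is definitely false, true only if both true, else NULL.
  def and3(a: Option[Boolean], b: Option[Boolean]): Option[Boolean] = (a, b) match {
    case (Some(false), _) | (_, Some(false)) => Some(false)
    case (Some(true), Some(true))            => Some(true)
    case _                                   => None // NULL
  }

  // NULL OR x: true if either side is definitely true, false only if both false, else NULL.
  def or3(a: Option[Boolean], b: Option[Boolean]): Option[Boolean] = (a, b) match {
    case (Some(true), _) | (_, Some(true)) => Some(true)
    case (Some(false), Some(false))        => Some(false)
    case _                                 => None // NULL
  }
}
```

For example, `and3(None, Some(false))` is `Some(false)` while `and3(None, Some(true))` is `None`, and `or3(None, Some(true))` is `Some(true)`. So a null literal inside a compound condition cannot simply be treated as "matches 0% of rows": `cond || null` still keeps every row where `cond` is true, which motivates eliminating the null literal in the optimizer rather than special-casing it in the estimator.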