Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17446#discussion_r108336824
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala ---
    @@ -343,6 +347,26 @@ case class FilterEstimation(plan: Filter, catalystConf: CatalystConf) extends Lo
       }
     
       /**
    +   * Returns a percentage of rows meeting a Literal expression.
    +   * This method evaluates all the possible literal cases in Filter.
    +   *
    +   * FalseLiteral and TrueLiteral should be eliminated by optimizer, but null literal might be added
    +   * by optimizer rule NullPropagation. For safety, we handle all the cases here.
    +   *
    +   * @param literal a literal value (or constant)
    +   * @return an optional double value to show the percentage of rows meeting a given condition
    +   */
    +  def evaluateLiteral(literal: Literal): Option[Double] = {
    +    literal match {
    +      case Literal(null, _) => Some(0.0)
    --- End diff --
    
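    Handling `null` in filter estimation is not trivial. Under SQL three-valued logic, `null and false` returns false, while `null and true` returns null, not true. If we estimate `cond && null` by giving the null literal a selectivity of 0, we will report 0 selectivity for the conjunction, and that value is wrong once it is combined further (for example, under a negation).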
    
    I think we should eliminate the null literal in the optimizer when it is involved in a filter condition.
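
To make the concern concrete, here is a minimal standalone sketch of SQL three-valued logic in Scala. It is not Spark code: `NullSelectivitySketch`, `SqlBool`, and every helper in it are hypothetical names introduced for illustration, with `None` standing in for a null literal. It assumes an estimator that multiplies selectivities for AND and takes `1 - p` for NOT, which may not match the estimator in this PR exactly.

```scala
// Hypothetical, standalone illustration; none of these names exist in Spark.
object NullSelectivitySketch {
  // A SQL boolean: Some(true), Some(false), or None for NULL.
  type SqlBool = Option[Boolean]

  // Three-valued AND: false dominates, otherwise NULL is contagious.
  def and(a: SqlBool, b: SqlBool): SqlBool = (a, b) match {
    case (Some(false), _) | (_, Some(false)) => Some(false)
    case (Some(true), Some(true))            => Some(true)
    case _                                   => None
  }

  // Three-valued OR: true dominates, otherwise NULL is contagious.
  def or(a: SqlBool, b: SqlBool): SqlBool = (a, b) match {
    case (Some(true), _) | (_, Some(true)) => Some(true)
    case (Some(false), Some(false))        => Some(false)
    case _                                 => None
  }

  // Three-valued NOT: NULL stays NULL.
  def not(a: SqlBool): SqlBool = a.map(!_)

  // A filter keeps a row only when its condition is exactly true.
  def passesFilter(cond: SqlBool): Boolean = cond.contains(true)

  // Fraction of sample rows kept by a predicate over the per-row value of `cond`.
  def selectivity(rows: Seq[SqlBool])(pred: SqlBool => SqlBool): Double =
    rows.count(r => passesFilter(pred(r))).toDouble / rows.size
}
```

With four sample rows where `cond` is true for two and false for two, `selectivity(rows)(c => not(and(c, None)))` counts the rows kept by `NOT (cond AND null)` and gives 0.5. Under the assumed product and complement rules, treating the null literal as selectivity 0 would instead give `1 - (0.5 * 0) = 1.0`, which is the kind of mismatch the comment above warns about.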

