Github user ron8hu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19783#discussion_r155963930
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/FilterEstimationSuite.scala ---
    @@ -359,7 +371,7 @@ class FilterEstimationSuite extends StatsEstimationTestBase {
       test("cbool > false") {
         validateEstimatedStats(
           Filter(GreaterThan(attrBool, Literal(false)), childStatsTestPlan(Seq(attrBool), 10L)),
    -      Seq(attrBool -> ColumnStat(distinctCount = 1, min = Some(true), max = Some(true),
    +      Seq(attrBool -> ColumnStat(distinctCount = 1, min = Some(false), max = Some(true),
    --- End diff --
    
    Agreed with wzhfy.  The current logic is: for the two conditions (column > x) and (column >= x), we set the min value to x; we do not distinguish between the two cases.  This is because we do not know the exact next value greater than x when x has a continuous data type such as Double.  We could add special handling for discrete data types such as Boolean or Integer, but, as wzhfy said, it does not justify the added complexity.


---
