Github user wzhfy commented on the issue:

    https://github.com/apache/spark/pull/15090
  
    To help us choose a better design, we first need to clarify how column stats are used.
    A simple example (e.g. for the predicate `col < 5`) might look like this:
    ```scala
      filter.condition match {
        case LessThan(ar: AttributeReference, Literal(value, _)) =>
          if (filter.statistics.colStats.contains(ar.name)) {
            val colStat = filter.statistics.colStats(ar.name)
            val estimatedRowCount = ar.dataType match {
              case _: IntegralType =>
                val longColStat = colStat.forNumeric[Long]
                val longValue = value.toString.toLong
                if (longColStat.max < longValue) {
                  // all records satisfy the filter condition
                  filter.child.statistics.rowCount
                } else if (longColStat.min >= longValue) {
                  // none of the records satisfy the filter condition
                  0
                } else {
                  // do detailed estimation (using histogram)
                  ...
                }
              case FloatType | DoubleType =>
                ...
              case DecimalType() =>
                ...
            }
          }
      }
    ```
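
    To make the "detailed estimation" branch concrete, here is a minimal, self-contained sketch of the min/max pruning above plus a fallback that interpolates under a uniform-distribution assumption (rather than a histogram). `NumericColStat` and `estimateLessThan` are hypothetical names for illustration, not Spark's actual API:
    ```scala
    // Hypothetical stat holder for one numeric column (illustration only).
    case class NumericColStat(min: Long, max: Long, numRows: Long)

    // Estimate how many rows satisfy `col < value` from min/max stats.
    def estimateLessThan(stat: NumericColStat, value: Long): Long = {
      if (stat.max < value) {
        // all records satisfy the filter condition
        stat.numRows
      } else if (stat.min >= value) {
        // none of the records satisfy the filter condition
        0L
      } else {
        // Uniformity assumption: selectivity is the fraction of the
        // [min, max] value range that lies below `value`. A histogram,
        // when available, would replace this single ratio with per-bucket sums.
        val selectivity = (value - stat.min).toDouble / (stat.max - stat.min + 1)
        (selectivity * stat.numRows).toLong
      }
    }
    ```
    For example, with `NumericColStat(0, 9, 100)` and `value = 5`, half the value range is below 5, so the estimate is 50 rows; the two boundary branches return 100 and 0 without touching the distribution at all.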
