Github user wzhfy commented on the issue:

    https://github.com/apache/spark/pull/15090

To help us choose a better design, we first need to clarify how column stats are used. A simple example (e.g. the predicate `col < 5`) may look like this:

```scala
filter.condition match {
  case LessThan(ar: AttributeReference, Literal(value, _)) =>
    if (filter.statistics.colStats.contains(ar.name)) {
      val colStat = filter.statistics.colStats(ar.name)
      val estimatedRowCount = ar.dataType match {
        case _: IntegralType =>
          val longColStat = colStat.forNumeric[Long]
          val longValue = value.toString.toLong
          if (longColStat.max < longValue) {
            // all records satisfy the filter condition
            filter.child.statistics.rowCount
          } else if (longColStat.min >= longValue) {
            // none of the records satisfy the filter condition
            0
          } else {
            // do detailed estimation (using a histogram)
            ...
          }
        case FloatType | DoubleType => ...
        case _: DecimalType => ...
      }
    }
}
```
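For the "detailed estimation" branch above, a common fallback when no histogram is available is to assume a uniform value distribution between min and max and interpolate. The sketch below illustrates that idea only; the object and method names are hypothetical and are not Spark's actual API:

```scala
// Hypothetical sketch of range-based selectivity estimation for `col < value`,
// assuming values are uniformly distributed over [min, max]. Not Spark's API.
object RangeSelectivity {
  def estimateLessThan(min: Long, max: Long, value: Long, rowCount: Long): Long = {
    if (max < value) {
      // all records satisfy the filter condition
      rowCount
    } else if (min >= value) {
      // none of the records satisfy the filter condition
      0L
    } else {
      // uniform-distribution interpolation: fraction of the value range below `value`
      val selectivity = (value - min).toDouble / (max - min + 1).toDouble
      (rowCount * selectivity).round
    }
  }

  def main(args: Array[String]): Unit = {
    // e.g. min = 0, max = 9, 100 rows, predicate col < 5 => about half the rows
    println(estimateLessThan(0L, 9L, 5L, 100L))
  }
}
```

With a histogram the same structure applies, except the selectivity would be computed by summing the buckets (and a partial bucket) below `value` instead of interpolating over a single range.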