Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/19479#discussion_r148234308
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeColumnCommand.scala ---
@@ -89,19 +93,158 @@ case class AnalyzeColumnCommand(
// The first element in the result will be the overall row count, the following elements
// will be structs containing all column stats.
// The layout of each struct follows the layout of the ColumnStats.
- val ndvMaxErr = sparkSession.sessionState.conf.ndvMaxError
val expressions = Count(Literal(1)).toAggregateExpression() +:
- attributesToAnalyze.map(ColumnStat.statExprs(_, ndvMaxErr))
+ attributesToAnalyze.map(statExprs(_, sparkSession.sessionState.conf))
--- End diff ---
My feeling is that we should run a job before calling `statExprs`, because
`statExprs` needs some extra information about buckets. I think that's better
than hiding this job deep inside `rowToColumnStats`.
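A minimal sketch of the two-pass flow I have in mind (the helper `computeBucketEndpoints` and the extra parameter to `statExprs` are hypothetical, just for illustration):

```scala
// Hypothetical sketch: run a separate job up front to obtain the bucket
// endpoints, then pass them into statExprs, instead of triggering that
// job inside rowToColumnStats.
val conf = sparkSession.sessionState.conf

// First job (assumed helper): compute approximate percentiles per column,
// which serve as the histogram bucket endpoints.
val bucketEndpoints: Map[Attribute, Seq[Double]] =
  computeBucketEndpoints(sparkSession, attributesToAnalyze, conf)

// Second job: build the aggregate expressions with the endpoints already
// known, so statExprs needs no hidden job of its own.
val expressions = Count(Literal(1)).toAggregateExpression() +:
  attributesToAnalyze.map(attr => statExprs(attr, conf, bucketEndpoints.get(attr)))
```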
---