GitHub user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/19479#discussion_r149860437
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Statistics.scala
---
@@ -275,6 +317,122 @@ object ColumnStat extends Logging {
avgLen = row.getLong(4),
maxLen = row.getLong(5)
)
+ if (row.isNullAt(6)) {
+ cs
+ } else {
+ val ndvs = row.getArray(6).toLongArray()
+ assert(percentiles.get.numElements() == ndvs.length + 1)
+ val endpoints = percentiles.get.toArray[Any](attr.dataType).map(_.toString.toDouble)
--- End diff --
These values are only used for estimation, so I think some loss of accuracy is acceptable. Using Double makes the estimation logic a lot simpler.
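The Scala sketch below illustrates the tradeoff. It is not Spark's code. It assumes the endpoints delimit equi-height histogram bins, with n + 1 endpoints for the n per-bin NDV counts, as the assert in the diff requires. `HistogramBin`, `buildBins` and `overlapFraction` are hypothetical names. Once every endpoint is a Double, a single range-overlap function works for Int, Long and decimal columns alike. The last lines show the cost: a Long just above 2^53 does not survive the round trip through Double.

```scala
// Minimal sketch, NOT Spark's implementation. HistogramBin, buildBins and
// overlapFraction are hypothetical names; bins are assumed to be equi-height.
object DoubleEndpointsSketch {

  // One bin between two adjacent endpoints, plus its distinct-value count (ndv).
  final case class HistogramBin(lo: Double, hi: Double, ndv: Long)

  // n + 1 endpoints delimit n bins, mirroring the assert in the diff.
  def buildBins(endpoints: Array[Double], ndvs: Array[Long]): Array[HistogramBin] = {
    require(endpoints.length == ndvs.length + 1, "expected one more endpoint than bins")
    ndvs.indices.map(i => HistogramBin(endpoints(i), endpoints(i + 1), ndvs(i))).toArray
  }

  // Fraction of a bin covered by [lo, hi], assuming values are spread uniformly
  // inside the bin. With Double endpoints, one function serves every numeric type.
  def overlapFraction(bin: HistogramBin, lo: Double, hi: Double): Double =
    if (bin.hi == bin.lo) {
      // Degenerate bin holding a single value: it is either fully inside the range or not.
      if (lo <= bin.lo && bin.lo <= hi) 1.0 else 0.0
    } else {
      val overlap = math.min(hi, bin.hi) - math.max(lo, bin.lo)
      math.max(0.0, overlap) / (bin.hi - bin.lo)
    }

  def main(args: Array[String]): Unit = {
    // Mixed-type endpoints (Int, Long, BigDecimal) converted the same way as in the diff.
    val raw: Seq[Any] = Seq(0, 10L, BigDecimal("20.5"), 40)
    val endpoints = raw.map(_.toString.toDouble).toArray
    val bins = buildBins(endpoints, Array(5L, 8L, 12L))

    // Equi-height assumption: each bin holds the same share of rows.
    val selectivity = bins.map(b => overlapFraction(b, 5.0, 15.0)).sum / bins.length
    println(f"estimated selectivity of [5, 15]: $selectivity%.3f")

    // The cost: Longs above 2^53 cannot all be represented exactly as Double.
    val big = (1L << 53) + 1
    println(s"$big -> ${big.toDouble.toLong}, exact = ${big.toDouble.toLong == big}")
  }
}
```

For a cost-based estimate, being off by one unit at the 2^53 scale is negligible compared with the error already introduced by the uniform-distribution assumption inside each bin.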