GitHub user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/19479#discussion_r149849510
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Statistics.scala ---
@@ -275,6 +317,122 @@ object ColumnStat extends Logging {
       avgLen = row.getLong(4),
       maxLen = row.getLong(5)
     )
+    if (row.isNullAt(6)) {
+      cs
+    } else {
+      val ndvs = row.getArray(6).toLongArray()
+      assert(percentiles.get.numElements() == ndvs.length + 1)
+      val endpoints = percentiles.get.toArray[Any](attr.dataType).map(_.toString.toDouble)
--- End diff ---
Is it safe to cast a decimal to double and use it as a bucket boundary?
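
For context, a minimal sketch of the concern (mine, not part of the PR; the object name is made up): a Double carries only ~15-17 significant decimal digits, so two distinct high-precision decimal endpoints can round to the same value after `_.toString.toDouble`, leaving adjacent buckets with identical boundaries:

    // Hypothetical demo, not Spark code: two distinct decimals
    // collapse to the same Double after conversion.
    object DecimalBoundaryDemo {
      def main(args: Array[String]): Unit = {
        val a = BigDecimal("12345678901234567.89")
        val b = BigDecimal("12345678901234567.90")
        println(a == b)                   // false: distinct decimal values
        println(a.toDouble == b.toDouble) // true: both round to 1.2345678901234568E16
      }
    }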
---