Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/19506#discussion_r145741653
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproxCountDistinctForIntervals.scala ---
@@ -153,13 +129,14 @@ case class ApproxCountDistinctForIntervals(
// endpoints are sorted into ascending order already
if (endpoints.head > doubleValue || endpoints.last < doubleValue) {
// ignore if the value is out of the whole range
- return
+ return buffer
}
val hllppIndex = findHllppIndex(doubleValue)
- val offset = mutableAggBufferOffset + hllppIndex * numWordsPerHllpp
- hllppArray(hllppIndex).update(buffer, offset, value, child.dataType)
+ val offset = hllppIndex * numWordsPerHllpp
+ hllppArray(hllppIndex).update(LongArrayInput(buffer), offset, value, child.dataType)
--- End diff ---
You can just pass `InternalRow(buffer)` here to save a lot of code changes. If performance matters, you can create a `LongArrayInternalRow` to avoid boxing.
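For reference, a minimal sketch of what such a `LongArrayInternalRow` could look like (the name and placement are illustrative, and this assumes the HLL++ helper only calls `getLong`/`setLong` on the buffer row):

```scala
import org.apache.spark.sql.catalyst.expressions.GenericInternalRow

// Wraps the aggregation buffer's Array[Long] directly, so getLong/setLong
// read and write the primitive array without boxing each word into an Any.
case class LongArrayInternalRow(array: Array[Long]) extends GenericInternalRow {
  override def getLong(offset: Int): Long = array(offset)
  override def setLong(offset: Int, value: Long): Unit = { array(offset) = value }
}
```

The update call above would then take `LongArrayInternalRow(buffer)` instead of `LongArrayInput(buffer)`, whereas building a `GenericInternalRow` from the buffer values would box every long on each update.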
---