GideonPotok commented on code in PR #46404:
URL: https://github.com/apache/spark/pull/46404#discussion_r1592889564
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala:
##########
@@ -70,20 +78,46 @@ case class Mode(
buffer
}
- override def eval(buffer: OpenHashMap[AnyRef, Long]): Any = {
- if (buffer.isEmpty) {
+ override def eval(buff: OpenHashMap[AnyRef, Long]): Any = {
+ if (buff.isEmpty) {
return null
}
+ val buffer = if (child.dataType.isInstanceOf[StringType] &&
collationEnabled) {
+ val modeMap = buff.foldLeft(
+ new TreeMap[org.apache.spark.unsafe.types.UTF8String,
Long]()(Ordering.comparatorToOrdering(
+ CollationFactory.fetchCollation(collationId).comparator
+ )))
+ {
+ case (map, (key: String, count)) =>
+ map(org.apache.spark.unsafe.types.UTF8String.fromString(key)) =
+
map.getOrElse(org.apache.spark.unsafe.types.UTF8String.fromString(key), 0L) +
count
+ map
+ case (map, (key: UTF8String, count)) =>
+ map(key) = map.getOrElse(key, 0L) + count
+ map
+ case (_, _) =>
+ throw new IllegalArgumentException("Mode expects non-null string
type.")
Review Comment:
@uros-db Yep.
The buffer should never contain a key that is null, because it is not added
to the map during Mode.update(). The behavior this supports is that, according
to the [docs](https://spark.apache.org/docs/latest/api/sql/index.html): "NULL
values are ignored. If all the values are NULL, or there are 0 rows, returns
NULL."
The subtleties of all this are fresh in my mind from looking into it
recently -- So just let me know if there is an aspect of this you need more
details about and I am happy to explain.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]