GideonPotok commented on code in PR #46404:
URL: https://github.com/apache/spark/pull/46404#discussion_r1592889564


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala:
##########
@@ -70,20 +78,46 @@ case class Mode(
     buffer
   }
 
-  override def eval(buffer: OpenHashMap[AnyRef, Long]): Any = {
-    if (buffer.isEmpty) {
+  override def eval(buff: OpenHashMap[AnyRef, Long]): Any = {
+    if (buff.isEmpty) {
       return null
     }
+    val buffer = if (child.dataType.isInstanceOf[StringType] && 
collationEnabled) {
+      val modeMap = buff.foldLeft(
+        new TreeMap[org.apache.spark.unsafe.types.UTF8String, 
Long]()(Ordering.comparatorToOrdering(
+          CollationFactory.fetchCollation(collationId).comparator
+          )))
+      {
+        case (map, (key: String, count)) =>
+          map(org.apache.spark.unsafe.types.UTF8String.fromString(key)) =
+            
map.getOrElse(org.apache.spark.unsafe.types.UTF8String.fromString(key), 0L) + 
count
+          map
+        case (map, (key: UTF8String, count)) =>
+          map(key) = map.getOrElse(key, 0L) + count
+          map
+        case (_, _) =>
+          throw new IllegalArgumentException("Mode expects non-null string 
type.")

Review Comment:
   @uros-db Yep.
   
   The buffer should never contain a key that is null, because it is not added 
to the map during Mode.update(). The behavior this supports is that, according 
to the [docs](https://spark.apache.org/docs/latest/api/sql/index.html): "NULL 
values are ignored. If all the values are NULL, or there are 0 rows, returns 
NULL."
   
   The subtleties of all this are fresh in my mind from looking into it 
recently -- So just let me know if there is an aspect of this you need more 
details about and I am happy to explain.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to