GideonPotok commented on code in PR #46404:
URL: https://github.com/apache/spark/pull/46404#discussion_r1593161711


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala:
##########
@@ -70,20 +78,46 @@ case class Mode(
     buffer
   }
 
-  override def eval(buffer: OpenHashMap[AnyRef, Long]): Any = {
-    if (buffer.isEmpty) {
+  override def eval(buff: OpenHashMap[AnyRef, Long]): Any = {
+    if (buff.isEmpty) {
       return null
     }
+    val buffer = if (child.dataType.isInstanceOf[StringType] && 
collationEnabled) {
+      val modeMap = buff.foldLeft(
+        new TreeMap[org.apache.spark.unsafe.types.UTF8String, 
Long]()(Ordering.comparatorToOrdering(
+          CollationFactory.fetchCollation(collationId).comparator

Review Comment:
   @uros-db `org.apache.spark.util.collection.OpenHashMap` is actually a spark 
data structure, so I could try to modify it in a second candidate PR and then 
see about having `Mode` rely on a `OpenHashMap` configured to be collation 
aware..
   
   The way I could  do it would be to modify   
`org.apache.spark.util.collection.OpenHashSet.Hasher.hash()` , specifically the 
override of hash within a class we would create `Hasher[String with 
Collation...]`, to branch to an alternative hashing method that is collation 
sensitive. 
   
   (I would not be modifying 
`org.apache.spark.util.collection.OpenHashSet#hashcode` , which distributes the 
hashes more evenly and is called on values returned by `Hasher.hash`) 
   
   This should work because `OpenHashMap` relies internally on `OpenHashSet` 
which uses `org.apache.spark.util.collection.OpenHashSet.Hasher`
   
    



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to