dbatomic commented on code in PR #45370:
URL: https://github.com/apache/spark/pull/45370#discussion_r1511010521


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala:
##########
@@ -788,11 +799,21 @@ case class HiveHash(children: Seq[Expression]) extends HashExpression[Int] {
      $result = (int) ${HiveHashFunction.getClass.getName.stripSuffix("$")}.hashTimestamp($input);
     """
 
-  override protected def genHashString(input: String, result: String): String = {
-    val baseObject = s"$input.getBaseObject()"
-    val baseOffset = s"$input.getBaseOffset()"
-    val numBytes = s"$input.numBytes()"
-    s"$result = $hasherClassName.hashUnsafeBytes($baseObject, $baseOffset, $numBytes);"
+  override protected def genHashString(

Review Comment:
   @cloud-fan I don't really understand the need for both the `HiveHash` override and the default implementation in `HashExpression`. The difference seems to be that the previous result is fed back into the hasher as the seed in `HashExpression`, but not in `HiveHash`, although I do not understand the reasoning for this.
   
   I tried to mimic this behaviour for collations, but I will need a detailed review from you on this.
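   
   To illustrate the difference I mean, here is a minimal sketch (not the actual Spark code; `hashWithSeed` and both chaining helpers are hypothetical stand-ins) contrasting the two combination strategies: seeded chaining, where each step's result becomes the next step's seed, versus independent per-value hashes combined with the classic `31 * result + hash` rule:
   
   ```scala
   object HashChainingSketch {
     // Hypothetical stand-in for a per-value hasher such as hashInt(value, seed).
     def hashWithSeed(value: Int, seed: Int): Int = value * 0x9e3779b9 + seed
   
     // HashExpression-style: the running result is fed back in as the seed.
     def chainSeeded(values: Seq[Int], seed: Int): Int =
       values.foldLeft(seed)((result, v) => hashWithSeed(v, result))
   
     // HiveHash-style: each value is hashed with a fixed seed, and the partial
     // results are combined with `31 * result + hash`.
     def chainHive(values: Seq[Int]): Int =
       values.foldLeft(0)((result, v) => 31 * result + hashWithSeed(v, 0))
   }
   ```
   
   With that reading, only the combination step differs; the per-value hash functions are the same in both cases.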



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

