mihailom-db commented on code in PR #47502:
URL: https://github.com/apache/spark/pull/47502#discussion_r1696409411
##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HashExpressionsSuite.scala:
##########
@@ -620,6 +620,30 @@ class HashExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
checkHiveHashForDecimal("123456.123456789012345678901234567890", 38, 31, 1728235666)
}
+ for (collation <- Seq("UTF8_LCASE", "UNICODE_CI", "UTF8_BINARY")) {
Review Comment:
nit: I believe we usually test four collations everywhere: UTF8_BINARY,
UTF8_LCASE, UNICODE and UNICODE_CI; UNICODE is missing from this list.
Together they cover both the Spark-native implementations and the ICU-based ones.
##########
sql/core/src/test/scala/org/apache/spark/sql/CollationExpressionWalkerSuite.scala:
##########
@@ -607,6 +607,8 @@ class CollationExpressionWalkerSuite extends SparkFunSuite with SharedSparkSessi
// need to skip as plans differ in STRING <-> STRING COLLATE UTF8_LCASE
"current_timezone",
"schema_of_variant",
+ "hash",
+ "xxhash64",
Review Comment:
nit: Is this the right group for these expressions? Don't we expect the results
to differ, since the hashes are different?
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala:
##########
@@ -565,7 +565,15 @@ abstract class InterpretedHashFunction {
case a: Array[Byte] =>
hashUnsafeBytes(a, Platform.BYTE_ARRAY_OFFSET, a.length, seed)
case s: UTF8String =>
- hashUnsafeBytes(s.getBaseObject, s.getBaseOffset, s.numBytes(), seed)
+ val st = dataType.asInstanceOf[StringType]
+ if (st.supportsBinaryEquality) {
+ hashUnsafeBytes(s.getBaseObject, s.getBaseOffset, s.numBytes(), seed)
+ } else {
+ val stringHash = CollationFactory
+ .fetchCollation(dataType.asInstanceOf[StringType].collationId)
Review Comment:
```suggestion
.fetchCollation(st.collationId)
```
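For context on why the non-binary branch is needed, here is a minimal standalone sketch (not Spark code) of the invariant involved: strings that compare equal under a collation should produce the same hash. The names `CollationHashSketch`, `SketchCollation`, `Binary` and `Lowercase` are hypothetical, lowercasing only approximates a case-insensitive collation such as UTF8_LCASE, and `scala.util.hashing.MurmurHash3` stands in for Spark's own hash functions.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.util.Locale
import scala.util.hashing.MurmurHash3

object CollationHashSketch {
  // Hypothetical stand-in for a collation: maps a string to the bytes that get hashed.
  sealed trait SketchCollation { def key(s: String): Array[Byte] }

  // Binary equality: hash the raw UTF-8 bytes unchanged.
  case object Binary extends SketchCollation {
    def key(s: String): Array[Byte] = s.getBytes(UTF_8)
  }

  // Assumption: lowercasing approximates a case-insensitive collation.
  case object Lowercase extends SketchCollation {
    def key(s: String): Array[Byte] = s.toLowerCase(Locale.ROOT).getBytes(UTF_8)
  }

  // Hash the collation key rather than the original string bytes.
  def hash(s: String, collation: SketchCollation, seed: Int): Int =
    MurmurHash3.bytesHash(collation.key(s), seed)
}
```

In this sketch, `hash("AbC", Lowercase, 42)` and `hash("abc", Lowercase, 42)` agree, while the `Binary` variants of the same two calls do not. That gap between binary and collated hashes is what the question on `CollationExpressionWalkerSuite` is about.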
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]