uros-db commented on code in PR #51521:
URL: https://github.com/apache/spark/pull/51521#discussion_r2217228239
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala:
##########
@@ -545,11 +580,32 @@ abstract class InterpretedHashFunction {
protected def hashUnsafeBytes(base: AnyRef, offset: Long, length: Int, seed: Long): Long
+ private lazy val legacyCollationAwareHashing: Boolean =
+ SQLConf.get.getConf(SQLConf.COLLATION_AWARE_HASHING_ENABLED)
+
/**
- * Computes hash of a given `value` of type `dataType`. The caller needs to check the validity
- * of input `value`.
+ * This method is intended for callers using the old hash API and preserves compatibility for
+ * supported data types. It must only be used for data types that do not include collated strings
+ * or complex types (e.g., structs, arrays, maps) that may contain collated strings.
+ *
+ * The caller is responsible for ensuring that `dataType` does not involve collation-aware fields.
+ * This is validated via an internal assertion.
+ *
+ * @throws IllegalArgumentException if `dataType` contains non-UTF8 binary collation.
*/
def hash(value: Any, dataType: DataType, seed: Long): Long = {
+ require(!SchemaUtils.hasNonUTF8BinaryCollation(dataType))
+ // For UTF8_BINARY, hashing behavior is the same regardless of the isCollationAware flag.
+ hash(value = value, dataType = dataType, seed = seed, isCollationAware = false)
+ }
Review Comment:
Thank you Milan!
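The three-argument `hash` in the hunk follows a guard-then-delegate pattern: reject any `dataType` that contains a collation other than UTF8_BINARY, then forward to the four-argument overload with `isCollationAware = false`. The sketch below illustrates only that pattern and is not Spark code. Its `DataType` model, the bodies of `hasNonUTF8BinaryCollation` and the four-argument `hash`, the lowercase normalization used for "UTF8_LCASE", and the `mix` function are all hypothetical stand-ins chosen to keep the example self-contained.

```scala
// Hypothetical, simplified type model; NOT Spark's DataType hierarchy.
sealed trait DataType
case object IntType extends DataType
final case class StringType(collation: String) extends DataType
final case class ArrayType(elementType: DataType) extends DataType
final case class StructType(fields: Seq[DataType]) extends DataType

object CollationHashSketch {
  // Hypothetical stand-in for SchemaUtils.hasNonUTF8BinaryCollation:
  // recursively checks nested types for any non-UTF8_BINARY string.
  def hasNonUTF8BinaryCollation(dt: DataType): Boolean = dt match {
    case StringType(c)  => c != "UTF8_BINARY"
    case ArrayType(et)  => hasNonUTF8BinaryCollation(et)
    case StructType(fs) => fs.exists(hasNonUTF8BinaryCollation)
    case IntType        => false
  }

  // Toy seed-chaining mix; real implementations use Murmur3/xxHash.
  private def mix(h: Int, seed: Long): Long = seed * 31L + h

  // Hypothetical four-argument overload (the collation-aware entry point).
  def hash(value: Any, dataType: DataType, seed: Long, isCollationAware: Boolean): Long =
    (value, dataType) match {
      // Collation-aware path: hash a normalized key so that strings equal
      // under the (assumed) case-insensitive collation produce equal hashes.
      case (s: String, StringType(c)) if isCollationAware && c == "UTF8_LCASE" =>
        mix(s.toLowerCase.hashCode, seed)
      // Binary path: hash the raw string; identical for either flag value.
      case (s: String, StringType(_)) => mix(s.hashCode, seed)
      case (i: Int, IntType)          => mix(i, seed)
      // Complex types: fold the running hash through each element/field.
      case (xs: Seq[_], ArrayType(et)) =>
        xs.foldLeft(seed)((h, x) => hash(x, et, h, isCollationAware))
      case (xs: Seq[_], StructType(fs)) =>
        xs.zip(fs).foldLeft(seed) { case (h, (x, f)) => hash(x, f, h, isCollationAware) }
      case _ =>
        throw new IllegalArgumentException(s"Unsupported value/type: $value / $dataType")
    }

  // Legacy three-argument API: guard first, then delegate with the flag off.
  def hash(value: Any, dataType: DataType, seed: Long): Long = {
    // require throws IllegalArgumentException, matching the @throws contract.
    require(!hasNonUTF8BinaryCollation(dataType),
      s"Legacy hash does not support non-UTF8_BINARY collations: $dataType")
    // For UTF8_BINARY-only types the flag does not change the result.
    hash(value, dataType, seed, false)
  }
}
```

With this shape, legacy callers hashing UTF8_BINARY data get the same value as the collation-aware path, while a collated type nested anywhere (for example `ArrayType(StringType("UTF8_LCASE"))`) fails fast instead of being hashed silently by byte content.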
--
This is an automated message from the Apache Git Service.