huonw commented on a change in pull request #24019: [SPARK-27099][SQL] Add
'xxhash64' for hashing arbitrary columns to Long
URL: https://github.com/apache/spark/pull/24019#discussion_r265467298
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/functions.scala
##########
@@ -2167,6 +2167,19 @@ object functions {
new Murmur3Hash(cols.map(_.expr))
}
+ /**
+ * Calculates the hash code of given columns using the 64-bit
+ * variant of the xxHash algorithm, and returns the result as a long
+ * column.
+ *
+ * @group misc_funcs
+ * @since 2.4.1
+ */
+ @scala.annotation.varargs
+ def xxhash64(cols: Column*): Column = withExpr {
Review comment:
The `hash` function doesn't currently have a `seed` argument either.
In any case, I asked about this on [email protected] ("[SQL] hash:
64-bits and seeding"), but didn't get any response to that part of my proposal
(just the xxhash bit). I think if there were a seed argument, it would have to come first,
because varargs have to come last; something like the following?
```scala
def hash(seed: Int, cols: Column*): Column
// or, maybe, don't perpetuate the "bad"/non-specific name:
def murmur3(seed: Int, cols: Column*): Column
```
```scala
def xxhash64(seed: Long, cols: Column*): Column
```
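To illustrate why the seed has to lead: Scala only allows a repeated (varargs) parameter as the last parameter in a list, so `def xxhash64(cols: Column*, seed: Long)` would not compile. Below is a minimal, Spark-free sketch under stated assumptions. `Column` is a hypothetical stand-in (not `org.apache.spark.sql.Column`), the default seed value is assumed for illustration only, and the functions just render a description string rather than computing a real xxHash64.

```scala
// Hypothetical stand-in for a column reference; NOT Spark's Column class.
final case class Column(name: String)

object HashFunctions {
  // Assumed default seed, for illustration only.
  val DefaultSeed: Long = 42L

  // Varargs-only shape (as in the diff): the seed is implied.
  def xxhash64(cols: Column*): String = xxhash64(DefaultSeed, cols: _*)

  // Proposed shape: the explicit seed comes first, because a repeated
  // parameter (Column*) must be the last parameter in its list.
  // Writing `def xxhash64(cols: Column*, seed: Long)` is a compile error.
  def xxhash64(seed: Long, cols: Column*): String =
    s"xxhash64(seed=$seed, ${cols.map(_.name).mkString(", ")})"
}
```

The two overloads don't clash at call sites: a leading `Long` selects the seeded variant, and a leading `Column` selects the varargs-only one.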