Github user hvanhovell commented on a diff in the pull request:
https://github.com/apache/spark/pull/15047#discussion_r80954301
--- Diff:
sql/catalyst/src/test/scala/org/apache/spark/sql/HashByteArrayBenchmark.scala
---
@@ -59,90 +59,110 @@ object HashByteArrayBenchmark {
}
}
+ benchmark.addCase("HiveHasher") { _: Int =>
+ for (_ <- 0L until iters) {
+ var sum = 0L
+ var i = 0
+ while (i < numArrays) {
+ sum += HiveHasher.hashUnsafeBytes(arrays(i), Platform.BYTE_ARRAY_OFFSET, length)
+ i += 1
+ }
+ }
+ }
+
benchmark.run()
}
def main(args: Array[String]): Unit = {
/*
- Intel(R) Core(TM) i7-4750HQ CPU @ 2.00GHz
- Hash byte arrays with length 8:  Best/Avg Time(ms)  Rate(M/s)  Per Row(ns)  Relative
- -------------------------------------------------------------------------------------------
- Murmur3_x86_32                        11 /   12        185.1        5.4       1.0X
- xxHash 64-bit                         17 /   18        120.0        8.3       0.6X
+ Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
+ Hash byte arrays with length 8:  Best/Avg Time(ms)  Rate(M/s)  Per Row(ns)  Relative
+ ------------------------------------------------------------------------------------------------
+ Murmur3_x86_32                        11 /   12        198.9        5.0       1.0X
+ xxHash 64-bit                         16 /   19        130.1        7.7       0.7X
+ HiveHasher                             0 /    0     282254.6        0.0    1419.0X
--- End diff --
This looks too good to be true :)... I think the JVM is eliminating dead
code: `sum` is never read after the loop, so the JIT is free to drop the
hash calls altogether. We should do something with the sum variable and
see what happens in that case.
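
Something along these lines might do it. This is only a rough,
self-contained sketch, not the actual benchmark code: `standInHash` is a
hypothetical placeholder for `HiveHasher.hashUnsafeBytes` so the snippet
runs without Spark, and the `sink` field and `hashAll` helper are names
made up for illustration. The idea is simply to write `sum` somewhere
observable so the JIT cannot prove the loop has no effect.

```scala
object ConsumeSumSketch {
  // Hypothetical stand-in for HiveHasher.hashUnsafeBytes (assumption, not Spark code).
  def standInHash(bytes: Array[Byte]): Int = {
    var h = 0
    var i = 0
    while (i < bytes.length) {
      h = 31 * h + bytes(i)
      i += 1
    }
    h
  }

  // Volatile field that receives the result, so `sum` is no longer dead.
  @volatile var sink: Long = 0L

  def hashAll(arrays: Array[Array[Byte]]): Long = {
    var sum = 0L
    var i = 0
    while (i < arrays.length) {
      sum += standInHash(arrays(i))
      i += 1
    }
    sink = sum // consume the accumulated value
    sum
  }
}
```

If the HiveHasher numbers drop to the same order of magnitude as the
other two cases after a change like this, dead code elimination was the
likely cause of the 0 ms result.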