Github user hvanhovell commented on a diff in the pull request:
https://github.com/apache/spark/pull/11209#discussion_r72902008
--- Diff:
sql/catalyst/src/test/scala/org/apache/spark/sql/HashBenchmark.scala ---
@@ -119,11 +146,12 @@ object HashBenchmark {
.add("map", mapOfInt)
.add("mapOfMap", MapType(IntegerType, mapOfInt))
/*
- Intel(R) Core(TM) i7-4960HQ CPU @ 2.60GHz
+ Intel(R) Core(TM) i7-4750HQ CPU @ 2.00GHz
Hash For map: Best/Avg Time(ms) Rate(M/s)
Per Row(ns) Relative
-------------------------------------------------------------------------------------------
- interpreted version 1820 / 1861 0.0
444347.2 1.0X
- codegen version 205 / 223 0.0
49936.5 8.9X
+ interpreted version 1612 / 1618 0.0
393553.4 1.0X
+ codegen version 149 / 150 0.0
36381.2 10.8X
+ codegen version 64-bit 144 / 145 0.0
35122.1 11.2X
*/
--- End diff --
@gatorsmile I get the following results, running the benchmark:
```
[info] Running benchmark: Hash For map
[info] Running case: interpreted version
[info] Stopped after 28358 iterations, 2000 ms
[info] Running case: codegen version
[info] Stopped after 10 iterations, 2079 ms
[info] Running case: codegen version 64-bit
[info] Stopped after 13 iterations, 2133 ms
[info]
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
[info] Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
[info]
[info] Hash For map: Best/Avg Time(ms)
Rate(M/s) Per Row(ns) Relative
[info]
------------------------------------------------------------------------------------------------
[info] interpreted version 0 / 0
61.8 16.2 1.0X
[info] codegen version 176 / 208
0.0 42927.5 0.0X
[info] codegen version 64-bit 154 / 164
0.0 37711.4 0.0X
```
So the results are similar.
I expect that there is some JVM cleverness going on here; the JVM might be
caching the `hashCode()` results for `BaseGenericInternalRow.hashCode` or is
really good at predicting what is going on. We added a warmup period to the
benchmarking logic in order to make the results more reliable (see the lines
like `Stopped after 28358 iterations, 2000 ms`) and this might trigger this
behavior.
It is by design that we call `BaseGenericInternalRow.hashCode` (I think the
result of the `Hash` expression and this method should yield the same result).
@cloud-fan could you explain why we did this?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]