[GitHub] spark pull request #11209: [SPARK-13325][SQL] Create a 64-bit hashcode expre...

hvanhovell Sun, 31 Jul 2016 03:33:27 -0700

Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11209#discussion_r72902008
  
    --- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/HashBenchmark.scala ---
    @@ -119,11 +146,12 @@ object HashBenchmark {
           .add("map", mapOfInt)
           .add("mapOfMap", MapType(IntegerType, mapOfInt))
         /*
    -    Intel(R) Core(TM) i7-4960HQ CPU @ 2.60GHz
    +    Intel(R) Core(TM) i7-4750HQ CPU @ 2.00GHz
         Hash For map:                       Best/Avg Time(ms)    Rate(M/s)   
Per Row(ns)   Relative
         
-------------------------------------------------------------------------------------------
    -    interpreted version                      1820 / 1861          0.0      
444347.2       1.0X
    -    codegen version                           205 /  223          0.0      
 49936.5       8.9X
    +    interpreted version                      1612 / 1618          0.0      
393553.4       1.0X
    +    codegen version                           149 /  150          0.0      
 36381.2      10.8X
    +    codegen version 64-bit                    144 /  145          0.0      
 35122.1      11.2X
          */
    --- End diff --
    
    @gatorsmile I get the following results, running the benchmark:
    ```
    [info] Running benchmark: Hash For map
    [info]   Running case: interpreted version
    [info]   Stopped after 28358 iterations, 2000 ms
    [info]   Running case: codegen version
    [info]   Stopped after 10 iterations, 2079 ms
    [info]   Running case: codegen version 64-bit
    [info]   Stopped after 13 iterations, 2133 ms
    [info] 
    [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
    [info] Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
    [info] 
    [info] Hash For map:                            Best/Avg Time(ms)    
Rate(M/s)   Per Row(ns)   Relative
    [info] 
------------------------------------------------------------------------------------------------
    [info] interpreted version                              0 /    0         
61.8          16.2       1.0X
    [info] codegen version                                176 /  208          
0.0       42927.5       0.0X
    [info] codegen version 64-bit                         154 /  164          
0.0       37711.4       0.0X
    ```
    So the results are similar. 
    
    I expect that there is some JVM cleverness going on here; the JVM might be 
caching the `hashCode()` results for  `BaseGenericInternalRow.hashCode` or is 
really good at predicting what is going on. We added a warmup period to the 
benchmarking logic in order to make the results more reliable (see the lines 
like `Stopped after 28358 iterations, 2000 ms`) and this might trigger this 
behavior.
    
    It is by design that we call `BaseGenericInternalRow.hashCode` (I think the 
result of the `Hash` expression and this method should yield the same result). 
@cloud-fan could you explain why we did this?




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] spark pull request #11209: [SPARK-13325][SQL] Create a 64-bit hashcode expre...

Reply via email to