[GitHub] spark pull request #15047: [SPARK-17495] [SQL] Add Hash capability semantica...

hvanhovell Wed, 28 Sep 2016 09:06:11 -0700

Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15047#discussion_r80954301
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/HashByteArrayBenchmark.scala ---
    @@ -59,90 +59,110 @@ object HashByteArrayBenchmark {
           }
         }
     
    +    benchmark.addCase("HiveHasher") { _: Int =>
    +      for (_ <- 0L until iters) {
    +        var sum = 0L
    +        var i = 0
    +        while (i < numArrays) {
    +          sum += HiveHasher.hashUnsafeBytes(arrays(i), Platform.BYTE_ARRAY_OFFSET, length)
    +          i += 1
    +        }
    +      }
    +    }
    +
         benchmark.run()
       }
     
       def main(args: Array[String]): Unit = {
         /*
    -    Intel(R) Core(TM) i7-4750HQ CPU @ 2.00GHz
    -    Hash byte arrays with length 8:     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    -------------------------------------------------------------------------------------------
    -    Murmur3_x86_32                             11 /   12        185.1           5.4       1.0X
    -    xxHash 64-bit                              17 /   18        120.0           8.3       0.6X
    +    Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
    +    Hash byte arrays with length 8:          Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Murmur3_x86_32                                  11 /   12        198.9           5.0       1.0X
    +    xxHash 64-bit                                   16 /   19        130.1           7.7       0.7X
    +    HiveHasher                                       0 /    0     282254.6           0.0    1419.0X
    --- End diff --
    
    This looks too good to be true :). I think the JVM is eliminating dead code. We should do something with the sum variable and see what happens in that case.
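
A minimal sketch of one way to "do something with the sum", assuming a volatile sink field is acceptable in the benchmark. The names `DeadCodeSketch`, `simpleHash`, `sink`, and `hashAll` are hypothetical and not part of the PR; `simpleHash` is only a stand-in for the hash under test so the example stays self-contained.

```scala
object DeadCodeSketch {
  // Hypothetical stand-in for the hash being benchmarked (not Spark's HiveHasher).
  def simpleHash(bytes: Array[Byte]): Int = {
    var h = 0
    var i = 0
    while (i < bytes.length) {
      h = 31 * h + bytes(i)
      i += 1
    }
    h
  }

  // Assumption: a volatile field used as a sink. Writing the accumulated sum
  // here gives the loop an observable effect, so the hashing work is no longer
  // dead code from the JIT's point of view.
  @volatile var sink: Long = 0L

  def hashAll(arrays: Array[Array[Byte]]): Long = {
    var sum = 0L
    var i = 0
    while (i < arrays.length) {
      sum += simpleHash(arrays(i))
      i += 1
    }
    sink = sum // consume the result instead of dropping it
    sum
  }
}
```

In the benchmark case from the diff, the equivalent change would be to store or return `sum` after the inner `while` loop. If the HiveHasher row then shows a nonzero time, the 0 ms figure was most likely an artifact of the result being discarded.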



