GitHub user sameeragarwal opened a pull request:

    https://github.com/apache/spark/pull/12379

    [SPARK-14620][SQL][WIP] Use/benchmark a better hash in AggregateHashMap

    ## What changes were proposed in this pull request?
    
    This PR uses a better hashing algorithm while probing the AggregateHashMap:
    
    ```java
    long h = 0;
    h = (h << 5) - h + key_1;  // (h << 5) - h is the same as 31 * h
    h = (h << 5) - h + key_2;
    ...
    h = (h << 5) - h + key_n;
    return h;
    ```
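    
    As a minimal runnable sketch of the recurrence above: the class and method names here are hypothetical, and the keys are assumed to be `long` values, so the code actually generated by this PR may look different.
    
    ```java
    final class HashSketch {
        // Folds each key into the running hash, one key per step.
        // (h << 5) - h equals 31 * h, so each step is h = 31 * h + key.
        // long arithmetic wraps silently on overflow, which is fine for hashing.
        static long hash(long... keys) {
            long h = 0;
            for (long key : keys) {
                h = (h << 5) - h + key;
            }
            return h;
        }

        public static void main(String[] args) {
            // 0 -> 1 -> 33 -> 1026
            System.out.println(hash(1L, 2L, 3L)); // prints 1026
        }
    }
    ```
    
    The shift-and-subtract form avoids a multiply while giving the same result as the familiar 31-multiplier polynomial hash.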
    
    Depends on: https://github.com/apache/spark/pull/12345
    
    ## How was this patch tested?
    
        Java HotSpot(TM) 64-Bit Server VM 1.8.0_73-b02 on Mac OS X 10.11.4
        Intel(R) Core(TM) i7-4960HQ CPU @ 2.60GHz
        Aggregate w keys:                   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
        -------------------------------------------------------------------------------------------
        codegen = F                              2417 / 2457          8.7         115.2       1.0X
        codegen = T hashmap = F                  1554 / 1581         13.5          74.1       1.6X
        codegen = T hashmap = T                   877 /  929         23.9          41.8       2.8X
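    
    The derived columns in the benchmark output can be cross-checked with a short sketch. It assumes that Relative is the baseline's best time divided by the row's best time, and that Rate (M/s) is the reciprocal of Per Row (ns). Both readings are inferred from the numbers above rather than stated in this PR, and the class and method names are hypothetical.
    
    ```java
    final class BenchmarkColumns {
        // Speedup over the baseline row, computed from the best times in ms (assumed definition).
        static double relative(double baselineBestMs, double bestMs) {
            return baselineBestMs / bestMs;
        }

        // Millions of rows per second, from nanoseconds per row: 1e9 / ns / 1e6 = 1000 / ns.
        static double rateMPerSec(double perRowNs) {
            return 1000.0 / perRowNs;
        }

        // Rounds to one decimal place, matching the precision of the table.
        static double round1(double x) {
            return Math.round(x * 10) / 10.0;
        }

        public static void main(String[] args) {
            System.out.println(round1(relative(2417, 877)));  // prints 2.8
            System.out.println(round1(rateMPerSec(41.8)));    // prints 23.9
        }
    }
    ```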

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sameeragarwal/spark hash

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12379.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12379
    
----
commit 7c158bd137f057453d17ef360906e5be90bf5004
Author: Sameer Agarwal <[email protected]>
Date:   2016-03-31T21:15:34Z

    [SPARK-14394]

commit ebaea6a87b704afedd47bdd2dd17c92c3ffc6e8e
Author: Sameer Agarwal <[email protected]>
Date:   2016-04-07T00:37:08Z

    Integrating AggregateHashMap for Aggregates with Group By

commit cee7e65b3cf7569b4e46941158f164c2130c3981
Author: Sameer Agarwal <[email protected]>
Date:   2016-04-12T17:33:42Z

    Add SQLConf

commit 8c9e17a1d40e3014e39b1d04f3a458aa129784f8
Author: Sameer Agarwal <[email protected]>
Date:   2016-04-12T23:01:03Z

    20ns

commit 3379294b76d91a55dbe86e31efb9812c8d37768c
Author: Sameer Agarwal <[email protected]>
Date:   2016-04-12T23:18:36Z

    generated code

commit 4ee56873764d62efdaf8c47cb74aa399f2194fde
Author: Sameer Agarwal <[email protected]>
Date:   2016-04-13T01:23:27Z

    benchmark

commit c2fc38584dd073036a1f04f7cd7da9fcf50739e8
Author: Sameer Agarwal <[email protected]>
Date:   2016-04-13T01:58:01Z

    fix comment

commit fc6b8cb337e11d3d92f0f13828b8a0b85e30929c
Author: Sameer Agarwal <[email protected]>
Date:   2016-04-13T18:48:56Z

    enable conf by default for testing

commit ececd5770e0c5410cf3da463f3b52db467d1f5ca
Author: Sameer Agarwal <[email protected]>
Date:   2016-04-13T22:12:16Z

    fix tests

commit 0ca0db17130bb6a1e59cdbc699f5abd946821d44
Author: Sameer Agarwal <[email protected]>
Date:   2016-04-13T22:31:24Z

    review comments

commit 555bcd2c9ef31818141f1d1434d17f42b8ff8acb
Author: Sameer Agarwal <[email protected]>
Date:   2016-04-14T00:02:50Z

    hash

commit bbb966338a311dd2bf4b0dd7194bd29ebb04ce48
Author: Sameer Agarwal <[email protected]>
Date:   2016-04-14T01:27:41Z

    hash function

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

