GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/10917

    [SPARK-12888][SQL][follow-up] benchmark the new hash expression

    Adds the benchmark results as comments.
    
    The codegen version is slower than the interpreted version for the `simple` 
case for 3 reasons:
    
    1. The codegen version uses a more complex hash algorithm than the interpreted 
version, i.e. `Murmur3_x86_32.hashInt` vs [simple multiplication and 
addition](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/rows.scala#L153).
    2. The codegen version writes the hash value to a row first and then reads 
it back out. I tried creating a `GenerateHasher` that generates code to return 
the hash value directly and got about a 60% speedup for the `simple` case. Is it 
worth doing?
    3. The row in the `simple` case has only one int field, so the cost of 
runtime reflection may be mostly eliminated by branch prediction, which makes the 
interpreted version faster.
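    
    To make reason 1 concrete, here is a minimal, self-contained sketch of the 
two hash styles for a single int. It is not Spark's code: `HashSketch`, 
`simpleHash` and `murmur3HashInt` are hypothetical names, `simpleHash` assumes 
the multiply-and-add scheme starts from 37 and uses 37 as the multiplier, and 
`murmur3HashInt` follows the standard Murmur3 x86_32 steps for one 4-byte input.

```java
final class HashSketch {
    private static final int C1 = 0xcc9e2d51;
    private static final int C2 = 0x1b873593;

    // Multiply-and-add hash over int fields (assumed constants: start at 37, multiply by 37).
    // One multiplication and one addition per field.
    static int simpleHash(int[] fields) {
        int result = 37;
        for (int f : fields) {
            result = 37 * result + f;
        }
        return result;
    }

    // Standard Murmur3 x86_32 applied to a single 4-byte int.
    // Several multiplications, rotations and xor-shifts per field.
    static int murmur3HashInt(int input, int seed) {
        // Mix the input block.
        int k1 = input * C1;
        k1 = Integer.rotateLeft(k1, 15);
        k1 *= C2;

        // Fold the block into the running hash.
        int h1 = seed ^ k1;
        h1 = Integer.rotateLeft(h1, 13);
        h1 = h1 * 5 + 0xe6546b64;

        // Finalization; 4 is the input length in bytes.
        h1 ^= 4;
        h1 ^= h1 >>> 16;
        h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13;
        h1 *= 0xc2b2ae35;
        h1 ^= h1 >>> 16;
        return h1;
    }

    public static void main(String[] args) {
        System.out.println(simpleHash(new int[] {1}));              // prints 1370
        System.out.println(Integer.toHexString(murmur3HashInt(0, 0))); // prints 2362f9de
    }
}
```

    The Murmur3 path does noticeably more arithmetic per int, which is 
consistent with the gap seen in the `simple` case, though this sketch does not 
measure anything by itself.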
    
    The `array` case is also slow for similar reasons, e.g. the array elements are 
all of the same type, so the interpreted version can probably avoid most of the 
runtime reflection cost through branch prediction.
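
    As a rough illustration of reason 3 and the `array` case, the sketch below 
contrasts a hash that checks the runtime type of every value with one 
specialized for ints. All names (`DispatchSketch`, `fieldHash`, `rowHash`, 
`intRowHash`) and the per-type rules are hypothetical and only stand in for the 
interpreted and generated paths; they are not taken from Spark.

```java
final class DispatchSketch {
    // Interpreted style: inspect the runtime type of each value before hashing it.
    // The per-type rules here are illustrative assumptions only.
    static int fieldHash(Object value) {
        if (value == null) return 0;
        if (value instanceof Boolean) return ((Boolean) value) ? 1 : 0;
        if (value instanceof Integer) return (Integer) value;
        if (value instanceof Long) {
            long l = (Long) value;
            return (int) (l ^ (l >>> 32));
        }
        return value.hashCode();
    }

    // Hash a row of boxed values; every field goes through the type checks above.
    static int rowHash(Object[] row) {
        int result = 37;
        for (Object v : row) {
            result = 37 * result + fieldHash(v);
        }
        return result;
    }

    // Specialized style: the schema is known to be all ints, so no type checks are needed.
    static int intRowHash(int[] row) {
        int result = 37;
        for (int v : row) {
            result = 37 * result + v;
        }
        return result;
    }

    public static void main(String[] args) {
        Object[] boxed = {1, 2, 3};
        int[] primitive = {1, 2, 3};
        // Both paths agree when every field is an int.
        System.out.println(rowHash(boxed) == intRowHash(primitive)); // prints true
    }
}
```

    When every value has the same type, the `instanceof` chain in `fieldHash` 
takes the same branch on each call, so the CPU can predict it well. That is the 
effect the explanation above relies on.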

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark hash-benchmark

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10917.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10917
    
----
commit 8207dc109f21527438cbd80894e9b49d63159f12
Author: Wenchen Fan <[email protected]>
Date:   2016-01-26T02:24:38Z

    add benchmark results

----

