GitHub user mgaido91 opened a pull request:
https://github.com/apache/spark/pull/22338
[SPARK-25317][CORE] Avoid perf regression in Murmur3 Hash on UTF8String
## What changes were proposed in this pull request?
SPARK-10399 introduced a performance regression in the hash computation for
UTF8String.
The regression can be measured with the code attached to the JIRA. On my
laptop (MacBook Pro, 2.5 GHz Intel Core i7, 16 GB 1600 MHz DDR3 RAM), that
code takes about 120 us per method on the current master branch, while the
same code on branch 2.3 takes about 45 us. After this PR, the code takes
about 45 us on the master branch too.
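For readers without access to the JIRA attachment, the following is a rough sketch of how a per-call timing of this kind could be taken. It is not the benchmark from the JIRA and does not use Spark's `UTF8String` or `Murmur3_x86_32` classes: the class `Murmur3BenchSketch`, the methods `murmur3` and `avgMicros`, the sample string, and the seed are all hypothetical, and the hash is a plain-Java implementation of standard Murmur3 x86_32 over a byte array, so its results may differ from Spark's own.

```java
import java.nio.charset.StandardCharsets;

public class Murmur3BenchSketch {

    // Standard Murmur3 x86_32 over a byte array (little-endian 4-byte blocks).
    // Illustrative only; this is not Spark's Murmur3_x86_32 class.
    public static int murmur3(byte[] data, int seed) {
        int h1 = seed;
        int aligned = data.length & ~3; // bytes covered by full 4-byte blocks
        for (int i = 0; i < aligned; i += 4) {
            int k1 = (data[i] & 0xff)
                | (data[i + 1] & 0xff) << 8
                | (data[i + 2] & 0xff) << 16
                | (data[i + 3] & 0xff) << 24;
            h1 = mixH1(h1, mixK1(k1));
        }
        // Tail: the remaining 0 to 3 bytes are folded into a single word.
        int k1 = 0;
        switch (data.length & 3) {
            case 3: k1 ^= (data[aligned + 2] & 0xff) << 16; // fall through
            case 2: k1 ^= (data[aligned + 1] & 0xff) << 8;  // fall through
            case 1: k1 ^= (data[aligned] & 0xff);
                    h1 ^= mixK1(k1);
        }
        return fmix(h1 ^ data.length);
    }

    private static int mixK1(int k1) {
        k1 *= 0xcc9e2d51;
        k1 = Integer.rotateLeft(k1, 15);
        k1 *= 0x1b873593;
        return k1;
    }

    private static int mixH1(int h1, int k1) {
        h1 ^= k1;
        h1 = Integer.rotateLeft(h1, 13);
        return h1 * 5 + 0xe6546b64;
    }

    // Final avalanche step.
    private static int fmix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Average wall-clock time per call in microseconds, after a warm-up pass.
    public static double avgMicros(byte[] data, int iterations) {
        int sink = 0;
        for (int i = 0; i < iterations; i++) sink += murmur3(data, 42); // JIT warm-up
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) sink += murmur3(data, 42);
        long elapsed = System.nanoTime() - start;
        if (sink == 1) System.out.print(""); // use the result so the loops are not dead code
        return elapsed / 1000.0 / iterations;
    }

    public static void main(String[] args) {
        // Hypothetical input; the JIRA benchmark uses its own data.
        byte[] data = "some UTF-8 encoded string".getBytes(StandardCharsets.UTF_8);
        System.out.printf("%.3f us per call%n", avgMicros(data, 1_000_000));
    }
}
```

A hand-rolled loop like this is only indicative; a proper harness such as JMH would give more trustworthy numbers, and the absolute figures will not be comparable to the 120 us and 45 us reported above.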
## How was this patch tested?
By running the performance test attached to the JIRA.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/mgaido91/spark SPARK-25317
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22338.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22338
----
commit 91adce590461dda885d88319a700a775e63f9ce6
Author: Marco Gaido <marcogaido91@...>
Date: 2018-09-04T15:02:07Z
[SPARK-25317][CORE] Avoid perf regression in Murmur3 Hash on UTF8String
----