c21 commented on pull request #29342:
URL: https://github.com/apache/spark/pull/29342#issuecomment-672640787
@cloud-fan, @agrawaldevesh, @maropu and @viirya: updated the PR with the latest
proposed change. I still need to add unit tests for `BytesToBytesMap` and
`HashedRelation`, but the unit test added in `JoinSuite` should give us enough
confidence that the change works end to end. I would like to get feedback first
before spending more time writing more unit tests, thanks.
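For context, a shuffled hash join can be sketched roughly like this. It is a simplified, single-process Python illustration of an inner equi-join, assuming rows are plain dicts with non-overlapping non-key columns. The helpers `shuffle` and `shuffled_hash_join` are hypothetical names for illustration only. This is not Spark's `ShuffledHashJoin` / `HashedRelation` code or the code in this PR. Both sides are hash-partitioned on the join key, and each partition then builds a hash table from the build side and probes it with the streamed side.

```python
from collections import defaultdict


def shuffle(rows, key, num_partitions):
    """Hash-partition rows by join key (a stand-in for a shuffle exchange)."""
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        parts[hash(row[key]) % num_partitions].append(row)
    return parts


def shuffled_hash_join(build_rows, stream_rows, key, num_partitions=4):
    """Inner equi-join: per partition, build a hash table and probe it."""
    build_parts = shuffle(build_rows, key, num_partitions)
    stream_parts = shuffle(stream_rows, key, num_partitions)
    out = []
    # Matching keys always land in the same partition index on both sides.
    for build_part, stream_part in zip(build_parts, stream_parts):
        table = defaultdict(list)
        for b in build_part:  # build phase: hash the (smaller) build side
            table[b[key]].append(b)
        for s in stream_part:  # probe phase: look up each streamed row
            for b in table.get(s[key], []):
                out.append({**b, **s})
    return out


build = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
stream = [{"id": 1, "v": 10}, {"id": 1, "v": 11}, {"id": 3, "v": 30}]
result = shuffled_hash_join(build, stream, "id")
```

Unlike a sort merge join, neither side needs to be sorted. The trade-off is that each partition's build side must fit in the in-memory hash table.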
Tested with the same small example benchmark query from the PR description and
am still seeing roughly a 30% wall clock time improvement (1.3X relative)
compared to sort merge join. I agree this is very much a toy benchmark query,
but it should give us some confidence that we are not doing anything badly
wrong in terms of performance:
```
Running benchmark: shuffle hash join
Running case: shuffle hash join off
Stopped after 2 iterations, 16602 ms
Running case: shuffle hash join on
Stopped after 5 iterations, 31911 ms
Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.15.4
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
shuffle hash join:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
shuffle hash join off                              7900           8301         567          2.1         470.9       1.0X
shuffle hash join on                               6250           6382          95          2.7         372.5       1.3X
```
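The headline improvement can be checked directly against the table. The sketch below simply re-derives ratios from the reported best and average times. The helper names are hypothetical, and only the numbers come from the output above.

```python
# Figures copied from the benchmark output above (milliseconds).
BEST_MS = {"off": 7900, "on": 6250}
AVG_MS = {"off": 8301, "on": 6382}


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


def time_reduction(baseline_ms: float, candidate_ms: float) -> float:
    """Fraction of wall clock time saved relative to the baseline."""
    return 1 - candidate_ms / baseline_ms


best_speedup = speedup(BEST_MS["off"], BEST_MS["on"])
avg_speedup = speedup(AVG_MS["off"], AVG_MS["on"])
best_saved = time_reduction(BEST_MS["off"], BEST_MS["on"])

print(f"best-time speedup:   {best_speedup:.1f}X")  # rounds to the 1.3X Relative value
print(f"avg-time speedup:    {avg_speedup:.2f}X")
print(f"best-time reduction: {best_saved:.1%}")
```

On best time the ratio is about 1.26X, which is roughly 21% less wall clock time. On average time it is about 1.30X. So "30%" reads best as throughput gain rather than time saved.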
Also ran the newly added unit test in `JoinSuite` and verified that all newly
added logic inside `ShuffledHashJoin` has 100% code coverage (the uncovered
lines are related to code-gen, which is irrelevant here):
<img width="1664" alt="Screen Shot 2020-08-11 at 11 10 21 PM"
src="https://user-images.githubusercontent.com/4629931/89983056-bebd4180-dc2b-11ea-9fe3-cdf06143a002.png">