[GitHub] [spark] imback82 commented on pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for sort merge join if applicable

GitBox Thu, 11 Jun 2020 20:42:04 -0700


imback82 commented on pull request #28123:
URL: https://github.com/apache/spark/pull/28123#issuecomment-643047878



   Here are some numbers from joining two tables (store_sales from TPC-DS,
100 SF) and running `count` on the result. The job ran on 8 executors
(8 cores each) and generates about 47GB of shuffle.
   
   Bucket Size #1 | Bucket Size #2 | Coalesce On (ms) | Coalesce Off (ms) | Gain (%)
   -- | -- | -- | -- | --
   512 | 256 | 40495 | 48435 | 16.39
   512 | 128 | 42459 | 49597 | 14.39
   512 | 64 | 45760 | 48888 | 6.40
   256 | 128 | 41241 | 49034 | 15.89
   256 | 64 | 42902 | 51063 | 15.98
   128 | 64 | 44131 | 53192 | 17.03
   
   There is a modest gain of roughly 15% for ratios up to 4 (14.4% to 17.0%),
dropping to about 6.4% at ratio 8. WDYT?
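
For reference, the Gain (%) column appears to be the relative reduction in wall-clock time, (off - on) / off. A minimal Python sketch that recomputes it from the timings in the table and groups each run by bucket ratio follows; the names `RUNS`, `gain_percent`, and `summarize` are hypothetical helpers for illustration and are not part of the PR.

```python
# Timings in milliseconds copied from the table in this comment:
# (bucket size #1, bucket size #2, coalesce on, coalesce off)
RUNS = [
    (512, 256, 40495, 48435),
    (512, 128, 42459, 49597),
    (512, 64, 45760, 48888),
    (256, 128, 41241, 49034),
    (256, 64, 42902, 51063),
    (128, 64, 44131, 53192),
]


def gain_percent(on_ms, off_ms):
    """Relative time saved with coalescing on, as a percentage of the 'off' time."""
    return (off_ms - on_ms) / off_ms * 100.0


def summarize(runs):
    """Return (bucket ratio, gain rounded to 2 decimals) for each run."""
    return [
        (b1 // b2, round(gain_percent(on_ms, off_ms), 2))
        for b1, b2, on_ms, off_ms in runs
    ]


if __name__ == "__main__":
    for ratio, gain in summarize(RUNS):
        print(f"ratio {ratio}: {gain:.2f}% faster with coalescing")
```

Grouping by ratio makes it easier to see that the single ratio-8 run is the outlier, while the ratio-2 and ratio-4 runs cluster together.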


