[GitHub] [spark] imback82 commented on pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for sort merge join if applicable

2020-07-12 Thread GitBox
imback82 commented on pull request #28123: URL: https://github.com/apache/spark/pull/28123#issuecomment-657297080 Thanks @c21! > Re POC - I feel the overall approach looks good to me. But IMO we should do the coalesce/divide in a physical plan rule, not a logical plan rule.

[GitHub] [spark] imback82 commented on pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for sort merge join if applicable

2020-07-11 Thread GitBox
imback82 commented on pull request #28123: URL: https://github.com/apache/spark/pull/28123#issuecomment-657170656 > (3). We are seeing in production that coalescing might hurt parallelism if the number of buckets is too small. Another way to avoid shuffle and sort is to split/divide the ta
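
The parallelism concern quoted above can be illustrated with a small sketch. It is not the PR's implementation. It assumes a row lands in bucket `hash % num_buckets` with a non-negative result, and that the larger bucket count is a multiple of the smaller one; the function names `coalesced_bucket_id` and `coalesce_plan` are hypothetical.

```python
from typing import Dict, List


def coalesced_bucket_id(bucket_id: int, num_buckets: int, target_buckets: int) -> int:
    """Map a bucket of the finer-grained table onto the coarser bucket layout.

    Assumption: rows were assigned to buckets as hash % num_buckets (non-negative).
    When num_buckets is a multiple of target_buckets, (h % num_buckets) % target_buckets
    equals h % target_buckets, so the mapping keeps equal join keys together.
    """
    if num_buckets % target_buckets != 0:
        raise ValueError("num_buckets must be a multiple of target_buckets")
    return bucket_id % target_buckets


def coalesce_plan(num_buckets: int, target_buckets: int) -> Dict[int, List[int]]:
    """Group the original bucket ids by the coalesced bucket each one is read into."""
    groups: Dict[int, List[int]] = {t: [] for t in range(target_buckets)}
    for b in range(num_buckets):
        groups[coalesced_bucket_id(b, num_buckets, target_buckets)].append(b)
    return groups


if __name__ == "__main__":
    # An 8-bucket table joined with a 4-bucket table: 8 buckets are read as 4 groups,
    # so the join side that had 8 parallel units now has only 4.
    print(coalesce_plan(8, 4))
```

Under these assumptions, no shuffle is needed because matching keys end up in the same coalesced bucket, but the join runs with only `target_buckets` parallel units, which is the trade-off raised in the comment.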

[GitHub] [spark] imback82 commented on pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for sort merge join if applicable

2020-06-19 Thread GitBox
imback82 commented on pull request #28123: URL: https://github.com/apache/spark/pull/28123#issuecomment-646476903 retest this please This is an automated message from the Apache Git Service. To respond to the message, please

[GitHub] [spark] imback82 commented on pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for sort merge join if applicable

2020-06-18 Thread GitBox
imback82 commented on pull request #28123: URL: https://github.com/apache/spark/pull/28123#issuecomment-645825392 retest this please

[GitHub] [spark] imback82 commented on pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for sort merge join if applicable

2020-06-13 Thread GitBox
imback82 commented on pull request #28123: URL: https://github.com/apache/spark/pull/28123#issuecomment-643582357 retest this please

[GitHub] [spark] imback82 commented on pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for sort merge join if applicable

2020-06-12 Thread GitBox
imback82 commented on pull request #28123: URL: https://github.com/apache/spark/pull/28123#issuecomment-643570195 retest this please

[GitHub] [spark] imback82 commented on pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for sort merge join if applicable

2020-06-12 Thread GitBox
imback82 commented on pull request #28123: URL: https://github.com/apache/spark/pull/28123#issuecomment-643564022 retest this please

[GitHub] [spark] imback82 commented on pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for sort merge join if applicable

2020-06-11 Thread GitBox
imback82 commented on pull request #28123: URL: https://github.com/apache/spark/pull/28123#issuecomment-643047878 Here are some numbers from joining two tables (store_sales from TPC-DS - 100 SF) and running `count` on the result. It's run on 8 executors (8 cores each) and generates about 47GB of shuf

[GitHub] [spark] imback82 commented on pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for sort merge join if applicable

2020-06-10 Thread GitBox
imback82 commented on pull request #28123: URL: https://github.com/apache/spark/pull/28123#issuecomment-642194428 > it will be good to see benchmark numbers of a typical bucket join that can benefit from this patch. I will try to get some numbers this week.