Re: SortMergeJoinExec: Utilizing child partitioning when joining

2020-01-02 Thread Brett Marcott
the right join keys. I’d suggest taking a look at the join execs and at how they build the result RDD from the partitions of the left and right RDDs (see doExecute(…)). Left/right outer does look surprising though. You should see something like…

SortMergeJoinExec: Utilizing child partitioning when joining

2019-12-31 Thread Brett Marcott
Hi all, I found this JIRA for an issue I ran into recently: https://issues.apache.org/jira/browse/SPARK-28771 My initial idea for a fix is to change the requiredChildDistribution of SortMergeJoinExec (and ShuffledHashJoinExec). At least if all of the conditions below are met, we could require only a subset
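
A minimal sketch of the observation behind this proposal, assuming both sides are hash partitioned on the same subset of the join keys with the same number of partitions. This is plain Python, not Spark code; `hash_partition` and `colocated` are hypothetical helpers, and the column names are made up for illustration. It shows that rows matching on the full key set (a, b) already land in the same partition when both inputs are partitioned on `a` alone, so no further shuffle is needed for co-location.

```python
from collections import defaultdict


def hash_partition(rows, key_cols, num_partitions):
    """Assign each row to a partition using only key_cols (hypothetical helper)."""
    parts = defaultdict(list)
    for row in rows:
        pid = hash(tuple(row[c] for c in key_cols)) % num_partitions
        parts[pid].append(row)
    return parts


def colocated(left_parts, right_parts, join_cols):
    """Return True if every pair of rows matching on join_cols shares a partition id."""
    where_left = defaultdict(set)
    for pid, rows in left_parts.items():
        for row in rows:
            where_left[tuple(row[c] for c in join_cols)].add(pid)
    for pid, rows in right_parts.items():
        for row in rows:
            key = tuple(row[c] for c in join_cols)
            if key in where_left and where_left[key] != {pid}:
                return False
    return True


# Illustrative data: both sides carry the join columns a and b.
left = [{"a": i % 5, "b": i % 3, "x": i} for i in range(30)]
right = [{"a": i % 5, "b": i % 2, "y": i} for i in range(30)]

# Partition both sides on the subset {a} only, then join on (a, b).
left_parts = hash_partition(left, ["a"], 4)
right_parts = hash_partition(right, ["a"], 4)
result = colocated(left_parts, right_parts, ["a", "b"])
```

The check succeeds because two rows that agree on (a, b) necessarily agree on `a`, and the partition id depends on `a` only. Whether Spark's planner can exploit this depends on the conditions discussed in the thread, which this sketch does not model.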

Re: SortMergeJoinExec: Utilizing child partitioning when joining

2020-01-07 Thread Brett Marcott
1. Where can I find information on how to run standard performance tests/benchmarks? 2. Are performance degradations to existing queries that are fixable by new equivalent queries not allowed for a new major Spark version? On Thu, Jan 2, 2020 at 3:05 PM Brett Marcott wrote: > Tha