Re: SortMergeJoinExec: Utilizing child partitioning when joining

2020-01-07 Thread Long, Andrew
…, January 7, 2020 at 12:00 AM
To: "Long, Andrew"
Cc: "dev@spark.apache.org"
Subject: Re: SortMergeJoinExec: Utilizing child partitioning when joining
1. Where can I find information on how to run standard performance tests/benchmarks?
2. Are performance degradations to existing quer…

Re: SortMergeJoinExec: Utilizing child partitioning when joining

2020-01-07 Thread Brett Marcott
…the join execs and take a look at how they build the result RDD from the partitions of the left and right RDDs (see doExecute(…)). Left/right outer does look surprising, though.
You should see something like…
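Following the pointer to doExecute(…): within each partition, a sort-merge join walks two key-sorted inputs in lockstep. The sketch below is a simplified plain-Scala model of that per-partition merge step for an inner join. It assumes both inputs are already sorted by key and uses ordinary collections rather than Spark's row iterators; MergeJoinSketch and mergeJoin are hypothetical names, not Spark APIs.

```scala
// Hypothetical plain-Scala model of the merge step inside one partition of a
// sort-merge (inner) join. Not Spark's implementation; inputs must be sorted by key.
object MergeJoinSketch {
  /** Inner-joins two sequences that are both sorted ascending by key. */
  def mergeJoin[K, A, B](left: IndexedSeq[(K, A)], right: IndexedSeq[(K, B)])
                        (implicit ord: Ordering[K]): Vector[(K, A, B)] = {
    val out = Vector.newBuilder[(K, A, B)]
    var i = 0
    var j = 0
    while (i < left.length && j < right.length) {
      val c = ord.compare(left(i)._1, right(j)._1)
      if (c < 0) i += 1          // left key is smaller: advance the left side
      else if (c > 0) j += 1     // right key is smaller: advance the right side
      else {
        // Equal keys: find the end of this key's run on each side...
        val key = left(i)._1
        val iEnd = left.indexWhere(p => ord.compare(p._1, key) != 0, i) match {
          case -1 => left.length
          case n  => n
        }
        val jEnd = right.indexWhere(p => ord.compare(p._1, key) != 0, j) match {
          case -1 => right.length
          case n  => n
        }
        // ...and emit the cross product of the two runs.
        for (a <- left.slice(i, iEnd); b <- right.slice(j, jEnd)) out += ((key, a._2, b._2))
        i = iEnd
        j = jEnd
      }
    }
    out.result()
  }
}
```

For example, mergeJoin(Vector(1 -> "a", 2 -> "b"), Vector(2 -> 20)) returns the single match (2, "b", 20). Outer variants would additionally emit unmatched rows padded with nulls, which this sketch omits.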

Re: SortMergeJoinExec: Utilizing child partitioning when joining

2020-01-02 Thread Brett Marcott
…left.execute().zipPartitions(right.execute()) { (leftIter, rightIter) =>
Cheers
Andrew
From: Brett Marcott
Date: Tuesday, December 31, 2019 at 11:49 PM
To: "dev@spark.apache.org"
Subject: SortMergeJoinEx…
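The zipPartitions call quoted here pairs partition i of the left RDD with partition i of the right RDD, so matching keys must already sit at the same partition index on both sides. The following plain-Scala model has no Spark dependency. ZipPartitionsSketch, hashPartition, and zipJoin are hypothetical helpers, and bucketing by a non-negative modulo of hashCode is a simplification of Spark's own hashing. The model shows that a co-partitioned join finds every match, while a misaligned right side silently loses matches.

```scala
// Hypothetical plain-Scala model of a partition-by-partition join, mimicking how
// zipPartitions only ever combines partition i of the left with partition i of the right.
object ZipPartitionsSketch {
  type Partitions[V] = Vector[Vector[(Int, V)]]

  /** Buckets rows into n partitions by a non-negative modulo of the key's hashCode. */
  def hashPartition[V](rows: Seq[(Int, V)], n: Int): Partitions[V] = {
    val buckets = rows.groupBy { case (k, _) => Math.floorMod(k.hashCode, n) }
    Vector.tabulate(n)(p => buckets.getOrElse(p, Seq.empty).toVector)
  }

  /** Joins partition i of left only with partition i of right; rows never cross partitions. */
  def zipJoin[A, B](left: Partitions[A], right: Partitions[B]): Vector[(Int, A, B)] = {
    require(left.length == right.length, "both sides need the same number of partitions")
    left.zip(right).flatMap { case (lp, rp) =>
      for ((k1, a) <- lp; (k2, b) <- rp if k1 == k2) yield (k1, a, b)
    }
  }
}
```

This is why a join declares a requiredChildDistribution: the planner inserts a shuffle whenever the children's existing partitioning cannot guarantee this alignment.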

Re: SortMergeJoinExec: Utilizing child partitioning when joining

2020-01-02 Thread Long, Andrew
…MergeJoinExec: Utilizing child partitioning when joining
Hi all, I found this jira for an issue I ran into recently: https://issues.apache.org/jira/browse/SPARK-28771. My initial idea for a fix is to change the requiredChildDistribution of SortMergeJoinExec (and ShuffledHashJoinExec). At least if all bel…

SortMergeJoinExec: Utilizing child partitioning when joining

2019-12-31 Thread Brett Marcott
Hi all, I found this jira for an issue I ran into recently: https://issues.apache.org/jira/browse/SPARK-28771. My initial idea for a fix is to change the requiredChildDistribution of SortMergeJoinExec (and ShuffledHashJoinExec). At least if all of the conditions below are met, we could require only a subset…
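The list of conditions in this message is cut off, so the sketch below does not attempt to reproduce it. It only illustrates the general property the idea relies on. If both sides are hash-partitioned on the same subset of the join keys (here the first column, k1, of a two-column key), with the same hash function and partition count, then rows with equal full keys (k1, k2) necessarily have equal k1 and therefore land in the same partition. Row, partitionOnK1, fullJoin, and partitionwiseJoin are hypothetical names in a plain-Scala model, not Spark code.

```scala
// Hypothetical model: partitioning on a subset (k1) of the join key (k1, k2)
// still co-locates every pair of rows that match on the full key.
object SubsetClusteringSketch {
  final case class Row(k1: Int, k2: Int, payload: String)

  /** Buckets rows by hashing only k1; k2 is ignored when choosing the partition. */
  def partitionOnK1(rows: Seq[Row], n: Int): Vector[Vector[Row]] =
    Vector.tabulate(n)(p => rows.filter(r => Math.floorMod(r.k1.hashCode, n) == p).toVector)

  /** Reference join on the full key (k1, k2), ignoring partitioning entirely. */
  def fullJoin(l: Seq[Row], r: Seq[Row]): Set[(Row, Row)] =
    (for (a <- l; b <- r if a.k1 == b.k1 && a.k2 == b.k2) yield (a, b)).toSet

  /** Joins on the full key, but only within partition i of each side. */
  def partitionwiseJoin(l: Seq[Row], r: Seq[Row], n: Int): Set[(Row, Row)] =
    partitionOnK1(l, n).zip(partitionOnK1(r, n)).flatMap { case (lp, rp) =>
      fullJoin(lp, rp)
    }.toSet
}
```

Because partitionwiseJoin agrees with fullJoin for any input, an existing partitioning on k1 alone would suffice for a join on (k1, k2) in this model. Whether Spark can exploit that safely depends on the conditions the message goes on to list.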