, January 7, 2020 at 12:00 AM
To: "Long, Andrew"
Cc: "dev@spark.apache.org"
Subject: Re: SortMergeJoinExec: Utilizing child partitioning when joining
1. Where can I find information on how to run standard performance
tests/benchmarks?
2. Are performance degradations to existing quer
the join execs
>> and take a look at how they build the result RDD from the partitions of the
>> left and right RDDs (see doExecute(…)). Left/right outer does look
>> surprising though.
>>
>>
>>
>> You should see something like…
>>
>>
>>
>>
>
> left.execute().zipPartitions(right.execute()) { (leftIter, rightIter) =>
>
>
>
>
>
> Cheers Andrew
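For readers following the thread, here is a minimal plain-Scala sketch of the partition-wise pairing that the zipPartitions call above relies on. It is not Spark's implementation: the object ZipPartitionsSketch, the helper names, and the sample data are hypothetical, "RDDs" are modelled as plain vectors of partitions, and the per-partition join is deliberately naive rather than a sort-merge.

```scala
// Plain-Scala sketch (no Spark dependency). Each "RDD" is modelled as a
// vector of partitions, and each partition as a vector of (key, value) rows.
object ZipPartitionsSketch {
  type Row = (Int, String)
  type Partitions = Vector[Vector[Row]]

  // Hypothetical stand-in for RDD.zipPartitions: pairs partition i of the left
  // side with partition i of the right side and applies f to the two iterators.
  def zipPartitions[A](left: Partitions, right: Partitions)(
      f: (Iterator[Row], Iterator[Row]) => Iterator[A]): Vector[Vector[A]] = {
    require(left.length == right.length, "both sides need the same number of partitions")
    left.zip(right).map { case (l, r) => f(l.iterator, r.iterator).toVector }
  }

  // Naive per-partition inner join on the key (assumption: a nested-loop join
  // is used here only for brevity; it is not how a sort-merge join works).
  def joinPartition(l: Iterator[Row], r: Iterator[Row]): Iterator[(Int, String, String)] = {
    val rightRows = r.toVector
    l.flatMap { case (k, lv) => rightRows.collect { case (`k`, rv) => (k, lv, rv) } }
  }

  def main(args: Array[String]): Unit = {
    // Both sides are partitioned by key % 2, so matching keys share a partition index.
    val left: Partitions  = Vector(Vector((2, "a"), (4, "b")), Vector((1, "c")))
    val right: Partitions = Vector(Vector((2, "x")), Vector((1, "y"), (3, "z")))
    zipPartitions(left, right)(joinPartition).foreach(println)
  }
}
```

The point of the sketch is that the join only ever compares rows that sit at the same partition index, so the result is correct only when both children are partitioned the same way on the join keys.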
>
>
>
> *From: *Brett Marcott
> *Date: *Tuesday, December 31, 2019 at 11:49 PM
> *To: *"dev@spark.apache.org"
> *Subject: *SortMergeJoinExec: Utilizing child partitioning when joining
Hi all,
I found this jira for an issue I ran into recently:
https://issues.apache.org/jira/browse/SPARK-28771
My initial idea for a fix is to change SortMergeJoinExec's (and
ShuffledHashJoinExec) requiredChildDistribution.
At least if all below conditions are met, we could only require a subset