Github user sujithjay commented on a diff in the pull request:
https://github.com/apache/spark/pull/22168#discussion_r211695230
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala
---
@@ -1058,31 +1064,37 @@ private class SortMergeFullOuterJoinScanner(
* @return true if a valid match is found, false otherwise.
*/
private def scanNextInBuffered(): Boolean = {
- while (leftIndex < leftMatches.size) {
- while (rightIndex < rightMatches.size) {
- joinedRow(leftMatches(leftIndex), rightMatches(rightIndex))
- if (boundCondition(joinedRow)) {
- leftMatched.set(leftIndex)
- rightMatched.set(rightIndex)
+ val leftMatchesIterator = leftMatches.generateIterator(leftIndex)
+
+ while (leftMatchesIterator.hasNext) {
+ val leftCurRow = leftMatchesIterator.next()
+ val rightMatchesIterator = rightMatches.generateIterator(rightIndex)
--- End diff --
Hi @viirya,
Thank you for reviewing the code.
I agree the code will be a bit complicated if we had to keep scanning the
iterators. In this particular case, I was following a pattern observed
throughout the rest of the class.
For eg.,
https://github.com/apache/spark/blob/35f7f5ce83984d8afe0b7955942baa04f2bef74f/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala#L307-L329
Having said that, I do feel the penalty for following a similar approach in
case of full outer joins could be higher. I will try & see what I can do.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]