GitHub user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/22168#discussion_r211619140
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala
---
@@ -1058,31 +1064,37 @@ private class SortMergeFullOuterJoinScanner(
* @return true if a valid match is found, false otherwise.
*/
private def scanNextInBuffered(): Boolean = {
- while (leftIndex < leftMatches.size) {
- while (rightIndex < rightMatches.size) {
- joinedRow(leftMatches(leftIndex), rightMatches(rightIndex))
- if (boundCondition(joinedRow)) {
- leftMatched.set(leftIndex)
- rightMatched.set(rightIndex)
+ val leftMatchesIterator = leftMatches.generateIterator(leftIndex)
+
+ while (leftMatchesIterator.hasNext) {
+ val leftCurRow = leftMatchesIterator.next()
+ val rightMatchesIterator = rightMatches.generateIterator(rightIndex)
--- End diff --
Can we keep the left and right scanning iterators instead of regenerating
them? If the matches have been spilled, obtaining an iterator over the spilled
data requires looping over the spill writers and creating readers. Keeping the
iterators would avoid calling `generateIterator` every time we need them.
However, it might make the code a bit more complicated than it is now.
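
To illustrate what I mean, here is a minimal, self-contained sketch. It is not
the Spark code: `CountingBuffer` and `BufferedPairScanner` are hypothetical
names, and the buffer only counts how often `generateIterator` is called as a
stand-in for the cost of reopening spill readers. The scanner holds both
iterators as fields, so a later call resumes where the previous one stopped and
only asks for a new right iterator when it moves to the next left row.

```scala
// Hypothetical stand-in for a spillable row buffer. It only counts iterator
// creations, which is the expensive step once the data has been spilled.
final class CountingBuffer[A](rows: IndexedSeq[A]) {
  var iteratorsCreated = 0

  def generateIterator(startIndex: Int = 0): Iterator[A] = {
    iteratorsCreated += 1
    rows.iterator.drop(startIndex)
  }
}

// Hypothetical scanner that keeps the left and right iterators as state
// instead of regenerating them on every call.
final class BufferedPairScanner[L, R](
    left: CountingBuffer[L],
    right: CountingBuffer[R],
    matches: (L, R) => Boolean) {

  private val leftIter: Iterator[L] = left.generateIterator()
  private var rightIter: Iterator[R] = Iterator.empty
  private var curLeft: Option[L] = None

  /** Returns the next matching pair, or None once both buffers are exhausted. */
  @annotation.tailrec
  final def next(): Option[(L, R)] = {
    if (rightIter.hasNext) {
      // Continue scanning the right side for the current left row.
      val r = rightIter.next()
      curLeft match {
        case Some(l) if matches(l, r) => Some((l, r))
        case _ => next()
      }
    } else if (leftIter.hasNext) {
      // Right side exhausted: advance the left row and restart the right side.
      curLeft = Some(leftIter.next())
      rightIter = right.generateIterator()
      next()
    } else {
      None
    }
  }
}

object ScannerDemo {
  def main(args: Array[String]): Unit = {
    val left = new CountingBuffer(Vector(1, 2, 3))
    val right = new CountingBuffer(Vector(2, 3, 4))
    val scanner = new BufferedPairScanner[Int, Int](left, right, _ == _)

    val pairs = Iterator.continually(scanner.next()).takeWhile(_.isDefined).flatten.toList
    println(pairs)
    println(s"left iterators: ${left.iteratorsCreated}, right iterators: ${right.iteratorsCreated}")
  }
}
```

In this toy run the left iterator is created once and the right iterator once
per left row, regardless of how many times `next()` is called. The extra fields
are the added complexity I mentioned above.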
---