Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22168#discussion_r211619140
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala ---
    @@ -1058,31 +1064,37 @@ private class SortMergeFullOuterJoinScanner(
        * @return true if a valid match is found, false otherwise.
        */
       private def scanNextInBuffered(): Boolean = {
    -    while (leftIndex < leftMatches.size) {
    -      while (rightIndex < rightMatches.size) {
    -        joinedRow(leftMatches(leftIndex), rightMatches(rightIndex))
    -        if (boundCondition(joinedRow)) {
    -          leftMatched.set(leftIndex)
    -          rightMatched.set(rightIndex)
    +    val leftMatchesIterator = leftMatches.generateIterator(leftIndex)
    +
    +    while (leftMatchesIterator.hasNext) {
    +      val leftCurRow = leftMatchesIterator.next()
    +      val rightMatchesIterator = rightMatches.generateIterator(rightIndex)
    --- End diff --
    
    Can we keep the left and right scanning iterators around? If the matches 
are spilled, obtaining an iterator over the spilled data has to loop over the 
spill writers and create readers, so we could avoid calling `generateIterator` 
every time we need the iterators. However, it might make the code a bit more 
complicated than it is now.
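
    A rough sketch of the idea, not the actual Spark code: `CountingBuffer` 
and `BufferedPairScanner` are hypothetical names, and the buffer only counts 
how often `generateIterator` is called instead of really spilling. The matched 
bit sets from the real scanner are left out. The iterators are held as fields, 
so a later call resumes where the previous one stopped.

```scala
// Hypothetical stand-in for a spillable row buffer (not Spark's class).
// It only counts how many iterators were built, which is the costly step
// when the data has been spilled.
final class CountingBuffer[T](rows: IndexedSeq[T]) {
  var iteratorsCreated = 0

  def generateIterator(startIndex: Int = 0): Iterator[T] = {
    iteratorsCreated += 1
    rows.iterator.drop(startIndex)
  }
}

// Hypothetical scanner that keeps its iterators between calls instead of
// regenerating them from saved indices on every call.
final class BufferedPairScanner[L, R](
    left: CountingBuffer[L],
    right: CountingBuffer[R],
    condition: (L, R) => Boolean) {

  // Created once for the lifetime of the scanner.
  private val leftIter: Iterator[L] = left.generateIterator()
  // Current left row; defined whenever rightIter still has rows.
  private var leftCur: Option[L] = None
  // Recreated only when we move on to the next left row.
  private var rightIter: Iterator[R] = Iterator.empty

  /** Returns the next pair satisfying `condition`, or None when exhausted. */
  @annotation.tailrec
  final def nextMatch(): Option[(L, R)] = {
    if (rightIter.hasNext) {
      val l = leftCur.get
      val r = rightIter.next()
      if (condition(l, r)) Some((l, r)) else nextMatch()
    } else if (leftIter.hasNext) {
      // Right side exhausted for this left row: advance left, restart right.
      leftCur = Some(leftIter.next())
      rightIter = right.generateIterator()
      nextMatch()
    } else {
      None
    }
  }
}
```

    With this shape the left iterator is built once and the right iterator 
once per left row, no matter how many times `nextMatch` is called. The cost is 
the extra mutable state, which is the added complexity mentioned above.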


---
