[GitHub] spark pull request #22168: [SPARK-24985][SQL][WIP] Fix OOM in Full Outer Joi...

sujithjay Tue, 21 Aug 2018 10:38:56 -0700

Github user sujithjay commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22168#discussion_r211695230
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala ---
    @@ -1058,31 +1064,37 @@ private class SortMergeFullOuterJoinScanner(
        * @return true if a valid match is found, false otherwise.
        */
       private def scanNextInBuffered(): Boolean = {
    -    while (leftIndex < leftMatches.size) {
    -      while (rightIndex < rightMatches.size) {
    -        joinedRow(leftMatches(leftIndex), rightMatches(rightIndex))
    -        if (boundCondition(joinedRow)) {
    -          leftMatched.set(leftIndex)
    -          rightMatched.set(rightIndex)
    +    val leftMatchesIterator = leftMatches.generateIterator(leftIndex)
    +
    +    while (leftMatchesIterator.hasNext) {
    +      val leftCurRow = leftMatchesIterator.next()
    +      val rightMatchesIterator = rightMatches.generateIterator(rightIndex)
    --- End diff --
    
    Hi @viirya, 
    Thank you for reviewing the code. 
    
    I agree that the code gets a bit complicated if we have to keep scanning
    the iterators. In this particular case, I was following a pattern used
    throughout the rest of the class. For example:
    https://github.com/apache/spark/blob/35f7f5ce83984d8afe0b7955942baa04f2bef74f/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala#L307-L329
    
    That said, I do think the penalty for following a similar approach could
    be higher for full outer joins. I will try and see what I can do.
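    
    A rough illustration of where that penalty could come from, as a minimal
    Scala sketch. `SequentialBuffer`, `rowsSkipped`, and `scanByResuming` are
    hypothetical names made up for this example, not Spark classes; only the
    `generateIterator(startIndex)` call shape comes from the diff above. The
    sketch assumes the buffer can only be read front to back, so an iterator
    positioned at `startIndex` first has to step over every earlier row, and
    that the scanner is re-entered once per returned row.

```scala
// Hypothetical stand-in for a sequential-access row buffer; NOT Spark's class.
final class SequentialBuffer[T](rows: Vector[T]) {
  // Counts rows that were read only to reach the requested start position.
  var rowsSkipped: Long = 0L

  // Assumption: the buffer can only be read front to back, so positioning an
  // iterator at startIndex means stepping over every earlier row.
  def generateIterator(startIndex: Int): Iterator[T] = {
    val it = rows.iterator
    var i = 0
    while (i < startIndex && it.hasNext) {
      it.next()
      rowsSkipped += 1
      i += 1
    }
    it
  }
}

object ResumeCostSketch {
  // Each pass through the loop models one call into a scanner that rebuilds
  // its iterator from the saved index, returns one row, and advances the index.
  def scanByResuming(buffer: SequentialBuffer[Int], n: Int): Long = {
    var index = 0
    while (index < n) {
      val it = buffer.generateIterator(index)
      if (it.hasNext) it.next()
      index += 1
    }
    buffer.rowsSkipped
  }

  def main(args: Array[String]): Unit = {
    val n = 1000
    val skipped = scanByResuming(new SequentialBuffer((0 until n).toVector), n)
    println(s"rows=$n skipped=$skipped")
  }
}
```

    Under those assumptions, resuming n times costs 0 + 1 + ... + (n - 1)
    skipped rows, so the run above prints `rows=1000 skipped=499500`. Whether
    the real buffer behaves this way depends on its implementation, for
    example on whether the rows are held in memory or have been spilled.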



