c21 removed a comment on pull request #29097:
URL: https://github.com/apache/spark/pull/29097#issuecomment-678710762


   I have a similar concern to @gatorsmile's. I think this also depends on the 
run-time cardinality of the data.
   
   E.g., suppose the left side is smaller than the right side, but every row 
on the left side has the same join key, while every row on the right side is 
unique. We should buffer the right side here even though the right side is 
larger, because if we buffer the left side, we essentially need to read the 
entire left side into the buffer.
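
   To make the cardinality argument concrete, here is a rough sketch in plain Python (not Spark code). It assumes a simplified model in which, for each join key seen on the streamed side, the join holds all buffered-side rows with that key in memory at once. The function name `peak_buffer_rows` and the sample keys are hypothetical.

```python
from collections import Counter


def peak_buffer_rows(streamed_keys, buffered_keys):
    """Largest number of rows held in the buffer at any one time.

    Simplified model (an assumption, not Spark's implementation): for each
    join key present on the streamed side, all buffered-side rows with that
    key are buffered together.
    """
    counts = Counter(buffered_keys)
    return max((counts[k] for k in set(streamed_keys)), default=0)


# Hypothetical data: the left side is smaller, but every row shares one key.
left = ["k"] * 4
# The right side is larger, and every key is unique.
right = ["k", "a", "b", "c", "d", "e"]

# Buffering the smaller (left) side pulls all of it into the buffer.
buffer_left = peak_buffer_rows(streamed_keys=right, buffered_keys=left)
# Buffering the larger (right) side holds only one row at a time.
buffer_right = peak_buffer_rows(streamed_keys=left, buffered_keys=right)

print(buffer_left, buffer_right)
```

   Under this model the peak buffer size follows the duplication of keys on the buffered side, not that side's total size.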
   
   In addition, this PR swaps the left and right sides based on total size. 
However, at run time, each task/partition can have a different amount of data 
on the left and right sides. I think simply swapping the left and right sides 
here might cause some tasks to regress while others improve.
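
   A small sketch of the per-partition concern, again in plain Python with made-up numbers. It assumes the decision is "buffer whichever side is smaller in total" and lists the partitions where that global choice ends up buffering the locally larger side. The name `regressed_partitions` and the sizes are hypothetical, and size is only a stand-in here, since the buffer cost also depends on key cardinality.

```python
def regressed_partitions(partitions):
    """Indexes of partitions where a global size-based choice buffers the
    locally larger side.

    `partitions` is a list of (left_size, right_size) pairs, one per task.
    """
    total_left = sum(left for left, _ in partitions)
    total_right = sum(right for _, right in partitions)
    # Global decision: buffer the side that is smaller in total.
    buffer_left_globally = total_left < total_right

    regressed = []
    for i, (left, right) in enumerate(partitions):
        if left == right:
            continue  # the choice makes no difference for this partition
        left_is_smaller_here = left < right
        if left_is_smaller_here != buffer_left_globally:
            regressed.append(i)
    return regressed


# Hypothetical sizes: left is smaller overall, but not in the last partition.
sizes = [(10, 100), (20, 200), (90, 5)]
print(regressed_partitions(sizes))
```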
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


