c21 commented on pull request #32210: URL: https://github.com/apache/spark/pull/32210#issuecomment-856186082
> Option 1: pass an additional condition to check the tag field is "matched" v.s. "not-matched-yet";

@sigmod - can you elaborate on how this would work? Are you suggesting adding matched-or-not info inside `SortMergeJoinExec.condition`?

> Option 2: only support this kind of fallback for inner joins, initially.

Yeah, I am thinking about that as well, but we have to fix it for outer joins anyway. I lean toward getting consensus from everyone on an approach that covers all join types.

Another option I am thinking of is to maintain two sorters for the stream/probe side when doing an outer join: one for rows matched by the hash join (`sorterForMatched`) and one for rows not matched by the hash join (`sorterForNonmatched`). This still needs some change inside SMJ to merge the two sorters together, but it avoids adding matched info to the sorter.
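To make the two-sorter idea concrete, here is a rough, self-contained sketch of one way to read it for a left outer join. It is plain Python for brevity and is not Spark code. The function name `left_outer_join_with_fallback` and the parameter `hashed_build_count` are hypothetical, and the assumption that only a prefix of the build side fits in the hash table is mine. The point is only that the merge step can tell from the source sorter whether a stream row was already matched, so no per-row tag is stored.

```python
import heapq
from collections import defaultdict


def left_outer_join_with_fallback(stream_rows, build_rows, hashed_build_count):
    """Illustrative only. Rows are (key, value) pairs; stream side is the outer side.

    Assumption: the first `hashed_build_count` build rows fit in the hash
    table, and the remaining build rows are handled by a sort-merge phase.
    """
    hashed = build_rows[:hashed_build_count]
    spilled = sorted(build_rows[hashed_build_count:], key=lambda r: r[0])

    table = defaultdict(list)
    for k, v in hashed:
        table[k].append(v)

    out = []
    sorter_for_matched = []      # stream rows that matched in the hash phase
    sorter_for_nonmatched = []   # stream rows with no hash-phase match
    for k, v in stream_rows:
        hits = table.get(k, [])
        for b in hits:
            out.append((k, v, b))
        (sorter_for_matched if hits else sorter_for_nonmatched).append((k, v))

    sorter_for_matched.sort(key=lambda r: r[0])
    sorter_for_nonmatched.sort(key=lambda r: r[0])

    # Merge the two sorted runs by join key. The matched flag comes from
    # which sorter a row came out of, not from a field stored with the row.
    merged = heapq.merge(
        ((row, True) for row in sorter_for_matched),
        ((row, False) for row in sorter_for_nonmatched),
        key=lambda tagged: tagged[0][0],
    )

    i = 0  # cursor into the sorted, spilled build rows
    for (k, v), already_matched in merged:
        while i < len(spilled) and spilled[i][0] < k:
            i += 1
        j = i
        matched_now = False
        while j < len(spilled) and spilled[j][0] == k:
            out.append((k, v, spilled[j][1]))
            matched_now = True
            j += 1
        # Null-pad only rows that matched in neither phase.
        if not matched_now and not already_matched:
            out.append((k, v, None))
    return out
```

In this sketch, a stream row from `sorter_for_matched` that finds no partner in the sort-merge phase emits nothing extra, while a row from `sorter_for_nonmatched` in the same situation emits the null-padded row. That is the distinction a tag field would otherwise have to carry.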
