Re: [PR] [SPARK-49653][SQL] Single join for correlated scalar subqueries [spark]

via GitHub Fri, 20 Sep 2024 10:46:19 -0700


agubichev commented on code in PR #48145:
URL: https://github.com/apache/spark/pull/48145#discussion_r1768976267



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastNestedLoopJoinExec.scala:
##########
@@ -135,8 +138,15 @@ case class BroadcastNestedLoopJoinExec(
    *
    *   LeftOuter with BuildRight
    *   RightOuter with BuildLeft
+   *   LeftSingle with BuildRight
+   *
+   * For the (LeftSingle, BuildRight) case we pass 'checkMatches' function that
+   * makes sure there is at most 1 matching build row per every probe tuple.
+   * For all other cases, 'checkMatches' is a no-op.
    */
-  private def outerJoin(relation: Broadcast[Array[InternalRow]]): RDD[InternalRow] = {
+  private def outerJoin(
+      relation: Broadcast[Array[InternalRow]],
+      checkMatches: Int => Int): RDD[InternalRow] = {

Review Comment:
   'checkMatches' also increments the number of matches and returns the
   updated count. Otherwise we would need to increment the counter in the body
   of the join, for both LeftSingle and LeftOuter joins. Passing it as a
   function avoids incrementing the counter for LeftOuter joins, where it is
   not needed.
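
   To make the point concrete, here is a minimal sketch of the pattern in
   plain Scala collections. It is not the code from the PR: the object name,
   the `noCheck` and `singleCheck` helpers, the generic `outerJoin` signature,
   and the use of `IllegalStateException` are all assumptions made for
   illustration. Only the `checkMatches: Int => Int` parameter shape comes from
   the diff above.

```scala
// Hypothetical, simplified stand-in for the join loop; not the Spark implementation.
object CheckMatchesSketch {

  // LeftOuter case (assumed shape): a no-op, so the counter is never advanced.
  val noCheck: Int => Int = n => n

  // LeftSingle case (assumed shape): increment the count and fail as soon as a
  // probe row has more than one matching build row.
  // IllegalStateException is a placeholder, not the error Spark raises.
  val singleCheck: Int => Int = { n =>
    val updated = n + 1
    if (updated > 1) {
      throw new IllegalStateException("more than one matching build row for a probe row")
    }
    updated
  }

  // Nested-loop left outer join over in-memory sequences.
  // The loop body never increments the counter itself; it only stores
  // whatever checkMatches returns.
  def outerJoin[A, B](
      probe: Seq[A],
      build: Seq[B],
      cond: (A, B) => Boolean,
      checkMatches: Int => Int): Seq[(A, Option[B])] = {
    probe.flatMap { p =>
      var numMatches = 0
      val matched = build.filter { b =>
        val ok = cond(p, b)
        if (ok) numMatches = checkMatches(numMatches)
        ok
      }
      if (matched.isEmpty) Seq((p, Option.empty[B]))
      else matched.map(b => (p, Option(b)))
    }
  }
}
```

   With `noCheck` the join behaves as an ordinary left outer join and does no
   counting work; with `singleCheck` the same loop rejects a probe row that
   matches two build rows.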



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

