gengliangwang opened a new pull request, #56073:
URL: https://github.com/apache/spark/pull/56073

   ### What changes were proposed in this pull request?
   
   This is a sub-task of 
[SPARK-56908](https://issues.apache.org/jira/browse/SPARK-56908).
   
   `SortMergeJoinExec.codegenFullOuter` and its interpreted counterpart 
`SortMergeFullOuterJoinScanner.findMatchingRows` both duplicate the "reuse the 
`BitSet` if its capacity is enough, otherwise allocate a new one" idiom for 
tracking matched left/right rows.
   
   Extract it into a static helper method in a new class at 
`sql/core/src/main/java/org/apache/spark/sql/execution/joins/JoinHelper.java`:
   
   ```java
   public static BitSet resetMatched(BitSet matched, int bufferSize) { ... }
   ```
   
   and call it from the four sites (left + right in each method).
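   
   The reuse-or-allocate rule described above can be sketched roughly as follows. 
   This is a minimal illustration, not the code in this PR: it uses 
   `java.util.BitSet` as a stand-in (the bit set type used by the join code may 
   differ), the class name `JoinHelperSketch` is hypothetical, and the null check 
   and the choice to clear only the first `bufferSize` bits are assumptions.
   
   ```java
   import java.util.BitSet;

   // Hypothetical stand-in for the helper class; not the PR's actual source.
   final class JoinHelperSketch {
       private JoinHelperSketch() {}

       /**
        * Returns a bit set with at least bufferSize usable bits, all of the
        * first bufferSize bits cleared. Reuses the given set when it is large
        * enough; otherwise allocates a new one.
        */
       static BitSet resetMatched(BitSet matched, int bufferSize) {
           // Assumption: size() plays the role of "capacity" here.
           if (matched != null && bufferSize <= matched.size()) {
               // Clear only the range the caller is about to use.
               matched.clear(0, bufferSize);
               return matched;
           }
           // Too small (or absent): allocate a fresh, all-zero bit set.
           return new BitSet(bufferSize);
       }
   }
   ```
   
   A caller would assign the result back, for example 
   `leftMatched = JoinHelperSketch.resetMatched(leftMatched, leftBufferSize);` 
   (variable names here are illustrative), so the same line works whether the 
   bit set was reused or replaced.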
   
   ### Why are the changes needed?
   
   - Replaces ~16 inline lines with 4 helper calls, shrinking generated Java 
for the codegen sites.
   - Keeps the codegen and interpreted full-outer paths in lockstep so a future 
change to the reset rule lands in one place.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Existing `OuterJoinSuite` covers full-outer SMJ through both code paths 
(codegen on and off).
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: Claude Code
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

