gengliangwang opened a new pull request, #56258:
URL: https://github.com/apache/spark/pull/56258

   ### What changes were proposed in this pull request?
   
   `In`'s whole-stage codegen emits, for each list element `x`:
   
   ```java
   if (x.isNull) {
     inTmpResult = -1; // HAS_NULL
   } else if (value == x) {
     inTmpResult = 1;  // MATCHED
     continue;
   }
   ```
   
   IN lists are usually constant literals, so `x.isNull` is the literal `false` 
and the `HAS_NULL` branch is dead (`if (false) { inTmpResult = -1; } else if 
...`). This PR emits only the equality check when `x.isNull == FalseLiteral`. 
Symmetrically, when an element is statically null (`x.isNull == TrueLiteral`, 
e.g. `IN (NULL, ...)`), only the `HAS_NULL` assignment is emitted and the dead 
equality check is dropped.
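
The three cases can be sketched as a small standalone Java helper. This is an illustration only, not the Spark implementation: the actual change lives in Spark's Scala codegen for `In`, and `InElementEmitter`, `emit`, and the string-typed `isNull` parameter are hypothetical stand-ins for the `FalseLiteral` / `TrueLiteral` checks described above.

```java
// Hypothetical sketch, not Spark code: mimics the per-element branch pruning.
final class InElementEmitter {
    private InElementEmitter() {}

    /**
     * Builds the Java snippet for one IN-list element.
     *
     * @param isNull   source text of the element's null flag: "false", "true",
     *                 or a variable name when nullness is only known at runtime
     * @param equality source text of the equality check, e.g. "value == x"
     */
    static String emit(String isNull, String equality) {
        String hasNull = "inTmpResult = -1; // HAS_NULL\n";
        String matched = "if (" + equality + ") {\n"
            + "  inTmpResult = 1;  // MATCHED\n"
            + "  continue;\n"
            + "}\n";
        if ("false".equals(isNull)) {
            return matched; // statically non-null: the null branch is never emitted
        }
        if ("true".equals(isNull)) {
            return hasNull; // statically null: the equality check is never emitted
        }
        // Nullness only known at runtime: keep both branches, as before.
        return "if (" + isNull + ") {\n  " + hasNull + "} else " + matched;
    }
}
```

Calling `InElementEmitter.emit("false", "value == x")` yields only the equality block, `emit("true", ...)` yields only the `HAS_NULL` assignment, and any other `isNull` text yields the original two-branch form.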
   
   ### Why are the changes needed?
   
   Sub-task of SPARK-56908 (reduce generated Java size in whole-stage codegen). 
Dumping the TPC-DS whole-stage codegen shows ~348 dead `if (false) { 
inTmpResult = -1; } else if (...)` blocks from `In` over literal lists. 
Emitting only the live branch removes the dead conditional from the generated 
code.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No. The generated code is smaller but evaluates `IN` with identical semantics, 
including null handling: a statically non-null element can never set `HAS_NULL`, 
and a statically null element can never match, so dropping those dead branches 
does not change the result.
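
To make the equivalence concrete, here is a hand-unrolled sketch for the literal list `(1, NULL, 3)` with a non-null `int` left-hand value. Several things are assumptions for illustration rather than facts from this PR: the class and method names, the `NOT_MATCHED = 0` starting value, the `do { ... } while (false)` wrapper that gives `continue` something to exit, and the `nullElemValue` placeholder. Only the `-1` / `1` encodings and the per-element shape come from the description above.

```java
// Hypothetical sketch, not actual Spark output: `value IN (1, NULL, 3)` unrolled by hand.
final class InListUnrolled {
    static final byte NOT_MATCHED = 0; // assumed starting value
    static final byte HAS_NULL = -1;
    static final byte MATCHED = 1;

    private InListUnrolled() {}

    /** Before: every element carries both branches, even when one is dead. */
    static byte before(int value) {
        int nullElemValue = -1; // placeholder slot for the NULL literal, never meaningful
        byte inTmpResult = NOT_MATCHED;
        do {
            if (false) { inTmpResult = HAS_NULL; }
            else if (value == 1) { inTmpResult = MATCHED; continue; }

            if (true) { inTmpResult = HAS_NULL; }
            else if (value == nullElemValue) { inTmpResult = MATCHED; continue; }

            if (false) { inTmpResult = HAS_NULL; }
            else if (value == 3) { inTmpResult = MATCHED; continue; }
        } while (false);
        return inTmpResult;
    }

    /** After: only the live branch of each element remains. */
    static byte after(int value) {
        byte inTmpResult = NOT_MATCHED;
        do {
            if (value == 1) { inTmpResult = MATCHED; continue; }
            inTmpResult = HAS_NULL;
            if (value == 3) { inTmpResult = MATCHED; continue; }
        } while (false);
        return inTmpResult;
    }
}
```

Both methods return `MATCHED` for `1` and `3` and `HAS_NULL` for any other input, which is the sense in which the pruned form is equivalent.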
   
   ### How was this patch tested?
   
   This is a behavior-preserving change covered by `PredicateSuite` (70 tests, 
including `In` with null list elements and a null left-hand value), all of 
which pass. It was additionally verified by re-dumping the TPC-DS whole-stage 
codegen: the ~348 dead `if (false)` blocks in `In` are gone and every 
generated subtree still compiles.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: Claude Code (Opus 4.8)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


