gengliangwang opened a new pull request, #56252:
URL: https://github.com/apache/spark/pull/56252
### What changes were proposed in this pull request?
`UnionExec` whole-stage codegen fusion (SPARK-56482) kept per-emission
codegen state in mutable instance fields on the plan node:
`currentEmittingChild` (set in `doProduce`, read in `doConsume` to pick a
child's projection) and `numOutputRowsTerm` (the once-per-stage metric term).
This PR moves both fields into `ThreadLocal`s, confining the state to the
single thread that runs a given `doCodeGen` pass.
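
For readers unfamiliar with the pattern, here is a minimal sketch of per-thread emission state. It is not the actual `UnionExec` code: `FusedUnionSketch`, its simplified `doProduce`/`doConsume` signatures, and the `project_child_N` strings are hypothetical stand-ins, and only the field name `currentEmittingChild` and the `-1` sentinel come from the description above.

```scala
// Hypothetical stand-in for a plan node; NOT the real UnionExec.
class FusedUnionSketch(numChildren: Int) {
  // Per-thread emission state; -1 means "outside the emission window".
  private val currentEmittingChild = new ThreadLocal[Int] {
    override def initialValue(): Int = -1
  }

  // Emits each child in turn; doConsume runs inline on the same thread.
  def doProduce(): Seq[String] = {
    try {
      (0 until numChildren).map { i =>
        currentEmittingChild.set(i)
        doConsume()
      }
    } finally {
      // Restore the sentinel for this thread only; other threads are unaffected.
      currentEmittingChild.remove()
    }
  }

  // Picks the "projection" for whichever child this thread is emitting.
  def doConsume(): String = {
    val child = currentEmittingChild.get()
    require(child >= 0, "doConsume invoked outside doProduce emission window")
    s"project_child_$child"
  }
}

object FusedUnionSketchDemo {
  def main(args: Array[String]): Unit = {
    val node = new FusedUnionSketch(numChildren = 3)
    println(node.doProduce().mkString(", "))
  }
}
```

Because each thread sees its own copy of the value, a reset performed at the end of one thread's pass cannot be observed by another thread that is still consuming.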
### Why are the changes needed?
A single `UnionExec` instance can have its whole-stage codegen driven by
more than one thread at the same time: a reused exchange/subquery stage is
generated concurrently with the main plan, and async subquery /
dynamic-partition-pruning execution can overlap a driver-side `doCodeGen`. With
the shared mutable field, a racing `doProduce` resets `currentEmittingChild` to
`-1` while another thread is still inside `doConsume`, tripping:
```
java.lang.IllegalArgumentException: requirement failed:
UnionExec.doConsume invoked outside doProduce emission window
```
This surfaced as a flaky `LogicalPlanTagInSparkPlanSuite.q2` failure (q2
contains a `UNION`, and union fusion is enabled by default). Each `doCodeGen`
pass is itself single-threaded (`produce` -> `doConsume` run inline on one
thread), so a `ThreadLocal` gives each pass its own copy of the state and
eliminates the cross-thread race, while preserving the existing per-stage
semantics (the metric term is still computed once per pass).
### Does this PR introduce _any_ user-facing change?
No. It removes an intermittent internal code-generation failure; the
generated code and query results are unchanged.
### How was this patch tested?
Added a `UnionCodegenSuite` test, "SPARK-57196: concurrent codegen of a
shared UnionExec stage is thread-safe", that drives `doCodeGen()` on one shared
fused `UnionExec` stage from 8 threads. It reproduces the "outside doProduce
emission window" failure on the unpatched code and passes with this fix. Also
verified the full `UnionCodegenSuite` (43 tests), its ANSI/AQE variants, and
`LogicalPlanTagInSparkPlanSuite` q2 all pass.
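
The shape of such a test can be sketched roughly as follows. This is not the code added to `UnionCodegenSuite`: `SharedNodeSketch`, `codegenPass`, and `ConcurrentDriver.runConcurrently` are hypothetical names, and the node is a toy with thread-local state rather than a real fused `UnionExec` stage. Only the idea of driving one shared instance from 8 threads is taken from the description above.

```scala
import java.util.concurrent.{Callable, CountDownLatch, Executors, TimeUnit}
import scala.util.Try

// Hypothetical stand-in for a shared plan node; not the real UnionExec.
class SharedNodeSketch {
  private val emitting = new ThreadLocal[Int] {
    override def initialValue(): Int = -1
  }

  // One simulated codegen pass: set the state, then read it back inline.
  def codegenPass(numChildren: Int): Int = {
    var consumed = 0
    try {
      for (i <- 0 until numChildren) {
        emitting.set(i)
        require(emitting.get() == i, "consume saw another pass's state")
        consumed += 1
      }
    } finally {
      emitting.remove()
    }
    consumed
  }
}

object ConcurrentDriver {
  // Runs `body` once on each of `numThreads` threads, released together,
  // and returns every exception thrown by any of them.
  def runConcurrently(numThreads: Int)(body: () => Unit): Seq[Throwable] = {
    val pool = Executors.newFixedThreadPool(numThreads)
    val start = new CountDownLatch(1)
    try {
      val futures = (1 to numThreads).map { _ =>
        pool.submit(new Callable[Option[Throwable]] {
          override def call(): Option[Throwable] = {
            start.await() // wait so that all threads begin at the same time
            Try(body()).failed.toOption
          }
        })
      }
      start.countDown()
      futures.flatMap(_.get(30, TimeUnit.SECONDS))
    } finally {
      pool.shutdownNow()
    }
  }
}
```

A caller would share one `SharedNodeSketch` across `ConcurrentDriver.runConcurrently(8) { () => node.codegenPass(3) }` and assert that the returned failure list is empty. Releasing all threads from a latch makes overlap more likely, but a race test of this kind can still pass by chance on racy code, so repeating the pass many times per thread is a reasonable precaution.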
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Opus 4.8)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]