cloud-fan commented on code in PR #56252:
URL: https://github.com/apache/spark/pull/56252#discussion_r3338536966


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala:
##########
@@ -988,16 +988,25 @@ case class UnionExec(children: Seq[SparkPlan]) extends 
SparkPlan with CodegenSup
 
   override def inputRDDs(): Seq[RDD[InternalRow]] = Seq(unionedInputRDD)
 
-  // Set in `doProduce`, read in `doConsume` during single-threaded code
-  // emission. `numOutputRowsTerm` is registered once per stage so the
-  // metric appears in `references[]` exactly once instead of once per
-  // child. `currentEmittingChild` tells `doConsume` which child's
-  // projection to bind.
-  @transient private var numOutputRowsTerm: String = _
-  @transient private var currentEmittingChild: Int = -1
+  // Per-emission codegen state, set in `doProduce` and read in `doConsume`.
+  // `numOutputRowsTerm` is registered once per stage so the metric appears in
+  // `references[]` exactly once instead of once per child; 
`currentEmittingChild`
+  // tells `doConsume` which child's projection to bind.
+  //
+  // A single `UnionExec` instance can have its codegen driven by more than one
+  // thread at the same time: a reused exchange/subquery stage is generated
+  // concurrently with the main plan, and async subquery / dynamic-pruning
+  // execution can overlap a driver-side `doCodeGen`. A plain field would let a
+  // racing `doProduce` reset `currentEmittingChild` to -1 while another thread
+  // is still in `doConsume`. Each `doCodeGen` pass is itself single-threaded
+  // (`produce` -> `doConsume` run inline on one thread), so a `ThreadLocal`
+  // isolates the state per pass without that cross-thread race.
+  @transient private lazy val numOutputRowsTerm = new ThreadLocal[String]

Review Comment:
   These two fields are *pass-scoped* codegen state but stored as *per-thread* 
`ThreadLocal`. The framework's established home for pass-scoped codegen state 
is `CodegenContext` (`INPUT_ROW`, `currentVars`, `currentPartitionIndexVar`, 
`freshNamePrefix`), created fresh per `doCodeGen` pass — and this very method 
already routes a sibling piece of pass state, `currentPartitionIndexVar`, 
through `ctx` (saved/restored just below). So two pieces of the same kind of 
state now use two different mechanisms here. `ctx`-scoped storage would be 
exactly per-pass (ThreadLocal's per-thread isolation is only correct because 
passes don't nest on a thread) and consistent with the existing pattern. The 
tradeoff — and why this is a question, not a blocker — is that adding 
union-specific fields to the shared `CodegenContext` pollutes a class every 
operator uses, whereas `ThreadLocal` stays localized here. Could you add a 
sentence (comment or PR description) on why `ThreadLocal` was preferred over th
 reading these through `ctx`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to