cloud-fan commented on code in PR #56252: URL: https://github.com/apache/spark/pull/56252#discussion_r3338536966
########## sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala: ########## @@ -988,16 +988,25 @@ case class UnionExec(children: Seq[SparkPlan]) extends SparkPlan with CodegenSup override def inputRDDs(): Seq[RDD[InternalRow]] = Seq(unionedInputRDD) - // Set in `doProduce`, read in `doConsume` during single-threaded code - // emission. `numOutputRowsTerm` is registered once per stage so the - // metric appears in `references[]` exactly once instead of once per - // child. `currentEmittingChild` tells `doConsume` which child's - // projection to bind. - @transient private var numOutputRowsTerm: String = _ - @transient private var currentEmittingChild: Int = -1 + // Per-emission codegen state, set in `doProduce` and read in `doConsume`. + // `numOutputRowsTerm` is registered once per stage so the metric appears in + // `references[]` exactly once instead of once per child; `currentEmittingChild` + // tells `doConsume` which child's projection to bind. + // + // A single `UnionExec` instance can have its codegen driven by more than one + // thread at the same time: a reused exchange/subquery stage is generated + // concurrently with the main plan, and async subquery / dynamic-pruning + // execution can overlap a driver-side `doCodeGen`. A plain field would let a + // racing `doProduce` reset `currentEmittingChild` to -1 while another thread + // is still in `doConsume`. Each `doCodeGen` pass is itself single-threaded + // (`produce` -> `doConsume` run inline on one thread), so a `ThreadLocal` + // isolates the state per pass without that cross-thread race. + @transient private lazy val numOutputRowsTerm = new ThreadLocal[String] Review Comment: These two fields are *pass-scoped* codegen state but stored as *per-thread* `ThreadLocal`. The framework's established home for pass-scoped codegen state is `CodegenContext` (`INPUT_ROW`, `currentVars`, `currentPartitionIndexVar`, `freshNamePrefix`), created fresh per `doCodeGen` pass — and this very method already routes a sibling piece of pass state, `currentPartitionIndexVar`, through `ctx` (saved/restored just below). So two pieces of the same kind of state now use two different mechanisms here. `ctx`-scoped storage would be exactly per-pass (ThreadLocal's per-thread isolation is only correct because passes don't nest on a thread) and consistent with the existing pattern. The tradeoff — and why this is a question, not a blocker — is that adding union-specific fields to the shared `CodegenContext` pollutes a class every operator uses, whereas `ThreadLocal` stays localized here. Could you add a sentence (comment or PR description) on why `ThreadLocal` was preferred over th reading these through `ctx`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
