cloud-fan opened a new pull request, #56410: URL: https://github.com/apache/spark/pull/56410
### What changes were proposed in this pull request? This is a follow-up of #56275, addressing https://github.com/apache/spark/pull/56275#discussion_r3376088031. The original PR routes physical preparation and AQE rules through `RuleExecutor` so their per-rule timing lands in `QueryPlanningTracker`. To avoid concurrent writes to a single tracker from the threads that plan scalar / IN / DPP subqueries, it gave each `AdaptiveSparkPlanExec` its own `QueryPlanningTracker`, registered them in `AdaptiveExecutionContext.planningTrackers`, and folded them all into the query's shared tracker (`context.qe.tracker`) in `finalPlanUpdate` via a new `QueryPlanningTracker.merge`. This PR replaces that machinery with a lock on the single shared tracker. The only mutation point is `QueryPlanningTracker.recordRuleInvocation`; making it (and the `rules` / `topRulesByTime` read accessors) `synchronized` lets every AQE node -- main and subquery -- record straight into the single `context.qe.tracker`, which is already reachable from every node (sub-AQEs reuse the same `AdaptiveExecutionContext`). That lets the following pieces, which existed solely to avoid concurrent writes, be deleted: - the per-`AdaptiveSparkPlanExec` `tracker` field, - `AdaptiveExecutionContext.planningTrackers`, - `QueryPlanningTracker.merge`, - the merge loop in `finalPlanUpdate`, and - the `if (!isSubquery)` special-case there. `PhysicalRuleExecutor`, the `applyPhysicalRules` signature change, and the `withTracker` wrapping from the original PR are unchanged. ### Why are the changes needed? Beyond the reduced surface area, the lock makes correctness local: - As written, the merge is safe only because sub-AQEs complete (via `waitForSubqueries` -> `awaitResult`) before `finalPlanUpdate` reads their trackers, and because `finalPlanUpdate` is a `lazy val` that runs once. That is a non-obvious chain. A lock makes the guarantee local to the one mutating method and independent of when subqueries complete or when `finalPlanUpdate` runs, so a future change to subquery scheduling or plan finalization cannot silently reintroduce a race. - Any future code that records a rule from another thread is automatically covered by the lock; with per-node trackers each new path would have to wire up its own tracker plus merge. - The cost is negligible: recording is one `HashMap` put per rule invocation, so an uncontended monitor on the single-threaded analyzer / optimizer path is tens of ns against ms-scale planning, and it is contended only by the rare concurrent subquery-planning threads. - A shared tracker reflects AQE rules as they run, rather than only after `finalPlanUpdate` at execution time. ### Does this PR introduce _any_ user-facing change? No. This is an internal refactor of how the same per-rule timing is recorded; `tracker.rules`, `tracker.topRulesByTime(...)`, and `RuleExecutor.dumpTimeSpent()` cover the same rules as after #56275. ### How was this patch tested? Existing tests in `QueryPlanningTrackerEndToEndSuite` (added by #56275), including `SPARK-57212: Track sub-query AQE rules`, which exercises the cross-thread subquery-recording path, continue to pass. ### Was this patch authored or co-authored using generative AI tooling? Generated-by: Claude Opus 4.8 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
