LuciferYang opened a new pull request, #56662:
URL: https://github.com/apache/spark/pull/56662
### What changes were proposed in this pull request?
Declarative Pipelines resolves flows in parallel on a shared `SparkSession`
(`DataflowGraphTransformer`, parallelism 10).
`FlowAnalysis.createFlowFunctionFromLogicalPlan` applied each flow's per-flow
SQL confs by mutating that shared session's conf and restoring it afterwards
(`FlowAnalysisContext.setConf` / `restoreOriginalConf`). Because the session is
shared, concurrent flows interleave those set/capture/restore operations: a
flow can capture another flow's in-flight value as the "original", read another
flow's conf mid-analysis, or restore the wrong value, corrupting the run
session's conf and making analysis non-deterministic.
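
The interleaving described above can be reproduced with a small toy model. This is only an illustrative sketch: `ToyConf` and `InterleavingDemo` are hypothetical stand-ins, not Spark classes, and the two flows' steps are interleaved by hand (rather than with real threads) so the outcome is deterministic. The conf key and values are chosen purely for illustration.

```scala
import scala.collection.mutable

// Toy stand-in for a shared session's mutable conf. NOT Spark's RuntimeConfig.
final class ToyConf(initial: Map[String, String]) {
  private val values = mutable.Map.empty[String, String] ++= initial

  def get(key: String): Option[String] = values.get(key)
  def set(key: String, value: String): Unit = values.update(key, value)

  // Put back a previously captured value, or unset the key if there was none.
  def restore(key: String, original: Option[String]): Unit = original match {
    case Some(v) => values.update(key, v)
    case None    => values.remove(key)
  }

  def snapshot: Map[String, String] = values.toMap
}

object InterleavingDemo {
  val key = "spark.sql.shuffle.partitions"

  // Returns (value flow A observes during its "analysis", final shared conf).
  def run(): (Option[String], Map[String, String]) = {
    val shared = new ToyConf(Map(key -> "200"))

    // Flow A captures the original value and applies its own conf.
    val originalSeenByA = shared.get(key) // Some("200")
    shared.set(key, "1")

    // Flow B starts before A restores, so it captures A's in-flight value.
    val originalSeenByB = shared.get(key) // Some("1")
    shared.set(key, "2")

    // Flow A "analyzes" now and reads B's conf instead of its own.
    val observedByA = shared.get(key)

    // A restores the true original, then B restores A's in-flight value.
    shared.restore(key, originalSeenByA)
    shared.restore(key, originalSeenByB)

    (observedByA, shared.snapshot)
  }
}
```

In this toy run, flow A observes `Some("2")` (flow B's value) and the shared conf ends at `"1"` instead of the original `"200"`.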
The fix resolves each flow against its own cloned session.
`createFlowFunctionFromLogicalPlan` clones the active session, applies the
per-flow confs to the clone, and analyzes within `clone.withActive { ... }`.
The clone is per-flow and discarded after analysis, so the conf-restore
bookkeeping in `FlowAnalysisContext` is no longer needed and is removed.
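
The per-flow clone idea can be sketched with a toy model. This is not the PR's code: `ToySession`, `cloneWith`, and `PerFlowCloneDemo` are hypothetical names, and an immutable map stands in for a session's conf. The sketch only illustrates why giving each flow its own copy removes the need for capture/restore bookkeeping, even when flows run concurrently.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Toy stand-in for a session with a conf. NOT Spark's SparkSession.
final case class ToySession(conf: Map[String, String]) {
  // Hypothetical analogue of "clone, then apply per-flow confs": the copy
  // starts from the same conf, and the overrides never touch the original.
  def cloneWith(overrides: Map[String, String]): ToySession =
    ToySession(conf ++ overrides)
}

object PerFlowCloneDemo {
  val key = "spark.sql.shuffle.partitions"

  // "Analysis" here just reports which value of `key` the flow observed.
  def analyze(session: ToySession): Option[String] = session.conf.get(key)

  // Returns (value observed by each flow, conf of the shared session afterwards).
  def run(): (Seq[Option[String]], Map[String, String]) = {
    val shared = ToySession(Map(key -> "200"))
    val flowConfs = Seq(
      Map(key -> "1"),          // flow with its own conf
      Map(key -> "2"),          // another flow with a different conf
      Map.empty[String, String] // flow with no per-flow conf
    )

    // Each flow analyzes against its own clone, in parallel; the clone is
    // simply dropped afterwards, so nothing needs to be restored.
    val analyses = flowConfs.map(confs => Future(analyze(shared.cloneWith(confs))))
    val observed = Await.result(Future.sequence(analyses), 10.seconds)

    (observed, shared.conf)
  }
}
```

Each flow sees only its own overrides (or the shared default when it has none), and the shared session's conf is unchanged after all flows finish.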
### Why are the changes needed?
When more than one flow defines per-flow confs, parallel resolution can
analyze a flow under another flow's confs and leave the session the pipeline
runs from with the wrong conf values, producing non-deterministic and
occasionally incorrect analysis and schema inference.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Added a test in `ConnectValidPipelineSuite` that observes, during a flow's
analysis, that the per-flow conf is applied to the analysis session but does
not leak onto the session the pipeline is run from. It fails without the fix
(the conf leaks) and passes with it. The rest of the suite, including "Pipeline
level default spark confs are applied with correct precedence", still passes.
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Claude Opus 4.8)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]