uros-b commented on code in PR #56662:
URL: https://github.com/apache/spark/pull/56662#discussion_r3472419622
##########
sql/pipelines/src/main/scala/org/apache/spark/sql/pipelines/graph/FlowAnalysis.scala:
##########
@@ -44,17 +45,23 @@ object FlowAnalysis {
confs: Map[String, String],
queryContext: QueryContext,
queryOrigin: QueryOrigin) => {
+ // Flows are resolved in parallel on a shared session, so applying
per-flow confs by mutating
+ // that session's conf would race across flows. Instead, give each flow
a private SQLConf
+ // (a clone of the session's conf plus this flow's overrides) and
install it for the analyzing
+ // thread via SQLConf.withExistingConf. Analysis still runs on the
shared session, so its
+ // catalog and the resolved DataFrames are unaffected; only the confs
the analyzer reads are
+ // isolated per flow.
+ val spark = SparkSession.active
val ctx = FlowAnalysisContext(
allInputs = allInputs,
availableInputs = availableInputs,
queryContext = queryContext,
- spark = SparkSession.active
+ spark = spark,
+ flowConf = spark.sessionState.conf.clone()
Review Comment:
Let's discuss about residual sessionState.conf reads during analysis. So,
per-flow confs are no longer visible through spark.conf /
spark.sessionState.conf; only through SQLConf.get. That is fine for Catalyst
rules, but a few pipelines code paths still read the session conf directly
during analysis, e.g.:
- GraphIdentifierManager.isPathIdentifier → spark.sessionState.conf
- AutoCdcMergeFlow schema logic →
spark.sessionState.conf.caseSensitiveAnalysis
Previously (sequentially, without races), mutating the shared session conf
made those paths see per-flow values. Now they always see the run session
baseline. The PR fixes the one known case in FlowAnalysis; but if per-flow
confs are meant to affect those other paths, they would need the same treatment?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]