uros-b commented on code in PR #56662:
URL: https://github.com/apache/spark/pull/56662#discussion_r3479693468
##########
sql/pipelines/src/main/scala/org/apache/spark/sql/pipelines/graph/FlowAnalysis.scala:
##########
@@ -44,17 +45,23 @@ object FlowAnalysis {
confs: Map[String, String],
queryContext: QueryContext,
queryOrigin: QueryOrigin) => {
+ // Flows are resolved in parallel on a shared session, so applying
per-flow confs by mutating
+ // that session's conf would race across flows. Instead, give each flow
a private SQLConf
+ // (a clone of the session's conf plus this flow's overrides) and
install it for the analyzing
+ // thread via SQLConf.withExistingConf. Analysis still runs on the
shared session, so its
+ // catalog and the resolved DataFrames are unaffected; only the confs
the analyzer reads are
+ // isolated per flow.
+ val spark = SparkSession.active
val ctx = FlowAnalysisContext(
allInputs = allInputs,
availableInputs = availableInputs,
queryContext = queryContext,
- spark = SparkSession.active
+ spark = spark,
+ flowConf = spark.sessionState.conf.clone()
Review Comment:
Thanks, agreed on both parts!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]