srielau commented on code in PR #55977:
URL: https://github.com/apache/spark/pull/55977#discussion_r3272828330
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogManager.scala:
##########
@@ -132,15 +132,33 @@ class CatalogManager(
}
}
- def setCurrentNamespace(namespace: Array[String]): Unit = synchronized {
- if (isSessionCatalog(currentCatalog) && namespace.length == 1) {
+ def setCurrentNamespace(namespace: Array[String]): Unit = {
+ // SPARK-56939: do NOT hold [[CatalogManager]]'s intrinsic lock across the
callbacks below.
+ // [[v1SessionCatalog.setCurrentDatabaseWithNameCheck]] briefly
synchronizes on
+ // [[SessionCatalog]], and concurrent unqualified function resolution
acquires the
+ // [[SessionCatalog]] lock and then reaches into [[CatalogManager]] via
+ // [[sqlResolutionPathEntries]]; nesting the manager lock outside the
catalog lock here
+ // would invert that order and deadlock. Snapshot the dispatch decision
under the lock,
+ // run callbacks outside it, then publish the new namespace under the lock
again.
+ //
+ // Concurrency trade-off versus the pre-SPARK-56939 atomic version: the
`isSession`
+ // snapshot can drift if a concurrent [[setCurrentCatalog]] switches to a
v2 catalog
+ // between this read and the v1 callback below -- the callback would still
touch
+ // `v1.currentDb` even though the active catalog is no longer the session
catalog. A
+ // later switch back to the session catalog resets `v1.currentDb` to
`default` (see
+ // [[setCurrentCatalog]]), so long-term state remains consistent; only the
intermediate
+ // observation is novel. Acceptable trade-off against the deadlock
alternative.
Review Comment:
Good catch -- the prior comment scoped the trade-off to v1-side staleness
only. Restructured the block in commit f504667f into two labeled cases: (a) the
v1-side `isSession` drift it already documented, and (b) the sticky CM-side
publish-overwrite race you describe (concurrent `setCurrentCatalog` completing
between the v1 callback returning and the publish, leaving `_currentNamespace =
Some(namespace)` published under a different `_currentCatalogName` than the
snapshotted one, with no auto-recovery). Framed (b) explicitly as
last-writer-wins for racing `USE` commands -- the conventional expectation --
so it reads as an accepted trade-off rather than a hidden anomaly.
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogManager.scala:
##########
@@ -221,6 +239,15 @@ class CatalogManager(
* When PATH is enabled and a session path is in effect (stored or via
* [[SQLConf#DEFAULT_PATH]]), formats the resolved entries. Otherwise falls
back to the legacy
* resolutionSearchPath.
+ *
+ * SPARK-56939 note: this is currently the only intentional
`CatalogManager.synchronized ->
+ * SessionCatalog.synchronized` nest left in this class. The transitive call
into
+ * [[v1SessionCatalog.getCurrentDatabase]] happens via [[currentNamespace]],
which fetches
+ * the v1 current database under the CM lock. It is safe today because no
code path holds
+ * [[SessionCatalog]]'s intrinsic lock while waiting on [[CatalogManager]]'s
-- the
+ * SPARK-56939 fix removed every such SC->CM ordering. Any future change
that introduces a
+ * new SC->CM ordering must avoid resurrecting the deadlock by also taking
+ * `currentPathString` (or any other CM->SC nest) into account.
Review Comment:
Applied your suggested rewording (f504667f) -- inverted the clauses so the
modal verb attaches unambiguously.
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogManager.scala:
##########
@@ -265,6 +292,48 @@ class CatalogManager(
currentCatalog, currentNamespace,
currentCatalog, currentNamespace)
+ /**
+ * Snapshot the live PATH-derived [[SessionCatalog.SessionFunctionKind]]
order used by
+ * unqualified function/table-function resolution.
+ *
+ * The `(currentCatalog, _currentNamespace, sessionPath)` triple is read
together inside a
+ * single CM critical section so a concurrent `USE` / `SET PATH` cannot
return a torn
+ * snapshot for those three fields (e.g. catalog from one observation,
explicit namespace
+ * from another).
+ *
+ * The `v1SessionCatalog.getCurrentDatabase` read needed for the
default-namespace fallback
+ * is taken OUTSIDE the CM lock and is therefore intentionally racy w.r.t. a
concurrent
+ * `USE SCHEMA`. That staleness is harmless for this helper's output: this
method consumes
+ * `effectiveNs` only to expand `CURRENT_SCHEMA` markers in the SQL path, and
+ * [[CatalogManager.systemFunctionKindsFromPath]] only retains literal
`system.builtin` /
+ * `system.session` entries from the resolved path -- it never inspects any
+ * `(catalog, namespace)` derived from `v1`. So if `v1CurrentDb` lags by one
`USE SCHEMA`,
+ * a `CURRENT_SCHEMA` entry might briefly resolve to the previous database,
but the kinds
+ * list (the only thing returned here) is unaffected. Lifting the read
inside the CM lock
+ * would re-introduce the SPARK-56939 lock-order inversion this helper
exists to avoid.
Review Comment:
Applied (f504667f). "Lifting" -> "Moving".
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]