cloud-fan commented on code in PR #55977:
URL: https://github.com/apache/spark/pull/55977#discussion_r3272766367
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogManager.scala:
##########
@@ -221,6 +239,15 @@ class CatalogManager(
* When PATH is enabled and a session path is in effect (stored or via
* [[SQLConf#DEFAULT_PATH]]), formats the resolved entries. Otherwise falls
back to the legacy
* resolutionSearchPath.
+ *
+ * SPARK-56939 note: this is currently the only intentional
`CatalogManager.synchronized ->
+ * SessionCatalog.synchronized` nest left in this class. The transitive call
into
+ * [[v1SessionCatalog.getCurrentDatabase]] happens via [[currentNamespace]],
which fetches
+ * the v1 current database under the CM lock. It is safe today because no
code path holds
+ * [[SessionCatalog]]'s intrinsic lock while waiting on [[CatalogManager]]'s
-- the
+ * SPARK-56939 fix removed every such SC->CM ordering. Any future change
that introduces a
+ * new SC->CM ordering must avoid resurrecting the deadlock by also taking
+ * `currentPathString` (or any other CM->SC nest) into account.
Review Comment:
Minor wording: "must avoid resurrecting the deadlock by also taking ... into
account" attaches ambiguously — it can be parsed as "the resurrection is done
by taking it into account." Inverting the clauses removes the ambiguity.
```suggestion
* SPARK-56939 fix removed every such SC->CM ordering. Any future change
that introduces a
* new SC->CM ordering must take `currentPathString` (or any other CM->SC
nest) into
* account to avoid resurrecting the deadlock.
```
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogManager.scala:
##########
@@ -132,15 +132,33 @@ class CatalogManager(
}
}
- def setCurrentNamespace(namespace: Array[String]): Unit = synchronized {
- if (isSessionCatalog(currentCatalog) && namespace.length == 1) {
+ def setCurrentNamespace(namespace: Array[String]): Unit = {
+ // SPARK-56939: do NOT hold [[CatalogManager]]'s intrinsic lock across the
callbacks below.
+ // [[v1SessionCatalog.setCurrentDatabaseWithNameCheck]] briefly
synchronizes on
+ // [[SessionCatalog]], and concurrent unqualified function resolution
acquires the
+ // [[SessionCatalog]] lock and then reaches into [[CatalogManager]] via
+ // [[sqlResolutionPathEntries]]; nesting the manager lock outside the
catalog lock here
+ // would invert that order and deadlock. Snapshot the dispatch decision
under the lock,
+ // run callbacks outside it, then publish the new namespace under the lock
again.
+ //
+ // Concurrency trade-off versus the pre-SPARK-56939 atomic version: the
`isSession`
+ // snapshot can drift if a concurrent [[setCurrentCatalog]] switches to a
v2 catalog
+ // between this read and the v1 callback below -- the callback would still
touch
+ // `v1.currentDb` even though the active catalog is no longer the session
catalog. A
+ // later switch back to the session catalog resets `v1.currentDb` to
`default` (see
+ // [[setCurrentCatalog]]), so long-term state remains consistent; only the
intermediate
+ // observation is novel. Acceptable trade-off against the deadlock
alternative.
Review Comment:
Nit (non-blocking): the expanded block correctly documents the
`isSession`-snapshot drift wrt `v1.currentDb`, but there's a second drift mode
in the same window that the comment doesn't mention. Between the v1 callback
returning (L155) and the publish at L159-161, a concurrent `setCurrentCatalog`
can complete fully — switching `_currentCatalogName` to a v2 catalog and
clearing `_currentNamespace = None` — before this method's L159 publish
overwrites that with `Some(namespace)`. End state: `_currentNamespace =
Some([userSchema])` is published under a *different* `_currentCatalogName` than
the one active when L151 made the dispatch decision, and unlike the
`v1.currentDb` drift this one is sticky — there's no analogous "switch back
resets it" recovery. It's still last-writer-wins for two racing USE commands
(acceptable), but the trade-off picture in the current comment is scoped to
v1-side staleness only. Worth one more sentence either acknowledging the
`_currentNamespace` vs `
_currentCatalogName` mismatch window, or stating explicitly that the
documented trade-off covers v1 state only.
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogManager.scala:
##########
@@ -265,6 +292,48 @@ class CatalogManager(
currentCatalog, currentNamespace,
currentCatalog, currentNamespace)
+ /**
+ * Snapshot the live PATH-derived [[SessionCatalog.SessionFunctionKind]]
order used by
+ * unqualified function/table-function resolution.
+ *
+ * The `(currentCatalog, _currentNamespace, sessionPath)` triple is read
together inside a
+ * single CM critical section so a concurrent `USE` / `SET PATH` cannot
return a torn
+ * snapshot for those three fields (e.g. catalog from one observation,
explicit namespace
+ * from another).
+ *
+ * The `v1SessionCatalog.getCurrentDatabase` read needed for the
default-namespace fallback
+ * is taken OUTSIDE the CM lock and is therefore intentionally racy w.r.t. a
concurrent
+ * `USE SCHEMA`. That staleness is harmless for this helper's output: this
method consumes
+ * `effectiveNs` only to expand `CURRENT_SCHEMA` markers in the SQL path, and
+ * [[CatalogManager.systemFunctionKindsFromPath]] only retains literal
`system.builtin` /
+ * `system.session` entries from the resolved path -- it never inspects any
+ * `(catalog, namespace)` derived from `v1`. So if `v1CurrentDb` lags by one
`USE SCHEMA`,
+ * a `CURRENT_SCHEMA` entry might briefly resolve to the previous database,
but the kinds
+ * list (the only thing returned here) is unaffected. Lifting the read
inside the CM lock
+ * would re-introduce the SPARK-56939 lock-order inversion this helper
exists to avoid.
Review Comment:
Minor wording: "Lifting the read inside the CM lock" is contradictory —
"lifting" connotes moving up/out, while the intent here is moving the read
*into* the locked region.
```suggestion
* list (the only thing returned here) is unaffected. Moving the read
inside the CM lock
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]