Re: [PR] [SPARK-56681][SQL] PATH cleanup [spark]

via GitHub Sat, 02 May 2026 10:02:50 -0700


cloud-fan commented on code in PR #55647:
URL: https://github.com/apache/spark/pull/55647#discussion_r3176941948



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/VariableResolution.scala:
##########
@@ -43,10 +43,25 @@ class VariableResolution(
    * (PATH enabled and explicitly set).
    */
   private def allowUnqualifiedSessionTempVariableLookup(nameParts: 
Seq[String]): Boolean = {
-    if (nameParts.length != 1) return true
-    catalogManager.sessionScopeUnqualifiedAllowed(
-      catalogManager.currentCatalog.name(),
-      catalogManager.currentNamespace.toSeq)
+    nameParts.length != 1 || catalogManager.isSystemSessionOnPath
+  }
+
+  /**
+   * Search-path entries to report in `UNRESOLVED_VARIABLE` for DML lookups 
(`SET VAR`,
+   * `FETCH ... INTO`). The full SQL path is reported regardless of how the 
name was
+   * qualified, matching the convention used by `TABLE_OR_VIEW_NOT_FOUND` and
+   * `UNRESOLVED_ROUTINE`. Keeping the rendering qualification-independent 
also avoids
+   * re-shaping the error if Spark ever grows struct-field assignment, where 
2-part forms
+   * become genuinely ambiguous.
+   *
+   * DDL paths (`DECLARE` / `DROP` name validation in
+   * [[org.apache.spark.sql.catalyst.analysis.ResolveCatalogs]]) do not 
consult the SQL path
+   * and report `[system.session]` directly at their throw site.
+   */
+  def searchPathEntriesForError: Seq[Seq[String]] = {

Review Comment:
   Question on the convention: `searchPathEntriesForError` returns the full SQL 
path, so `UNRESOLVED_VARIABLE` for an unresolved name like `LOCALVAR` (used in 
a SQL script — see the `SqlScriptingExecutionSuite` golden change to `` 
[`system`.`builtin`, `system`.`session`, `spark_catalog`.`default`] ``) now 
lists `system.builtin` and `spark_catalog.default` even though variables can 
only ever live in `system.session`. The Scaladoc justifies it as matching 
`TABLE_OR_VIEW_NOT_FOUND` / `UNRESOLVED_ROUTINE`, which is internally 
consistent — but for variables those other entries are noise. Is the 
user-facing message acceptable here, or would filtering down to entries that 
could actually contain variables (i.e., `system.session`) be clearer? Happy 
either way, just want to confirm the UX choice was deliberate.



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveFetchCursor.scala:
##########
@@ -64,7 +64,8 @@ class ResolveFetchCursor(val catalogManager: CatalogManager) 
extends Rule[Logica
           nameParts = u.nameParts
         ) match {
           case Some(variable) => variable.copy(canFold = false)
-          case _ => throw unresolvedVariableError(u.nameParts, Seq("SYSTEM", 
"SESSION"))
+          case _ => throw unresolvedVariableError(
+            u.nameParts, Seq(Seq("SYSTEM", "SESSION")), u.origin)

Review Comment:
   The PR description's Bonus item promises "DML lookup failures (`SET VAR`, 
`FETCH ... INTO`) report the full SQL path as a bracketed list." 
`ResolveSetVariable` does this via 
`variableResolution.searchPathEntriesForError` (ResolveSetVariable.scala:68), 
but here the error still hardcodes `Seq(Seq("SYSTEM", "SESSION"))`. The file 
already instantiates `variableResolution` (line 37-38), so this is a one-line 
fix. As-is, a `FETCH ... INTO` miss reports `` [`SYSTEM`.`SESSION`] `` 
regardless of the actual PATH — exactly the asymmetry the bonus item set out to 
remove.
   
   ```suggestion
             case _ => throw unresolvedVariableError(
               u.nameParts, variableResolution.searchPathEntriesForError, 
u.origin)
   ```



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogManager.scala:
##########
@@ -202,12 +202,59 @@ class CatalogManager(
       currentCatalog, currentNamespace,
       currentCatalog, currentNamespace)
 
-  /** True if [[sqlResolutionPathEntries]] includes `system.session`. */
-  def sessionScopeUnqualifiedAllowed(
-      currentCatalog: String,
-      currentNamespace: Seq[String]): Boolean =
-    sqlResolutionPathEntries(currentCatalog, currentNamespace)
-      .exists(CatalogManager.isSystemSessionPathEntry)
+  /**
+   * True if `system.session` is on the SQL path. Only literal path entries 
can match; the
+   * [[CurrentSchemaEntry]] marker can never be `system.session`. Inspecting 
stored entries

Review Comment:
   The Scaladoc claim "the [[CurrentSchemaEntry]] marker can never be 
`system.session`" is true today only because `system` cannot be a current 
catalog (`FakeSystemCatalog` isn't user-selectable, and `USE CATALOG system` 
would fail in `currentCatalog`/`catalog(...)`). Worth making that load-bearing 
assumption explicit so a future change that registers a real catalog under 
`system` doesn't silently make this check wrong. Something like "…because 
`system` is not a registered catalog (only the internal `FakeSystemCatalog`), 
so `CurrentSchemaEntry`'s expansion `currentCatalog.name() +: currentNamespace` 
cannot start with `system`" would be enough.



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionResolution.scala:
##########
@@ -370,7 +355,14 @@ class FunctionResolution(
     if (nameParts.length == 1) {
       // Must match [[resolutionCandidates]] / [[resolveFunction]]: 
single-part names use PATH +
       // session order, not only the current namespace (LookupCatalog 
single-part rule).
-      for (candidate <- resolutionCandidates(nameParts)) {
+      // `system.*` candidates are handled by [[lookupBuiltinOrTempFunction]] /
+      // [[lookupBuiltinOrTempTableFunction]] above; skip them here to avoid 
redundant
+      // catalog calls.
+      val persistentCandidates = resolutionCandidates(nameParts).filterNot { c 
=>
+        c.length >= 2 &&
+          c.head.equalsIgnoreCase(CatalogManager.SYSTEM_CATALOG_NAME)
+      }

Review Comment:
   The filter `c.head.equalsIgnoreCase(SYSTEM_CATALOG_NAME)` strips ANY 
candidate beginning with `system`, but the comment only enumerates 
`system.session` / `system.builtin` / `system.ai` as already-resolved. Today 
this is fine — `system.ai` doesn't even exist in this codebase, and only 
`lookupBuiltinOrTempFunction` covers `system.session`/`system.builtin`. But if 
a future PR registers a real catalog-resolved namespace under `system.<x>` 
(anything not handled by `identifierFromSystemNameParts`), this filter will 
silently skip persistent lookups for it. Consider tightening to the 
actually-handled set, e.g. checking the second segment against 
`SESSION_NAMESPACE` / `BUILTIN_NAMESPACE`, so the optimization tracks the 
redundancy reason rather than the prefix. Not blocking — a comment caveat would 
also work.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] [SPARK-56681][SQL] PATH cleanup [spark]

Reply via email to