cloud-fan commented on code in PR #55647:
URL: https://github.com/apache/spark/pull/55647#discussion_r3176941948
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/VariableResolution.scala:
##########
@@ -43,10 +43,25 @@ class VariableResolution(
* (PATH enabled and explicitly set).
*/
private def allowUnqualifiedSessionTempVariableLookup(nameParts:
Seq[String]): Boolean = {
- if (nameParts.length != 1) return true
- catalogManager.sessionScopeUnqualifiedAllowed(
- catalogManager.currentCatalog.name(),
- catalogManager.currentNamespace.toSeq)
+ nameParts.length != 1 || catalogManager.isSystemSessionOnPath
+ }
+
+ /**
+ * Search-path entries to report in `UNRESOLVED_VARIABLE` for DML lookups
(`SET VAR`,
+ * `FETCH ... INTO`). The full SQL path is reported regardless of how the
name was
+ * qualified, matching the convention used by `TABLE_OR_VIEW_NOT_FOUND` and
+ * `UNRESOLVED_ROUTINE`. Keeping the rendering qualification-independent
also avoids
+ * re-shaping the error if Spark ever grows struct-field assignment, where
2-part forms
+ * become genuinely ambiguous.
+ *
+ * DDL paths (`DECLARE` / `DROP` name validation in
+ * [[org.apache.spark.sql.catalyst.analysis.ResolveCatalogs]]) do not
consult the SQL path
+ * and report `[system.session]` directly at their throw site.
+ */
+ def searchPathEntriesForError: Seq[Seq[String]] = {
Review Comment:
Question on the convention: `searchPathEntriesForError` returns the full SQL
path, so `UNRESOLVED_VARIABLE` for an unresolved name like `LOCALVAR` (used in
a SQL script — see the `SqlScriptingExecutionSuite` golden change to ``
[`system`.`builtin`, `system`.`session`, `spark_catalog`.`default`] ``) now
lists `system.builtin` and `spark_catalog.default` even though variables can
only ever live in `system.session`. The Scaladoc justifies it as matching
`TABLE_OR_VIEW_NOT_FOUND` / `UNRESOLVED_ROUTINE`, which is internally
consistent — but for variables those other entries are noise. Is the
user-facing message acceptable here, or would filtering down to entries that
could actually contain variables (i.e., `system.session`) be clearer? Happy
either way, just want to confirm the UX choice was deliberate.
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveFetchCursor.scala:
##########
@@ -64,7 +64,8 @@ class ResolveFetchCursor(val catalogManager: CatalogManager)
extends Rule[Logica
nameParts = u.nameParts
) match {
case Some(variable) => variable.copy(canFold = false)
- case _ => throw unresolvedVariableError(u.nameParts, Seq("SYSTEM",
"SESSION"))
+ case _ => throw unresolvedVariableError(
+ u.nameParts, Seq(Seq("SYSTEM", "SESSION")), u.origin)
Review Comment:
The PR description's Bonus item promises "DML lookup failures (`SET VAR`,
`FETCH ... INTO`) report the full SQL path as a bracketed list."
`ResolveSetVariable` does this via
`variableResolution.searchPathEntriesForError` (ResolveSetVariable.scala:68),
but here the error still hardcodes `Seq(Seq("SYSTEM", "SESSION"))`. The file
already instantiates `variableResolution` (line 37-38), so this is a one-line
fix. As-is, a `FETCH ... INTO` miss reports `` [`SYSTEM`.`SESSION`] ``
regardless of the actual PATH — exactly the asymmetry the bonus item set out to
remove.
```suggestion
case _ => throw unresolvedVariableError(
u.nameParts, variableResolution.searchPathEntriesForError,
u.origin)
```
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogManager.scala:
##########
@@ -202,12 +202,59 @@ class CatalogManager(
currentCatalog, currentNamespace,
currentCatalog, currentNamespace)
- /** True if [[sqlResolutionPathEntries]] includes `system.session`. */
- def sessionScopeUnqualifiedAllowed(
- currentCatalog: String,
- currentNamespace: Seq[String]): Boolean =
- sqlResolutionPathEntries(currentCatalog, currentNamespace)
- .exists(CatalogManager.isSystemSessionPathEntry)
+ /**
+ * True if `system.session` is on the SQL path. Only literal path entries
can match; the
+ * [[CurrentSchemaEntry]] marker can never be `system.session`. Inspecting
stored entries
Review Comment:
The Scaladoc claim "the [[CurrentSchemaEntry]] marker can never be
`system.session`" is true today only because `system` cannot be a current
catalog (`FakeSystemCatalog` isn't user-selectable, and `USE CATALOG system`
would fail in `currentCatalog`/`catalog(...)`). Worth making that load-bearing
assumption explicit so a future change that registers a real catalog under
`system` doesn't silently make this check wrong. Something like "…because
`system` is not a registered catalog (only the internal `FakeSystemCatalog`),
so `CurrentSchemaEntry`'s expansion `currentCatalog.name() +: currentNamespace`
cannot start with `system`" would be enough.
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionResolution.scala:
##########
@@ -370,7 +355,14 @@ class FunctionResolution(
if (nameParts.length == 1) {
// Must match [[resolutionCandidates]] / [[resolveFunction]]:
single-part names use PATH +
// session order, not only the current namespace (LookupCatalog
single-part rule).
- for (candidate <- resolutionCandidates(nameParts)) {
+ // `system.*` candidates are handled by [[lookupBuiltinOrTempFunction]] /
+ // [[lookupBuiltinOrTempTableFunction]] above; skip them here to avoid
redundant
+ // catalog calls.
+ val persistentCandidates = resolutionCandidates(nameParts).filterNot { c
=>
+ c.length >= 2 &&
+ c.head.equalsIgnoreCase(CatalogManager.SYSTEM_CATALOG_NAME)
+ }
Review Comment:
The filter `c.head.equalsIgnoreCase(SYSTEM_CATALOG_NAME)` strips ANY
candidate beginning with `system`, but the comment only enumerates
`system.session` / `system.builtin` / `system.ai` as already-resolved. Today
this is fine — `system.ai` doesn't even exist in this codebase, and only
`lookupBuiltinOrTempFunction` covers `system.session`/`system.builtin`. But if
a future PR registers a real catalog-resolved namespace under `system.<x>`
(anything not handled by `identifierFromSystemNameParts`), this filter will
silently skip persistent lookups for it. Consider tightening to the
actually-handled set, e.g. checking the second segment against
`SESSION_NAMESPACE` / `BUILTIN_NAMESPACE`, so the optimization tracks the
redundancy reason rather than the prefix. Not blocking — a comment caveat would
also work.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]