srielau commented on code in PR #55717:
URL: https://github.com/apache/spark/pull/55717#discussion_r3196686611


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AbstractSqlParser.scala:
##########
@@ -110,6 +111,18 @@ abstract class AbstractSqlParser extends AbstractParser 
with ParserInterface {
     }
   }
 
+  /**
+   * Parse the right-hand side of `SET PATH = ...` (a comma-separated list of 
path elements).
+   * Used by [[org.apache.spark.sql.connector.catalog.CatalogManager]] to 
honor the
+   * [[SQLConf.DEFAULT_PATH]] conf without re-implementing the SET PATH 
grammar.
+   */
+  def parseSqlPathElements(sqlText: String): Seq[PathElement] = parse(sqlText) 
{ parser =>

Review Comment:
   **Naming convention.** Other `parseX` methods on this class are 
`parseExpression`, `parsePlan`, `parseTableIdentifier`, 
`parseFunctionIdentifier`, `parseMultipartIdentifier`, `parseQuery`, 
`parseRoutineParam` — no `parseSql*` prefix.
   
   ```suggestion
     def parsePathElements(sqlText: String): Seq[PathElement] = parse(sqlText) 
{ parser =>
   ```
   
   (One call site at `CatalogManager.scala:201` would also need updating.)



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -2471,6 +2471,18 @@ object SQLConf {
       .booleanConf
       .createWithDefault(false)
 
+  val DEFAULT_PATH =

Review Comment:
   **Validate `DEFAULT_PATH` at set time.** A bad value (e.g. `"this is not a 
path"`) currently makes `workspaceDefaultPathEntries` throw `ParseException` 
from any unqualified function/table reference — the lone test (`DEFAULT_PATH 
conf: rejected value surfaces a parse error`) only checks `current_path()`. Add 
a `.checkValue(...)` on the conf builder (parse + expand defensively, 
swallowing the result) so bad input is rejected when `SET spark.sql.defaultPath 
= ...` runs, with a clear error pointing at the conf key — not at the user's 
`SELECT abs(x)` three queries later.



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/PathElement.scala:
##########
@@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.catalog
+
+import org.apache.spark.sql.connector.catalog.CatalogManager.{
+  CurrentSchemaEntry, LiteralPathEntry, SessionPathEntry
+}
+import org.apache.spark.sql.errors.QueryCompilationErrors
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * One element on the right-hand side of `SET PATH = ...`: either a well-known 
shortcut
+ * keyword (DEFAULT_PATH, SYSTEM_PATH, PATH, CURRENT_SCHEMA / 
CURRENT_DATABASE) or a
+ * fully qualified schema reference (`catalog.namespace...` with at least 2 
parts).
+ *
+ * The same grammar is reused to parse the [[SQLConf.DEFAULT_PATH]] conf 
value, so this
+ * AST node lives in catalyst beside [[CatalogManager]] rather than in the 
runtime
+ * [[org.apache.spark.sql.execution.command.SetPathCommand]].
+ */
+sealed trait PathElement
+
+object PathElement {
+  case object DefaultPath extends PathElement
+  case object SystemPath extends PathElement
+  case object PathRef extends PathElement
+
+  /**
+   * Current database/schema (SQL aliases). Stored as the 
[[CurrentSchemaEntry]] marker
+   * so resolution candidates expand against the live `USE SCHEMA`.
+   */
+  case object CurrentSchema extends PathElement
+
+  /** Fully qualified schema reference (`catalog.namespace...`). Must have at 
least 2 parts. */
+  case class SchemaInPath(parts: Seq[String]) extends PathElement
+
+  /**
+   * Expand a parsed [[PathElement]] list into concrete [[SessionPathEntry]] 
entries
+   * suitable for storing in [[CatalogManager._sessionPath]] or returning from
+   * [[CatalogManager.sessionPathEntries]].
+   *
+   * @param isWorkspaceDefaultExpansion when true, an inner [[DefaultPath]] 
token resolves
+   *                                    to the spark-builtin default ordering 
(cycle break)
+   *                                    rather than reading 
[[SQLConf.DEFAULT_PATH]] again.
+   *                                    Set to true when this method is 
invoked while
+   *                                    parsing [[SQLConf.DEFAULT_PATH]] 
itself.
+   */
+  def expand(
+      elements: Seq[PathElement],
+      conf: SQLConf,
+      catalogManager: CatalogManager,
+      isWorkspaceDefaultExpansion: Boolean = false): Seq[SessionPathEntry] = {
+    val currentSchemaSentinel = Seq("__current_schema__")
+
+    def toEntries(parts: Seq[Seq[String]]): Seq[SessionPathEntry] = parts.map {
+      case p if p == currentSchemaSentinel => CurrentSchemaEntry
+      case p => LiteralPathEntry(p)
+    }
+
+    def builtinDefaultWithCurrentSchema: Seq[SessionPathEntry] =
+      toEntries(conf.defaultPathOrder(Seq(currentSchemaSentinel)))
+
+    def defaultPathExpansion: Seq[SessionPathEntry] = {
+      if (isWorkspaceDefaultExpansion) {
+        // Cycle break: inner DEFAULT_PATH inside the workspace default conf 
value falls
+        // back to the spark-builtin default ordering instead of recursing.
+        builtinDefaultWithCurrentSchema
+      } else {
+        
catalogManager.workspaceDefaultPathEntries.getOrElse(builtinDefaultWithCurrentSchema)
+      }
+    }
+
+    elements.flatMap {
+      case DefaultPath =>
+        defaultPathExpansion
+      case SystemPath =>
+        toEntries(conf.systemPathOrder)
+      case CurrentSchema =>
+        Seq(CurrentSchemaEntry)
+      case PathRef =>
+        catalogManager.storedSessionPathEntries.getOrElse(defaultPathExpansion)
+      case SchemaInPath(parts) =>
+        if (parts.length < 2) {

Review Comment:
   **Validate at parse time, not at every expansion.** This `parts.length < 2` 
check fires every time `expand` runs. Two consequences:
   
   1. `DEFAULT_PATH = "foo"` (single-part) parses cleanly, then throws from 
inside any unqualified function/table resolution. The error trace looks like a 
resolution failure of the user's SQL, not a conf problem.
   2. `SET PATH = foo` succeeds at parse and only fails when 
`SetPathCommand.run` calls `expand`.
   
   Move the check into `AstBuilder.visitPathElement` (or a smart constructor 
`SchemaInPath.apply`) so both paths fail at parse time with the original SQL 
position attached.



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogManager.scala:
##########
@@ -52,6 +53,38 @@ class CatalogManager(
   // TODO: create a real SYSTEM catalog to host `TempVariableManager` under 
the SESSION namespace.
   val tempVariableManager: TempVariableManager = new TempVariableManager
 
+  // Wire the path-driven kinds provider into SessionCatalog so the 
function-resolution loop
+  // and the security check that blocks temp functions from shadowing builtins 
read the live
+  // SQL PATH (post-`SET PATH`, with `DEFAULT_PATH` and `defaultPathOrder` 
fallbacks already
+  // applied) instead of the legacy 
[[SQLConf.SESSION_FUNCTION_RESOLUTION_ORDER]] proxy.
+  // Order is every `system.builtin` / `system.session` entry on the PATH in 
path order
+  // (user-catalog segments in between are skipped for this list).

Review Comment:
   Sentence fragment — "Order is every…" reads as a missing-subject clause.
   
   ```suggestion
     // The order is the sequence of `system.builtin` / `system.session` 
entries on the PATH
     // in path order (user-catalog segments in between are skipped for this 
list).
   ```



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogManager.scala:
##########
@@ -52,6 +53,38 @@ class CatalogManager(
   // TODO: create a real SYSTEM catalog to host `TempVariableManager` under 
the SESSION namespace.
   val tempVariableManager: TempVariableManager = new TempVariableManager
 
+  // Wire the path-driven kinds provider into SessionCatalog so the 
function-resolution loop
+  // and the security check that blocks temp functions from shadowing builtins 
read the live
+  // SQL PATH (post-`SET PATH`, with `DEFAULT_PATH` and `defaultPathOrder` 
fallbacks already
+  // applied) instead of the legacy 
[[SQLConf.SESSION_FUNCTION_RESOLUTION_ORDER]] proxy.
+  // Order is every `system.builtin` / `system.session` entry on the PATH in 
path order
+  // (user-catalog segments in between are skipped for this list).
+  v1SessionCatalog.sessionFunctionKindsProvider = () => 
leadingSystemFunctionKinds()
+
+  /**
+   * Walk the effective SQL PATH and collect every system function namespace 
entry
+   * (`system.builtin` / `system.session`) in path order, mapped to
+   * [[SessionCatalog.SessionFunctionKind]].
+   *
+   * User-catalog entries (e.g. `spark_catalog.default`) appear on the PATH 
before
+   * `system.builtin` / `system.session`; they must not hide later system 
entries.
+   * Unqualified builtin/temp resolution follows this ordered kind list via
+   * 
[[org.apache.spark.sql.catalyst.catalog.SessionCatalog.lookupFunctionWithShadowing]].
+   *
+   * When no system entries appear on the PATH (user-defined path without 
builtins/session),
+   * falls back to builtin-then-session so bare names still resolve like the 
legacy default.
+   */
+  private def leadingSystemFunctionKinds(): 
Seq[SessionCatalog.SessionFunctionKind] = {
+    val path = sqlResolutionPathEntries(currentCatalog.name(), 
currentNamespace.toSeq)
+    val kinds = path.flatMap { e =>
+      if (CatalogManager.isSystemBuiltinPathEntry(e)) 
Some(SessionCatalog.Builtin)
+      else if (CatalogManager.isSystemSessionPathEntry(e)) 
Some(SessionCatalog.Temp)
+      else None
+    }
+    if (kinds.nonEmpty) kinds
+    else Seq(SessionCatalog.Builtin, SessionCatalog.Temp)

Review Comment:
   **Untested fallback.** This branch fires when the effective PATH contains 
user catalogs but no `system.builtin` / `system.session` entries (e.g., `SET 
PATH = spark_catalog.mydb`). Worth a unit test — both for regression coverage 
and to document the design intent (bare names still resolve like the legacy 
default even when the user removes system entries from PATH). Currently no test 
exercises `kinds.isEmpty`.



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -2471,6 +2471,18 @@ object SQLConf {
       .booleanConf
       .createWithDefault(false)
 
+  val DEFAULT_PATH =
+    buildConf("spark.sql.defaultPath")
+      .version("4.2.0")
+      .doc("Default SQL PATH used when no SET PATH has been issued in the 
session, and the " +
+        "value SET PATH = DEFAULT_PATH expands to. Accepts the full SET PATH 
grammar; an inner " +
+        "DEFAULT_PATH token resolves to the spark-builtin default ordering. 
When empty, the " +
+        "spark-builtin default ordering controlled by " +
+        "[[SESSION_FUNCTION_RESOLUTION_ORDER]] applies.")

Review Comment:
   Two doc nits: awkward `"and the value SET PATH = DEFAULT_PATH expands to"` 
phrasing, and `[[SESSION_FUNCTION_RESOLUTION_ORDER]]` is Scaladoc syntax that 
won't render in the user-facing config doc string.
   
   ```suggestion
         .doc("Default SQL PATH used when no SET PATH has been issued in the 
session; this " +
           "is also the value to which `SET PATH = DEFAULT_PATH` expands. 
Accepts the full " +
           "SET PATH grammar; an inner DEFAULT_PATH token resolves to the 
spark-builtin " +
           "default ordering. When empty, the spark-builtin default ordering 
controlled by " +
           "spark.sql.legacy.sessionFunctionResolutionOrder applies.")
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to