srielau opened a new pull request, #56038:
URL: https://github.com/apache/spark/pull/56038

   ### What changes were proposed in this pull request?
   
   Backports the default SQL `PATH` stabilization stack from `master` to 
`branch-4.x`. Four commits, applied in dependency order:
   
   | # | Upstream commit | PR | JIRA |
   |---|-----------------|----|------|
   | 1 | `3a3e4edb02d` | [#55647](https://github.com/apache/spark/pull/55647) | 
[SPARK-56681][SQL] PATH cleanup |
   | 2 | `cc16f9166e9` | [#55717](https://github.com/apache/spark/pull/55717) | 
[SPARK-56750][SQL] default path config |
   | 3 | `26dd5955522` | [#55866](https://github.com/apache/spark/pull/55866) | 
[SPARK-56853][SQL][TESTS] Improve PATH Tests |
   | 4 | `6ecd76f8345` | [#55977](https://github.com/apache/spark/pull/55977) | 
[SPARK-56939][SQL] Resolve deadlock between USE and function lookup |
   
   Net effect on `branch-4.x`:
   
   - New session conf `spark.sql.defaultPath` (parallel to 
`spark.sql.defaultCatalog`), with `SET PATH = DEFAULT_PATH` expanding to its 
value and an inner `DEFAULT_PATH` token cycle-safely falling back to the 
spark-builtin default ordering.
   - `CatalogManager` materializes a single effective 
`sqlResolutionPathEntries` from stored `SET PATH`, the parsed `DEFAULT_PATH` 
conf, and the seeded `defaultPathOrder`. `current_path()`, unqualified function 
resolution, and unqualified relation resolution all read from this single 
source of truth.
   - Function-resolution shortcuts (the `COUNT(*) -> COUNT(1)` rewrite gate and 
the temp-vs-builtin security check) are now driven by the live PATH via 
`FunctionResolution.isSessionBeforeBuiltinInPath` / 
`CatalogManager.systemFunctionKindsFromPath`, fixing a semantic gap where `SET 
PATH = system.session, system.builtin, ...` did not flip those gates unless the 
legacy `spark.sql.functionResolution.sessionOrder` conf was also flipped.
   - `SET PATH` rejects static literal-vs-literal duplicates at statement time. 
Literal-vs-`CURRENT_SCHEMA` collisions are tolerated (state-dependent and 
harmless under first-match resolution).
   - Concurrency fix: `CatalogManager.setCurrentNamespace` / 
`setCurrentCatalog` / `reset` no longer hold the manager's intrinsic lock 
across the `v1SessionCatalog` callbacks (split-lock pattern). New 
`CatalogManager.sessionFunctionKindsForUnqualifiedResolution()` is the single 
snapshot point shared by `SessionCatalog.sessionFunctionKindsInResolutionOrder` 
and `FunctionResolution.isSessionBeforeBuiltinInPath`, eliminating the 
`SessionCatalog <-> CatalogManager` lock-order inversion that could deadlock 
concurrent `USE SCHEMA` / `USE CATALOG` and unqualified function lookups.
   - Test coverage: Spark Connect E2E (`SqlPathE2ETestSuite`), PySpark 
(`test_path_*` in `CatalogTestsMixin`), golden SQL 
(`sql-tests/inputs/sql-path.sql`), V2 catalogs in PATH 
(`SqlPathV2CatalogSuite`), PATH-disabled persisted-view fallback, 
`cloneSession()` propagation matrix, `ALTER VIEW ... WITH SCHEMA` frozen-path 
preservation, and `CatalogManagerSuite` / `SqlPathFormatSuite` unit tests for 
serialization, `PathElement.validateNoStaticDuplicates`, and display formatting.
   
   Two minor conflict resolutions during cherry-pick, both confined to 
`sql/core/src/test/scala/org/apache/spark/sql/SetPathSuite.scala`:
   - During #55977: brought in the new SPARK-56939 concurrency test verbatim; 
the conflict was an empty HEAD side on this branch.
   - During #55866: kept the SPARK-56939 concurrency test already on this 
branch (#55866 predates #55977 in time on master, so its incoming side was 
empty at that hunk).
   
   ### Why are the changes needed?
   
   The default `PATH` feature plus the deadlock fix have shipped together on 
master. Backporting all four commits keeps `branch-4.x` consistent with the 
released semantics: admins get a sessionable default path, the 
routine-resolution shortcuts agree with `SET PATH`, and the `USE SCHEMA` / `USE 
CATALOG` deadlock that surfaced after #55717 is fixed.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, on `branch-4.x` (matches the user-facing surface already on master):
   
   - New conf `spark.sql.defaultPath`.
   - With `spark.sql.path.enabled=true`, sessions without explicit `SET PATH` 
resolve through the configured default.
   - `SET PATH = DEFAULT_PATH` and `CURRENT_PATH()` reflect the configured 
default.
   - Function-resolution shortcuts (`COUNT(*) -> COUNT(1)` rewrite gate, 
temp-vs-builtin security check) now follow the effective PATH instead of only 
`spark.sql.functionResolution.sessionOrder`.
   - `UNRESOLVED_VARIABLE.searchPath` is rendered as a bracketed list and (for 
DML lookups) reflects the live SQL path.
   - `DROP TEMPORARY VARIABLE` no longer raises `UNRESOLVED_VARIABLE` when 
`system.session` is not on the path (DDL on session variables ignores PATH, 
matching `DECLARE OR REPLACE VARIABLE`).
   - `lookupFunctionType` no longer swallows non-`NotFound` catalog errors as 
`UNRESOLVED_ROUTINE`.
   - No serial-behavior change from #55977; under contention, the session no 
longer deadlocks.
   
   ### How was this patch tested?
   
   Local on the backport branch (`branch-4.x` base):
   
   - `build/sbt catalyst/compile sql/compile sql/Test/compile` -- clean, no 
errors (1m42s).
   - `build/sbt 'sql/testOnly *SetPathSuite'` -- 60/60 pass (includes the 
SPARK-56939 concurrent-`USE`/lookup test and the SPARK-56853 PATH-QA additions).
   - `build/sbt 'catalyst/testOnly *CatalogManagerSuite *FunctionResolution* 
*SessionCatalogSuite *SqlPathFormatSuite'` -- 126/126 pass.
   - Broader suites (`*SQLViewSuite`, `*SQLFunctionSuite`, 
`*RelationQualificationSuite`, `*FunctionQualificationSuite`, 
`*DataSourceV2FunctionSuite`) are running locally in parallel with CI.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: Cursor / Claude Opus 4.7 (backport mechanics: cherry-pick, 
conflict resolution, PR description). The original product PRs disclose their 
own AI tooling usage individually.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to