cloud-fan commented on code in PR #54765:
URL: https://github.com/apache/spark/pull/54765#discussion_r2942458744
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionResolution.scala:
##########
@@ -75,21 +75,52 @@ class FunctionResolution(
/**
 * Produces the ordered list of fully qualified candidate names for resolution.
+ * Every candidate is 3-part (catalog.database.function), assuming search path
+ * entries are catalog.database (2-part).
 *
 * @param nameParts The function name parts.
- * @return A sequence of fully qualified function names to attempt resolution with.
+ * @return A sequence of fully qualified 3-part names to attempt resolution with.
Review Comment:
The Scaladoc says "Every candidate is 3-part" but line 117 returns 2-part names for catalogs without a default namespace (the comment on line 116 even says "use 2-part").
```suggestion
 * Candidates are normally 3-part (catalog.database.function), assuming search path
 * entries are catalog.database (2-part). A 2-part candidate is returned when a
 * V2 catalog has no default namespace.
 *
 * @param nameParts The function name parts.
 * @return A sequence of candidate names (usually 3-part) to attempt resolution with.
```
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionResolution.scala:
##########
@@ -75,21 +75,52 @@ class FunctionResolution(
/**
 * Produces the ordered list of fully qualified candidate names for resolution.
+ * Every candidate is 3-part (catalog.database.function), assuming search path
+ * entries are catalog.database (2-part).
 *
 * @param nameParts The function name parts.
- * @return A sequence of fully qualified function names to attempt resolution with.
+ * @return A sequence of fully qualified 3-part names to attempt resolution with.
*/
private def resolutionCandidates(nameParts: Seq[String]): Seq[Seq[String]] = {
+  def ensureThreePart(parts: Seq[String]): Seq[String] =
+    if (parts.length == 3) parts
+    else if (parts.length == 2) currentCatalogPath.head +: parts
+    else parts // Safety fallback; valid paths yield 2 or 3 parts before this point.
+
  if (nameParts.size == 1) {
+    // Search path entries are 2-part (catalog.database), so path ++ nameParts is always 3-part.
    val searchPath = SQLConf.get.resolutionSearchPath(currentCatalogPath)
    searchPath.map(_ ++ nameParts)
  } else {
    nameParts.size match {
-      case 2 if FunctionResolution.sessionNamespaceKind(nameParts).isDefined =>
-        // Partially qualified builtin/session: try persistent first so user schema wins
-        Seq(nameParts, Seq(CatalogManager.SYSTEM_CATALOG_NAME) ++ nameParts)
+      case 2
+          if FunctionResolution.sessionNamespaceKind(nameParts).isDefined =>
+        // Partially qualified builtin/session: produce both candidates as 3-part
+        // (catalog.database.func).
+        val systemCandidate = CatalogManager.SYSTEM_CATALOG_NAME +: nameParts
+        val persistentCandidate = currentCatalogPath.head +: nameParts
+        if (SQLConf.get.prioritizeSystemCatalog) {
+          Seq(systemCandidate, persistentCandidate)
+        } else {
+          Seq(persistentCandidate, systemCandidate)
+        }
+      case 2 if catalogManager.isCatalogRegistered(nameParts(0)) =>
Review Comment:
This `case 2 if catalogManager.isCatalogRegistered` branch is new behavior not present in the old code and not mentioned in the PR description. Previously, 2-part names like `mycatalog.func` passed through unchanged to `resolveQualifiedFunction`, where `CatalogAndIdentifier.unapply` would create an empty-namespace identifier — the same convention used for tables (`mycatalog.mytable` also gets an empty namespace, and the catalog fills in its default internally).
The new code explicitly fetches `cat.defaultNamespace()` and pre-fills it, which:
- Is **inconsistent with table resolution**, where V2 catalogs handle empty namespaces themselves
- **Drops multi-level default namespaces** — `defaultNs(0)` only takes the first element, so `Array("ns1", "ns2")` loses `"ns2"`
- Has no test coverage
This should either be removed (let `CatalogAndIdentifier` handle it, as before) or justified and tested separately.
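To make the multi-level namespace point concrete, here is a minimal standalone sketch (hypothetical values, not the PR's actual code) of how taking only `defaultNs(0)` loses deeper namespace levels:

```scala
// Hypothetical illustration: a V2 catalog whose default namespace is two levels deep.
val defaultNs: Array[String] = Array("ns1", "ns2")
val nameParts: Seq[String] = Seq("mycatalog", "func")

// What pre-filling with only defaultNs(0) produces: "ns2" is silently dropped.
val prefilled: Seq[String] = Seq(nameParts.head, defaultNs(0), nameParts.last)
// prefilled == Seq("mycatalog", "ns1", "func")

// Preserving every level would require splicing the whole namespace.
val full: Seq[String] = nameParts.head +: (defaultNs.toSeq :+ nameParts.last)
// full == Seq("mycatalog", "ns1", "ns2", "func")
```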
##########
sql/core/src/main/scala/org/apache/spark/sql/SparkSessionExtensions.scala:
##########
@@ -363,24 +364,55 @@ class SparkSessionExtensions {
  private[this] val injectedTableFunctions =
    mutable.Buffer.empty[TableFunctionDescription]
+  /**
+   * Normalizes an extension function identifier to a fully qualified one for registration.
+   * Accepts unqualified (1-part), fully qualified (3-part), or 2-part system names
+   * (builtin.func, session.func). It rejects other partially qualified names.
+   */
+  private def fullyQualifiedFunctionIdentifier(name: FunctionIdentifier): FunctionIdentifier = {
+    (name.catalog.isEmpty, name.database.isEmpty) match {
+      case (true, true) =>
+        // Unqualified names are registered in system.builtin.
+        FunctionRegistry.builtinFunctionIdentifier(name.funcName)
+      case (false, false) =>
+        // Fully qualified names are used as-is.
+        name
+      case (true, false) =>
Review Comment:
`fullyQualifiedFunctionIdentifier` maps ALL 2-part names to the system catalog, not just `builtin` and `session`. If a user calls `injectFunction` with `FunctionIdentifier("func", Some("custom_db"))`, it would be stored as `system.custom_db.func` — unreachable by any standard resolution path. No error, the function just silently never resolves.
Consider validating that the database is `builtin` or `session`, or falling through to the invalid case for other databases.
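A sketch of the suggested guard, with illustrative names (`SystemCatalog`, `systemDatabases`, `qualify` are hypothetical, not Spark's actual API):

```scala
// Hypothetical sketch: only builtin/session 2-part names are mapped into the
// system catalog; other partially qualified databases are rejected loudly.
val SystemCatalog = "system"
val systemDatabases = Set("builtin", "session")

def qualify(funcName: String, database: Option[String], catalog: Option[String]): Seq[String] =
  (catalog, database) match {
    case (Some(c), Some(d)) => Seq(c, d, funcName)               // fully qualified: as-is
    case (None, None) => Seq(SystemCatalog, "builtin", funcName) // unqualified
    case (None, Some(d)) if systemDatabases.contains(d) =>
      Seq(SystemCatalog, d, funcName)                            // builtin.func / session.func
    case _ =>
      // Reject e.g. ("func", Some("custom_db"), None) instead of silently
      // storing an unreachable system.custom_db.func.
      throw new IllegalArgumentException(s"Cannot register partially qualified function: $funcName")
  }
```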
##########
sql/core/src/main/scala/org/apache/spark/sql/execution/command/functions.scala:
##########
@@ -197,8 +197,8 @@ case class ShowFunctionsCommand(
     sparkSession.sessionState.catalog
       .listFunctions(db, pattern.getOrElse("*"))
       .collect {
-        case (f, "USER") if showUserFunctions => f.unquotedString
-        case (f, "SYSTEM") if showSystemFunctions => f.unquotedString
+        case (f, "USER") if showUserFunctions => f.displayNameForShowFunctions
Review Comment:
With the 3-part key change, `listFunctions` now produces different tuple values for a builtin and a same-named temp function. Both display as "abs" but have different `FunctionIdentifier` shapes and different categories ("SYSTEM" vs "USER"), so the upstream `.distinct` on tuples doesn't collapse them. `ShowFunctionsCommand` would show "abs" twice when a temp function shadows a builtin.
The v2 path (`ShowFunctionsExec`) avoids this with `.distinct` on display-name strings. Consider adding `.distinct` to `functionNames` here as well — e.g., after the `.collect { ... }` block.
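A tiny standalone repro of the duplication, with hypothetical tuples standing in for the `listFunctions` output:

```scala
// Hypothetical stand-ins for listFunctions output: distinct tuples, same display name.
val listed: Seq[(String, String)] = Seq(
  ("abs", "SYSTEM"), // builtin abs, 3-part identifier upstream
  ("abs", "USER")    // temp abs shadowing it, different identifier shape
)

// The upstream .distinct ran on the tuples, which differ, so both survive rendering.
val shown: Seq[String] = listed.distinct.map { case (name, _) => name }
// shown == Seq("abs", "abs")

// Deduplicating on the rendered display names collapses the shadowed pair.
val deduped: Seq[String] = shown.distinct
// deduped == Seq("abs")
```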
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala:
##########
@@ -210,10 +210,26 @@ trait SimpleFunctionRegistryBase[T] extends FunctionRegistryBase[T] with Logging
  protected val functionBuilders =
    new mutable.HashMap[FunctionIdentifier, (ExpressionInfo, FunctionBuilder)]
-  // Resolution of the function name is always case insensitive, but the database name
-  // depends on the caller
+  // Function name resolution is case-insensitive; database and catalog are preserved so
+  // system.session.foo and spark_catalog.session.foo do not collide. Unqualified (1-part)
+  // lookups resolve to the builtin 3-part key so that callers using FunctionIdentifier("time")
+  // find functions registered as system.builtin.time. A 2-part name with database=session
+  // and no catalog resolves to the temp 3-part key.
  private def normalizeFuncName(name: FunctionIdentifier): FunctionIdentifier = {
-    FunctionIdentifier(name.funcName.toLowerCase(Locale.ROOT), name.database)
+    if (name.database.isEmpty && name.catalog.isEmpty) {
Review Comment:
`normalizeFuncName` now has branches beyond lowercasing, encoding the assumption "unqualified = builtin" and "database=session without catalog = system.session" in the storage layer. The registry is a storage layer — ideally it should only normalize case. The caller (SessionCatalog / FunctionResolution) already knows what kind of function it's registering or looking up and should provide the exact storage key.
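The separation suggested above can be sketched with illustrative types (`FuncId`, `normalizeCaseOnly`, `builtinKey` are hypothetical, not Spark's actual classes):

```scala
// Sketch of the suggestion: the registry normalizes case only, and callers
// pass the exact 3-part storage key.
import java.util.Locale

case class FuncId(catalog: Option[String], database: Option[String], funcName: String)

// Storage layer: case normalization and nothing else.
def normalizeCaseOnly(name: FuncId): FuncId =
  name.copy(funcName = name.funcName.toLowerCase(Locale.ROOT))

// Caller side (e.g. SessionCatalog) decides what "unqualified" means before
// the registry ever sees the name, keeping policy out of the storage layer.
def builtinKey(funcName: String): FuncId =
  FuncId(Some("system"), Some("builtin"), funcName)

// normalizeCaseOnly(builtinKey("TIME")) == FuncId(Some("system"), Some("builtin"), "time")
```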
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]