cloud-fan commented on code in PR #54781:
URL: https://github.com/apache/spark/pull/54781#discussion_r2928783604
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala:
##########
@@ -108,6 +108,12 @@ trait FunctionRegistryBase[T] {
/** Drop a function and return whether the function existed. */
def dropFunction(name: FunctionIdentifier): Boolean
+ /**
+ * Remove all cached function entries in the given database.
+ * Keeps the cache coherent when a database is dropped.
+ */
+ def dropFunctionsInDatabase(db: String): Unit
Review Comment:
`dropFunctionsInDatabase(db: String)` filters only by `database`, ignoring
the `catalog` field of `FunctionIdentifier`. The registry stores entries with
different catalog values (builtins: `None`, session: `Some("system")`,
persistent: per-namespace), so matching only on database could clear functions
from unrelated catalogs that share the same database name.
Consider accepting a `FunctionIdentifier` with empty `funcName` as a
namespace filter instead — this pattern already exists in `SessionCatalog` with
`SESSION_NAMESPACE_TEMPLATE` and `BUILTIN_NAMESPACE_TEMPLATE`
(SessionCatalog.scala:142-150), which use `FunctionIdentifier(funcName = "",
database = ..., catalog = ...)` as namespace templates.
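To make the concern concrete, here is a minimal self-contained sketch (a toy model, not Spark's actual classes) of matching on a namespace template whose `funcName` is empty, so that only entries with the same `(database, catalog)` pair are dropped. The `FuncId` case class and `dropFunctionsInNamespace` helper are illustrative stand-ins for `FunctionIdentifier` and the proposed method:

```scala
import scala.collection.mutable

// Toy stand-in for FunctionIdentifier: (funcName, database, catalog).
case class FuncId(funcName: String, database: Option[String], catalog: Option[String])

object NamespaceDropSketch {
  val registry = mutable.Map[FuncId, String]()

  // Hypothetical helper: a FuncId with empty funcName acts as a namespace
  // template; drop only entries whose database AND catalog both match.
  def dropFunctionsInNamespace(template: FuncId): Unit = {
    require(template.funcName.isEmpty, "expected a namespace template")
    val toDrop = registry.keySet
      .filter(k => k.database == template.database && k.catalog == template.catalog)
      .toList
    toDrop.foreach(registry.remove)
  }
}
```

With this shape, dropping `FuncId("", Some("db1"), Some("spark_catalog"))` leaves a session-catalog entry for the same database name (`FuncId("f", Some("db1"), Some("system"))`) untouched, which a database-only filter would wrongly clear.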
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala:
##########
@@ -491,6 +491,11 @@ class SessionCatalog(
invalidateCachedTable(QualifiedTableName(SESSION_CATALOG_NAME, dbName, t.table))
}
}
+ if (databaseExists(dbName)) {
Review Comment:
The function registry clearing is not guarded by `cascade`, unlike the table
cache invalidation at line 489. When `cascade=false`, either the DB is empty
(clearing is a no-op) or `externalCatalog.dropDatabase` will throw for a
non-empty DB (`InMemoryCatalog`:140-144) — but the registry has already been
cleared before that throw. Consider moving the clearing inside the existing
`cascade` block, which also consolidates the redundant `databaseExists` call.
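A toy model (assumed names, not Spark code) of why the clearing belongs inside the `cascade` guard: when `cascade=false` and the database is non-empty, the drop throws, and nothing cached should have been cleared before that throw:

```scala
import scala.collection.mutable

object CascadeGuardSketch {
  // Toy external catalog state and toy registry cache.
  val functionsInDb = mutable.Map("db1" -> mutable.Set("f1", "f2"))
  val registryCache = mutable.Set("db1.f1", "db1.f2")

  def dropDatabase(db: String, cascade: Boolean): Unit = {
    if (cascade) {
      // Only when cascading do we know the DB's contents are going away,
      // so only then is it safe to clear the cached entries.
      registryCache --= registryCache.filter(_.startsWith(db + "."))
    }
    // Mirrors the InMemoryCatalog behavior the comment cites: refuse to
    // drop a non-empty database without cascade.
    if (!cascade && functionsInDb.get(db).exists(_.nonEmpty)) {
      throw new IllegalStateException(s"database $db is not empty")
    }
    functionsInDb.remove(db)
  }
}
```

If the clearing ran before the guard instead, a failed `cascade=false` drop would leave the registry missing entries for a database that still exists.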
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala:
##########
@@ -491,6 +491,11 @@ class SessionCatalog(
invalidateCachedTable(QualifiedTableName(SESSION_CATALOG_NAME, dbName, t.table))
}
}
+ if (databaseExists(dbName)) {
+ // Clear cached functions in this database so cache stays coherent after drop
Review Comment:
Comment says "after drop" but clearing happens before
`externalCatalog.dropDatabase` at line 499.
```suggestion
// Clear cached functions in this database so the cache stays coherent on drop
```
##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala:
##########
@@ -297,6 +297,69 @@ abstract class SessionCatalogSuite extends AnalysisTest with Eventually {
}
}
+ test("drop database clears function registry cache (cache coherence)") {
Review Comment:
Consider adding a negative test that registers a function in a *different*
database and verifies it survives the drop. This would catch an overly
aggressive implementation that clears too many entries.
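The shape of such a negative test, sketched against a toy registry (the identifiers are assumed for illustration, not taken from the actual suite): register a function in each of two databases, drop one, and assert the other's entry survives.

```scala
import scala.collection.mutable

object NegativeDropTestSketch {
  // Toy registry keyed by (database, funcName).
  val registry = mutable.Set[(String, String)]()

  def dropFunctionsInDatabase(db: String): Unit =
    registry --= registry.filter(_._1 == db)

  def run(): Unit = {
    registry += (("db1", "f1"))
    registry += (("db2", "f2"))
    dropFunctionsInDatabase("db1")
    assert(!registry.contains(("db1", "f1")))
    // Negative check: an entry in a *different* database survives the drop.
    // An overly aggressive implementation would fail this assertion.
    assert(registry.contains(("db2", "f2")))
  }
}
```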
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]