cloud-fan commented on code in PR #54781:
URL: https://github.com/apache/spark/pull/54781#discussion_r2928783604
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala:
##########
@@ -108,6 +108,12 @@ trait FunctionRegistryBase[T] {
/** Drop a function and return whether the function existed. */
def dropFunction(name: FunctionIdentifier): Boolean
+ /**
+ * Remove all cached function entries in the given database.
+ * Keeps the cache coherent when a database is dropped.
+ */
+ def dropFunctionsInDatabase(db: String): Unit
Review Comment:
`dropFunctionsInDatabase(db: String)` filters only by `database`, ignoring
the `catalog` field of `FunctionIdentifier`. The registry stores entries with
different catalog values (builtins: `None`, session: `Some("system")`,
persistent: per-namespace), so matching only on database could clear functions
from unrelated catalogs that share the same database name.
Consider accepting a `FunctionIdentifier` with empty `funcName` as a
namespace filter instead — this pattern already exists in `SessionCatalog` with
`SESSION_NAMESPACE_TEMPLATE` and `BUILTIN_NAMESPACE_TEMPLATE`
(SessionCatalog.scala:142-150), which use `FunctionIdentifier(funcName = "",
database = ..., catalog = ...)` as namespace templates.
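To make the concern concrete, here is a minimal self-contained sketch (a toy model, not Spark's actual classes) of matching on a namespace template whose `funcName` is empty, so that only entries with the same `(database, catalog)` pair are dropped. The `FuncId` case class and `dropFunctionsInNamespace` helper are illustrative stand-ins for `FunctionIdentifier` and the proposed method:

```scala
import scala.collection.mutable

// Toy stand-in for FunctionIdentifier: (funcName, database, catalog).
case class FuncId(funcName: String, database: Option[String], catalog: Option[String])

object NamespaceDropSketch {
  val registry = mutable.Map[FuncId, String]()

  // Hypothetical helper: a FuncId with empty funcName acts as a namespace
  // template; drop only entries whose database AND catalog both match.
  def dropFunctionsInNamespace(template: FuncId): Unit = {
    require(template.funcName.isEmpty, "expected a namespace template")
    val toDrop = registry.keySet
      .filter(k => k.database == template.database && k.catalog == template.catalog)
      .toList
    toDrop.foreach(registry.remove)
  }
}
```

With this shape, dropping `FuncId("", Some("db1"), Some("spark_catalog"))` leaves a session-catalog entry for the same database name (`FuncId("f", Some("db1"), Some("system"))`) untouched, which a database-only filter would wrongly clear.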
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala:
##########
@@ -491,6 +491,11 @@ class SessionCatalog(
invalidateCachedTable(QualifiedTableName(SESSION_CATALOG_NAME, dbName, t.table))
}
}
+ if (databaseExists(dbName)) {
Review Comment:
The function registry clearing is not guarded by `cascade`, unlike the table
cache invalidation at line 489. When `cascade=false`, either the DB is empty
(clearing is a no-op) or `externalCatalog.dropDatabase` will throw for a
non-empty DB (`InMemoryCatalog`:140-144) — but the registry has already been
cleared before that throw. Consider moving the clearing inside the existing
`cascade` block, which also consolidates the redundant `databaseExists` call.
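A toy model (assumed names, not Spark code) of why the clearing belongs inside the `cascade` guard: when `cascade=false` and the database is non-empty, the drop throws, and nothing cached should have been cleared before that throw:

```scala
import scala.collection.mutable

object CascadeGuardSketch {
  // Toy external catalog state and toy registry cache.
  val functionsInDb = mutable.Map("db1" -> mutable.Set("f1", "f2"))
  val registryCache = mutable.Set("db1.f1", "db1.f2")

  def dropDatabase(db: String, cascade: Boolean): Unit = {
    if (cascade) {
      // Only when cascading do we know the DB's contents are going away,
      // so only then is it safe to clear the cached entries.
      registryCache --= registryCache.filter(_.startsWith(db + "."))
    }
    // Mirrors the InMemoryCatalog behavior the comment cites: refuse to
    // drop a non-empty database without cascade.
    if (!cascade && functionsInDb.get(db).exists(_.nonEmpty)) {
      throw new IllegalStateException(s"database $db is not empty")
    }
    functionsInDb.remove(db)
  }
}
```

If the clearing ran before the guard instead, a failed `cascade=false` drop would leave the registry missing entries for a database that still exists.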
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala:
##########
@@ -491,6 +491,11 @@ class SessionCatalog(
invalidateCachedTable(QualifiedTableName(SESSION_CATALOG_NAME, dbName, t.table))
}
}
+ if (databaseExists(dbName)) {
+ // Clear cached functions in this database so cache stays coherent after drop
Review Comment:
Comment says "after drop" but clearing happens before
`externalCatalog.dropDatabase` at line 499.
```suggestion
// Clear cached functions in this database so the cache stays coherent on drop
```
##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala:
##########
@@ -297,6 +297,69 @@ abstract class SessionCatalogSuite extends AnalysisTest with Eventually {
}
}
+ test("drop database clears function registry cache (cache coherence)") {
Review Comment:
Consider adding a negative test that registers a function in a *different*
database and verifies it survives the drop. This would catch an overly
aggressive implementation that clears too many entries.
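The shape of such a negative test, sketched against a toy registry (the identifiers are assumed for illustration, not taken from the actual suite): register a function in each of two databases, drop one, and assert the other's entry survives.

```scala
import scala.collection.mutable

object NegativeDropTestSketch {
  // Toy registry keyed by (database, funcName).
  val registry = mutable.Set[(String, String)]()

  def dropFunctionsInDatabase(db: String): Unit =
    registry --= registry.filter(_._1 == db)

  def run(): Unit = {
    registry += (("db1", "f1"))
    registry += (("db2", "f2"))
    dropFunctionsInDatabase("db1")
    assert(!registry.contains(("db1", "f1")))
    // Negative check: an entry in a *different* database survives the drop.
    // An overly aggressive implementation would fail this assertion.
    assert(registry.contains(("db2", "f2")))
  }
}
```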
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]