MaxGekk commented on a change in pull request #31308:
URL: https://github.com/apache/spark/pull/31308#discussion_r563347923
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala
##########
@@ -561,16 +561,9 @@ case class TruncateTableCommand(
         }
       }
     }
-    // After deleting the data, invalidate the table to make sure we don't keep around a stale
-    // file relation in the metastore cache.
-    spark.sessionState.refreshTable(tableName.unquotedString)
-    // Also try to drop the contents of the table from the columnar cache
-    try {
-      spark.sharedState.cacheManager.uncacheQuery(spark.table(table.identifier), cascade = true)
-    } catch {
-      case NonFatal(e) =>
-        log.warn(s"Exception when attempting to uncache table $tableIdentWithDB", e)
-    }
+    // After deleting the data, refresh the table to make sure we don't keep around a stale
+    // file relation in the metastore cache and cached table data in the cache manager.
+    spark.catalog.refreshTable(tableIdentWithDB)
Review comment:
1. We don't catch any exceptions in other commands
2. What kind of exceptions should we hide from users here?
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala
##########
@@ -561,16 +561,9 @@ case class TruncateTableCommand(
         }
       }
     }
-    // After deleting the data, invalidate the table to make sure we don't keep around a stale
-    // file relation in the metastore cache.
-    spark.sessionState.refreshTable(tableName.unquotedString)
-    // Also try to drop the contents of the table from the columnar cache
-    try {
-      spark.sharedState.cacheManager.uncacheQuery(spark.table(table.identifier), cascade = true)
-    } catch {
-      case NonFatal(e) =>
-        log.warn(s"Exception when attempting to uncache table $tableIdentWithDB", e)
-    }
+    // After deleting the data, refresh the table to make sure we don't keep around a stale
+    // file relation in the metastore cache and cached table data in the cache manager.
+    spark.catalog.refreshTable(tableIdentWithDB)
Review comment:
1. We don't catch any exceptions in other commands
2. What kind of exceptions should we hide from users here?
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala
##########
@@ -561,16 +561,9 @@ case class TruncateTableCommand(
         }
       }
     }
-    // After deleting the data, invalidate the table to make sure we don't keep around a stale
-    // file relation in the metastore cache.
-    spark.sessionState.refreshTable(tableName.unquotedString)
-    // Also try to drop the contents of the table from the columnar cache
-    try {
-      spark.sharedState.cacheManager.uncacheQuery(spark.table(table.identifier), cascade = true)
-    } catch {
-      case NonFatal(e) =>
-        log.warn(s"Exception when attempting to uncache table $tableIdentWithDB", e)
-    }
+    // After deleting the data, refresh the table to make sure we don't keep around a stale
+    // file relation in the metastore cache and cached table data in the cache manager.
+    spark.catalog.refreshTable(tableIdentWithDB)
Review comment:
1. We don't catch any exceptions in other commands
2. What kind of exceptions should we hide (catch) from users here?
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala
##########
@@ -561,16 +561,9 @@ case class TruncateTableCommand(
         }
       }
     }
-    // After deleting the data, invalidate the table to make sure we don't keep around a stale
-    // file relation in the metastore cache.
-    spark.sessionState.refreshTable(tableName.unquotedString)
-    // Also try to drop the contents of the table from the columnar cache
-    try {
-      spark.sharedState.cacheManager.uncacheQuery(spark.table(table.identifier), cascade = true)
-    } catch {
-      case NonFatal(e) =>
-        log.warn(s"Exception when attempting to uncache table $tableIdentWithDB", e)
-    }
+    // After deleting the data, refresh the table to make sure we don't keep around a stale
+    // file relation in the metastore cache and cached table data in the cache manager.
+    spark.catalog.refreshTable(tableIdentWithDB)
Review comment:
It is not clear which exceptions are caught here. `uncacheQuery()` doesn't throw anything; `spark.table(table.identifier).logicalPlan` could fail, but in that case it is not clear how we reached this point.
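To make my reading concrete, here is the removed block again with the failure points annotated. This is just a sketch reusing the names from the diff above (`spark`, `table`, `log`, `tableIdentWithDB`); the comments reflect my understanding, not verified behavior:

```scala
import scala.util.control.NonFatal

try {
  // spark.table() resolves the identifier through the analyzer, so it can
  // throw (e.g. an AnalysisException if the table was dropped concurrently);
  // but in that case the TRUNCATE should have failed before reaching here.
  val relation = spark.table(table.identifier)
  // uncacheQuery() itself is not expected to throw: a missing cache entry
  // simply means there is nothing to uncache.
  spark.sharedState.cacheManager.uncacheQuery(relation, cascade = true)
} catch {
  case NonFatal(e) =>
    // So the only failure this warning can realistically swallow is the
    // table resolution above.
    log.warn(s"Exception when attempting to uncache table $tableIdentWithDB", e)
}
```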
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala
##########
@@ -561,16 +561,9 @@ case class TruncateTableCommand(
         }
       }
     }
-    // After deleting the data, invalidate the table to make sure we don't keep around a stale
-    // file relation in the metastore cache.
-    spark.sessionState.refreshTable(tableName.unquotedString)
-    // Also try to drop the contents of the table from the columnar cache
-    try {
-      spark.sharedState.cacheManager.uncacheQuery(spark.table(table.identifier), cascade = true)
-    } catch {
-      case NonFatal(e) =>
-        log.warn(s"Exception when attempting to uncache table $tableIdentWithDB", e)
-    }
+    // After deleting the data, refresh the table to make sure we don't keep around a stale
+    // file relation in the metastore cache and cached table data in the cache manager.
+    spark.catalog.refreshTable(tableIdentWithDB)
Review comment:
The case you describe here applies to other commands as well, such as add/drop/rename/recover partitions. I believe we should either "fix" all such commands, with tests for this case, or apply the approach without catching exceptions here, as we do in the other commands so far (otherwise the implementation looks inconsistent).
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala
##########
@@ -561,16 +561,9 @@ case class TruncateTableCommand(
         }
       }
     }
-    // After deleting the data, invalidate the table to make sure we don't keep around a stale
-    // file relation in the metastore cache.
-    spark.sessionState.refreshTable(tableName.unquotedString)
-    // Also try to drop the contents of the table from the columnar cache
-    try {
-      spark.sharedState.cacheManager.uncacheQuery(spark.table(table.identifier), cascade = true)
-    } catch {
-      case NonFatal(e) =>
-        log.warn(s"Exception when attempting to uncache table $tableIdentWithDB", e)
-    }
+    // After deleting the data, refresh the table to make sure we don't keep around a stale
+    // file relation in the metastore cache and cached table data in the cache manager.
+    spark.catalog.refreshTable(tableIdentWithDB)
Review comment:
> ... someone could drop a table or permanent view in a different session, or drop a Hive table through beeline or HMS API.

If somebody dropped the table in parallel, updating statistics wouldn't matter anymore. We should show the error to the user as soon as possible.

> can we fix the inconsistency first?

@sunchao Can you write a test which reproduces the issue?
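For reference, a rough sketch of the kind of test I mean. The test name is hypothetical, and it assumes a suite with the usual `SharedSparkSession`/`SQLTestUtils` helpers; dropping through the external catalog stands in for a drop from another session or the HMS API. It may fail at analysis rather than at the refresh step, but either way the error should surface to the user instead of being swallowed as a warning:

```scala
test("truncate a table that was dropped by another session") {
  withTable("t") {
    sql("CREATE TABLE t (i INT) USING parquet")
    sql("CACHE TABLE t")
    // Simulate a concurrent drop that bypasses this session's catalog,
    // like a drop via beeline or the HMS API would.
    spark.sharedState.externalCatalog.dropTable(
      "default", "t", ignoreIfNotExists = false, purge = false)
    // The command should fail loudly rather than swallow the exception.
    intercept[AnalysisException] {
      sql("TRUNCATE TABLE t")
    }
  }
}
```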