sunchao commented on a change in pull request #31308:
URL: https://github.com/apache/spark/pull/31308#discussion_r563346341
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala
##########
@@ -561,16 +561,9 @@ case class TruncateTableCommand(
         }
       }
     }
-    // After deleting the data, invalidate the table to make sure we don't keep around a stale
-    // file relation in the metastore cache.
-    spark.sessionState.refreshTable(tableName.unquotedString)
-    // Also try to drop the contents of the table from the columnar cache
-    try {
-      spark.sharedState.cacheManager.uncacheQuery(spark.table(table.identifier), cascade = true)
-    } catch {
-      case NonFatal(e) =>
-        log.warn(s"Exception when attempting to uncache table $tableIdentWithDB", e)
-    }
+    // After deleting the data, refresh the table to make sure we don't keep around a stale
+    // file relation in the metastore cache and cached table data in the cache manager.
+    spark.catalog.refreshTable(tableIdentWithDB)
Review comment:
Hmm... not sure if we should catch non-fatal exceptions here like the previous code did, since otherwise we'd skip updating stats?
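Something like this is what I have in mind (a rough, untested sketch; `spark`, `log`, and `tableIdentWithDB` are all from the surrounding `TruncateTableCommand`, and the stats update that follows in this command stays unchanged):
```scala
import scala.util.control.NonFatal

// Best-effort refresh: a failure while uncaching/recaching dependents
// shouldn't prevent the stats update later in this command from running.
try {
  spark.catalog.refreshTable(tableIdentWithDB)
} catch {
  case NonFatal(e) =>
    log.warn(s"Exception when attempting to refresh table $tableIdentWithDB", e)
}
```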
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala
##########
@@ -561,16 +561,9 @@ case class TruncateTableCommand(
         }
       }
     }
-    // After deleting the data, invalidate the table to make sure we don't keep around a stale
-    // file relation in the metastore cache.
-    spark.sessionState.refreshTable(tableName.unquotedString)
-    // Also try to drop the contents of the table from the columnar cache
-    try {
-      spark.sharedState.cacheManager.uncacheQuery(spark.table(table.identifier), cascade = true)
-    } catch {
-      case NonFatal(e) =>
-        log.warn(s"Exception when attempting to uncache table $tableIdentWithDB", e)
-    }
+    // After deleting the data, refresh the table to make sure we don't keep around a stale
+    // file relation in the metastore cache and cached table data in the cache manager.
+    spark.catalog.refreshTable(tableIdentWithDB)
Review comment:
I think someone could drop a table or permanent view in a different session, or drop a Hive table through beeline or the HMS API. This may cause cached queries that depend on them AND this truncated table to become invalid, potentially hitting an analysis exception when recaching them. I haven't had a chance to verify this though.
Overall I feel it is good practice to recover from unknown errors here and continue; `DropTableCommand` does this as well. The scenario I'm imagining is sketched below.
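Roughly (an untested sketch; `spark` is an existing `SparkSession`, the table names are made up, and I'm going through the external catalog directly to simulate a drop behind Spark's back):
```scala
// Untested sketch: a dependency of a cached query disappears behind
// Spark's back, then TRUNCATE tries to uncache/recache the dependent.
spark.sql("CREATE TABLE t1 (id INT) USING parquet")
spark.sql("CREATE TABLE t2 (id INT) USING parquet")
// Cache a query that depends on BOTH t2 and the table we'll truncate.
spark.sql("CACHE TABLE joined AS SELECT a.id FROM t1 a JOIN t2 b ON a.id = b.id")

// Simulate a drop through beeline or the HMS API by going straight to the
// external catalog, so the cache manager is not notified.
spark.sharedState.externalCatalog.dropTable(
  "default", "t2", ignoreIfNotExists = false, purge = false)

// TRUNCATE uncaches dependents with cascade and recaches them; re-analyzing
// the cached query may now fail with an AnalysisException since t2 is gone.
spark.sql("TRUNCATE TABLE t1")
```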
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala
##########
@@ -561,16 +561,9 @@ case class TruncateTableCommand(
         }
       }
     }
-    // After deleting the data, invalidate the table to make sure we don't keep around a stale
-    // file relation in the metastore cache.
-    spark.sessionState.refreshTable(tableName.unquotedString)
-    // Also try to drop the contents of the table from the columnar cache
-    try {
-      spark.sharedState.cacheManager.uncacheQuery(spark.table(table.identifier), cascade = true)
-    } catch {
-      case NonFatal(e) =>
-        log.warn(s"Exception when attempting to uncache table $tableIdentWithDB", e)
-    }
+    // After deleting the data, refresh the table to make sure we don't keep around a stale
+    // file relation in the metastore cache and cached table data in the cache manager.
+    spark.catalog.refreshTable(tableIdentWithDB)
Review comment:
Personally I'd just keep the try-catch logic here, because I think the above does happen and we shouldn't skip updating stats in that case. But I don't have a strong opinion on this.
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala
##########
@@ -561,16 +561,9 @@ case class TruncateTableCommand(
         }
       }
     }
-    // After deleting the data, invalidate the table to make sure we don't keep around a stale
-    // file relation in the metastore cache.
-    spark.sessionState.refreshTable(tableName.unquotedString)
-    // Also try to drop the contents of the table from the columnar cache
-    try {
-      spark.sharedState.cacheManager.uncacheQuery(spark.table(table.identifier), cascade = true)
-    } catch {
-      case NonFatal(e) =>
-        log.warn(s"Exception when attempting to uncache table $tableIdentWithDB", e)
-    }
+    // After deleting the data, refresh the table to make sure we don't keep around a stale
+    // file relation in the metastore cache and cached table data in the cache manager.
+    spark.catalog.refreshTable(tableIdentWithDB)
Review comment:
We might be able to use `HiveThriftServer2Suites` for this - I can check when I get time.
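Maybe something along these lines (an untested sketch; the test name is hypothetical, and it assumes the suite's existing `withMultipleConnectionJdbcStatement` helper for running statements over separate JDBC connections):
```scala
test("truncate table still succeeds when recaching a dependent fails") {
  // Untested sketch mirroring the cross-session scenario described above.
  withMultipleConnectionJdbcStatement("t1", "t2")(
    { statement =>
      statement.execute("CREATE TABLE t1 (id INT) USING parquet")
      statement.execute("CREATE TABLE t2 (id INT) USING parquet")
      statement.execute(
        "CACHE TABLE joined AS SELECT a.id FROM t1 a JOIN t2 b ON a.id = b.id")
    },
    { statement =>
      // Drop a dependency of the cached query from a second connection ...
      statement.execute("DROP TABLE t2")
      // ... then TRUNCATE should still complete (and update stats) even if
      // recaching the dependent query fails.
      statement.execute("TRUNCATE TABLE t1")
    }
  )
}
```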
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]