sunchao commented on a change in pull request #31308:
URL: https://github.com/apache/spark/pull/31308#discussion_r563346341
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala
##########
@@ -561,16 +561,9 @@ case class TruncateTableCommand(
         }
       }
     }
-    // After deleting the data, invalidate the table to make sure we don't keep around a stale
-    // file relation in the metastore cache.
-    spark.sessionState.refreshTable(tableName.unquotedString)
-    // Also try to drop the contents of the table from the columnar cache
-    try {
-      spark.sharedState.cacheManager.uncacheQuery(spark.table(table.identifier), cascade = true)
-    } catch {
-      case NonFatal(e) =>
-        log.warn(s"Exception when attempting to uncache table $tableIdentWithDB", e)
-    }
+    // After deleting the data, refresh the table to make sure we don't keep around a stale
+    // file relation in the metastore cache and cached table data in the cache manager.
+    spark.catalog.refreshTable(tableIdentWithDB)
Review comment:
Hmm... not sure if we should catch non-fatal exceptions here like the previous code did, since otherwise we'd skip updating stats?
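Something like this is what I have in mind (a rough, untested sketch; `spark`, `log`, and `tableIdentWithDB` are all from the surrounding `TruncateTableCommand`, and the stats update that follows in this command stays unchanged):
```scala
import scala.util.control.NonFatal

// Best-effort refresh: a failure while uncaching/recaching dependents
// shouldn't prevent the stats update later in this command from running.
try {
  spark.catalog.refreshTable(tableIdentWithDB)
} catch {
  case NonFatal(e) =>
    log.warn(s"Exception when attempting to refresh table $tableIdentWithDB", e)
}
```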
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala
##########
@@ -561,16 +561,9 @@ case class TruncateTableCommand(
         }
       }
     }
-    // After deleting the data, invalidate the table to make sure we don't keep around a stale
-    // file relation in the metastore cache.
-    spark.sessionState.refreshTable(tableName.unquotedString)
-    // Also try to drop the contents of the table from the columnar cache
-    try {
-      spark.sharedState.cacheManager.uncacheQuery(spark.table(table.identifier), cascade = true)
-    } catch {
-      case NonFatal(e) =>
-        log.warn(s"Exception when attempting to uncache table $tableIdentWithDB", e)
-    }
+    // After deleting the data, refresh the table to make sure we don't keep around a stale
+    // file relation in the metastore cache and cached table data in the cache manager.
+    spark.catalog.refreshTable(tableIdentWithDB)
Review comment:
I think someone could drop a table or permanent view in a different session, or drop a Hive table through beeline or the HMS API. This may cause cached queries that depend on them AND this truncated table to become invalid, potentially hitting an analysis exception when recaching them. I haven't had a chance to verify this though.
Overall I feel it is good practice to recover from unknown errors here and continue; `DropTableCommand` does this as well. The scenario I'm imagining is sketched below.
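Roughly (an untested sketch; `spark` is an existing `SparkSession`, the table names are made up, and I'm going through the external catalog directly to simulate a drop behind Spark's back):
```scala
// Untested sketch: a dependency of a cached query disappears behind
// Spark's back, then TRUNCATE tries to uncache/recache the dependent.
spark.sql("CREATE TABLE t1 (id INT) USING parquet")
spark.sql("CREATE TABLE t2 (id INT) USING parquet")
// Cache a query that depends on BOTH t2 and the table we'll truncate.
spark.sql("CACHE TABLE joined AS SELECT a.id FROM t1 a JOIN t2 b ON a.id = b.id")

// Simulate a drop through beeline or the HMS API by going straight to the
// external catalog, so the cache manager is not notified.
spark.sharedState.externalCatalog.dropTable(
  "default", "t2", ignoreIfNotExists = false, purge = false)

// TRUNCATE uncaches dependents with cascade and recaches them; re-analyzing
// the cached query may now fail with an AnalysisException since t2 is gone.
spark.sql("TRUNCATE TABLE t1")
```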
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala
##########
@@ -561,16 +561,9 @@ case class TruncateTableCommand(
         }
       }
     }
-    // After deleting the data, invalidate the table to make sure we don't keep around a stale
-    // file relation in the metastore cache.
-    spark.sessionState.refreshTable(tableName.unquotedString)
-    // Also try to drop the contents of the table from the columnar cache
-    try {
-      spark.sharedState.cacheManager.uncacheQuery(spark.table(table.identifier), cascade = true)
-    } catch {
-      case NonFatal(e) =>
-        log.warn(s"Exception when attempting to uncache table $tableIdentWithDB", e)
-    }
+    // After deleting the data, refresh the table to make sure we don't keep around a stale
+    // file relation in the metastore cache and cached table data in the cache manager.
+    spark.catalog.refreshTable(tableIdentWithDB)
Review comment:
Personally I'd just keep the try-catch logic here, because I think the above does happen and we shouldn't skip updating stats in that case. But I don't have a strong opinion on this.
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala
##########
@@ -561,16 +561,9 @@ case class TruncateTableCommand(
         }
       }
     }
-    // After deleting the data, invalidate the table to make sure we don't keep around a stale
-    // file relation in the metastore cache.
-    spark.sessionState.refreshTable(tableName.unquotedString)
-    // Also try to drop the contents of the table from the columnar cache
-    try {
-      spark.sharedState.cacheManager.uncacheQuery(spark.table(table.identifier), cascade = true)
-    } catch {
-      case NonFatal(e) =>
-        log.warn(s"Exception when attempting to uncache table $tableIdentWithDB", e)
-    }
+    // After deleting the data, refresh the table to make sure we don't keep around a stale
+    // file relation in the metastore cache and cached table data in the cache manager.
+    spark.catalog.refreshTable(tableIdentWithDB)
Review comment:
We might be able to use `HiveThriftServer2Suites` for this - I can check when I get time.
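Maybe something along these lines (an untested sketch; the test name is hypothetical, and it assumes the suite's existing `withMultipleConnectionJdbcStatement` helper for running statements over separate JDBC connections):
```scala
test("truncate table still succeeds when recaching a dependent fails") {
  // Untested sketch mirroring the cross-session scenario described above.
  withMultipleConnectionJdbcStatement("t1", "t2")(
    { statement =>
      statement.execute("CREATE TABLE t1 (id INT) USING parquet")
      statement.execute("CREATE TABLE t2 (id INT) USING parquet")
      statement.execute(
        "CACHE TABLE joined AS SELECT a.id FROM t1 a JOIN t2 b ON a.id = b.id")
    },
    { statement =>
      // Drop a dependency of the cached query from a second connection ...
      statement.execute("DROP TABLE t2")
      // ... then TRUNCATE should still complete (and update stats) even if
      // recaching the dependent query fails.
      statement.execute("TRUNCATE TABLE t1")
    }
  )
}
```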
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]