sunchao commented on a change in pull request #29552:
URL: https://github.com/apache/spark/pull/29552#discussion_r477533631
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -2722,6 +2722,17 @@ object SQLConf {
.booleanConf
.createWithDefault(false)
+ val TRUNCATE_TRASH_ENABLED =
+ buildConf("spark.sql.truncate.trash.enabled")
+ .doc("This configuration decides when truncating table, whether data
files will be moved " +
+ "to trash directory or deleted permanently. The trash retention time
is controlled by " +
+ "fs.trash.interval, and in default, the server side configuration
value takes " +
+ "precedence over the client-side one. Note that if fs.trash.interval
is non-positive, " +
+ "this will be a no-op and log a warning message.")
Review comment:
this description needs to be updated
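For readers following the thread, here is a minimal usage sketch of the two settings this doc string talks about (`spark.sql.truncate.trash.enabled` and Hadoop's `fs.trash.interval`); the session setup, table name, and interval value are illustrative only and not part of the PR:

```scala
import org.apache.spark.sql.SparkSession

// Illustrative setup only: enable trash-based truncation and give Hadoop's
// trash a positive retention interval (in minutes). With fs.trash.interval
// left at its default of 0, the description above says the move is a no-op
// and a warning is logged instead.
val spark = SparkSession.builder()
  .master("local[*]")                                 // local mode, for the sketch
  .appName("truncate-to-trash-sketch")                // made-up app name
  .config("spark.sql.truncate.trash.enabled", "true")
  .config("spark.hadoop.fs.trash.interval", "1440")   // client-side value; the server-side one may take precedence
  .getOrCreate()

spark.sql("CREATE TABLE t (col INT) USING parquet")
spark.sql("INSERT INTO t SELECT 1")
// With both settings in place, TRUNCATE is expected to move the table's data
// files under the filesystem's trash directory rather than delete them.
spark.sql("TRUNCATE TABLE t")
```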
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala
##########
@@ -3101,6 +3101,81 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils {
assert(spark.sessionState.catalog.isRegisteredFunction(rand))
}
}
+
+ test("SPARK-32481 Move data to trash on truncate table if enabled") {
+ val trashIntervalKey = "fs.trash.interval"
+ withTable("tab1") {
+ withSQLConf(SQLConf.TRUNCATE_TRASH_ENABLED.key -> "true") {
+ sql("CREATE TABLE tab1 (col INT) USING parquet")
+ sql("INSERT INTO tab1 SELECT 1")
+ // scalastyle:off hadoopconfiguration
+ val hadoopConf = spark.sparkContext.hadoopConfiguration
+ // scalastyle:on hadoopconfiguration
+ val originalValue = hadoopConf.get(trashIntervalKey, "0")
+ val tablePath = new Path(spark.sessionState.catalog
+ .getTableMetadata(TableIdentifier("tab1")).storage.locationUri.get)
+
+ val fs = tablePath.getFileSystem(hadoopConf)
+ val trashCurrent = new Path(fs.getHomeDirectory, ".Trash/Current")
Review comment:
can we use `Trash.getCurrentTrashDir` instead? Semantically it is the same, but it hides details such as the home directory and the trash path layout.
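A rough sketch of what that substitution could look like in the test; it assumes a Hadoop release where `Trash#getCurrentTrashDir` is accessible, and depending on the version the same directory may instead be reachable via `TrashPolicy.getInstance(hadoopConf, fs).getCurrentTrashDir()`. The helper name below is made up for illustration:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path, Trash}

// Hypothetical helper for the test: ask Hadoop for the current trash
// directory instead of hard-coding fs.getHomeDirectory + ".Trash/Current".
// Assumes Trash#getCurrentTrashDir is accessible in the Hadoop version on
// the classpath (see the TrashPolicy alternative mentioned above).
def currentTrashDir(fs: FileSystem, hadoopConf: Configuration): Path =
  new Trash(fs, hadoopConf).getCurrentTrashDir()

// In the test, `new Path(fs.getHomeDirectory, ".Trash/Current")` would then
// become `currentTrashDir(fs, hadoopConf)`.
```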
##########
File path: core/src/main/scala/org/apache/spark/util/Utils.scala
##########
@@ -269,6 +269,27 @@ private[spark] object Utils extends Logging {
file.setExecutable(true, true)
}
+ /**
+ * Move data to trash if 'spark.sql.truncate.trash.enabled' is true
+ */
+ def moveToTrashIfEnabled(
+ fs: FileSystem,
+ partitionPath: Path,
+ isTrashEnabled: Boolean,
+ hadoopConf: Configuration): Boolean = {
+ if (isTrashEnabled) {
+ logDebug(s"will move data ${partitionPath.toString} to trash")
+ val isSuccess = Trash.moveToAppropriateTrash(fs, partitionPath, hadoopConf)
+ if (!isSuccess) {
+ logWarning(s"Failed to move data ${partitionPath.toString} to trash")
Review comment:
nit: can we change this to something like "Failed to move data ${partitionPath.toString} to trash. Fallback to hard deletion"?
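To make the proposed wording concrete, here is a self-contained sketch of the pattern it implies: attempt the trash move first and hard-delete only if it did not happen. The hunk above is truncated, so whether that fallback lives inside `moveToTrashIfEnabled` or at its caller is not shown here; the `truncatePath` name below is made up for illustration:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path, Trash}

// Self-contained sketch of the intended pattern: attempt the trash move when
// the feature is enabled, and fall back to permanent deletion whenever the
// move did not happen (feature off, non-positive fs.trash.interval, or the
// move itself failed).
def truncatePath(
    fs: FileSystem,
    path: Path,
    isTrashEnabled: Boolean,
    hadoopConf: Configuration): Unit = {
  val movedToTrash =
    isTrashEnabled && Trash.moveToAppropriateTrash(fs, path, hadoopConf)
  if (!movedToTrash) {
    // "Fallback to hard deletion", per the suggested warning message above.
    fs.delete(path, true)
  }
}
```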