Github user zheh12 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21257#discussion_r186599760
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala ---
    @@ -207,9 +207,25 @@ case class InsertIntoHadoopFsRelationCommand(
         }
         // first clear the path determined by the static partition keys (e.g. /table/foo=1)
         val staticPrefixPath = qualifiedOutputPath.suffix(staticPartitionPrefix)
    -    if (fs.exists(staticPrefixPath) && !committer.deleteWithJob(fs, staticPrefixPath, true)) {
    -      throw new IOException(s"Unable to clear output " +
    -        s"directory $staticPrefixPath prior to writing to it")
    +
    +    // check whether to delete the whole dir or just its sub-files
    +    if (fs.exists(staticPrefixPath)) {
    +      // check if this is the table root, and record the files to delete
    +      if (staticPartitionPrefix.isEmpty) {
    +        val files = fs.listFiles(staticPrefixPath, false)
    +        while (files.hasNext) {
    +          val file = files.next()
    +          if (!committer.deleteWithJob(fs, file.getPath, true)) {
    --- End diff ---
    
    First of all, if it is the root directory of the table, I must record all the files in the directory and wait until the job is committed before deleting them. Because the `_temporary` directory of the entire job also lives in that directory, I cannot directly delete the whole directory.
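    
    To make that record-then-delete idea concrete, here is a minimal sketch. The class name `DeferredDeleteCommitter` is hypothetical, not Spark's actual `FileCommitProtocol`; the real semantics of `deleteWithJob` depend on the committer implementation:
    
    ```scala
    import scala.collection.mutable.ArrayBuffer
    
    import org.apache.hadoop.fs.{FileSystem, Path}
    
    // Hypothetical committer sketch: record each path instead of deleting
    // it immediately, and only perform the deletes once the job commits.
    // This keeps the job's _temporary staging directory safe while tasks
    // are still writing into it.
    class DeferredDeleteCommitter(fs: FileSystem) {
      private val pathsToDelete = ArrayBuffer.empty[Path]
    
      // Plays the role of committer.deleteWithJob in the diff above.
      def deleteWithJob(path: Path): Boolean = {
        pathsToDelete += path
        true // actual deletion is deferred until commitJob
      }
    
      // After the job commits, the recorded paths can be removed safely.
      def commitJob(): Unit = {
        pathsToDelete.foreach(p => fs.delete(p, true))
      }
    }
    ```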
    
    Second, when we record the files that need to be deleted, we only list the files in the root directory non-recursively. Under normal circumstances, the number of files in the first-level directory of a partitioned table will not be large.
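    
    As a self-contained sketch of that listing step (the `markForDeletion` callback is a stand-in for the `committer.deleteWithJob` call in the diff, and the object name is mine):
    
    ```scala
    import java.io.IOException
    
    import org.apache.hadoop.fs.{FileSystem, Path}
    
    object FirstLevelListingSketch {
      // With recursive = false, listFiles only returns the files directly
      // under tableRoot rather than walking into every partition
      // directory, so the cost stays proportional to the number of
      // first-level entries.
      def recordFirstLevelFiles(fs: FileSystem, tableRoot: Path)(
          markForDeletion: Path => Boolean): Unit = {
        val files = fs.listFiles(tableRoot, false)
        while (files.hasNext) {
          val file = files.next()
          if (!markForDeletion(file.getPath)) {
            throw new IOException(
              s"Unable to mark ${file.getPath} for deletion")
          }
        }
      }
    }
    ```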
    
    In the end, this will certainly be slower than deleting the entire directory outright, but under the current implementation we cannot directly delete the entire table directory.

