Github user fangshil commented on a diff in the pull request:
https://github.com/apache/spark/pull/20931#discussion_r179517200
--- Diff:
core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala
---
@@ -186,7 +186,9 @@ class HadoopMapReduceCommitProtocol(
logDebug(s"Clean up default partition directories for overwriting:
$partitionPaths")
for (part <- partitionPaths) {
val finalPartPath = new Path(path, part)
- fs.delete(finalPartPath, true)
+ if (!fs.delete(finalPartPath, true) &&
!fs.exists(finalPartPath.getParent)) {
--- End diff --
@cloud-fan this is to follow the behavior of the HDFS rename spec: it
requires the parent of the destination to be present. If we create
finalPartPath directly, rename exhibits another weird behavior when the
destination path already exists. From the HDFS spec I shared above: "If the
destination exists and is a directory, the final destination of the rename
becomes the destination + the filename of the source path". We have confirmed
this on our production cluster, which led to the current solution of creating
only the parent directory, following the HDFS spec exactly.
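To make the two rename rules being discussed concrete, here is a toy
in-memory model of them in Scala. This is not Hadoop code and not the actual
patch; the object name, the path set, and the sample paths are all invented
for illustration. It only encodes the two spec behaviors the comment relies
on: rename fails when the destination's parent is missing, and an existing
directory destination redirects the rename into itself.

```scala
import scala.collection.mutable

// Toy model (illustration only, not real Hadoop code) of two HDFS
// FileSystem.rename rules:
//   1. rename fails if the destination's parent does not exist
//   2. if the destination exists and is a directory, the source is moved
//      *into* it: final destination = dst + "/" + name(src)
object RenameSpecDemo {
  // Paths present in the pretend filesystem; directories tracked explicitly.
  val dirs = mutable.Set[String]("/", "/warehouse")
  val files = mutable.Set[String]("/staging/part-0")

  def parent(p: String): String = {
    val i = p.lastIndexOf('/')
    if (i <= 0) "/" else p.substring(0, i)
  }

  def name(p: String): String = p.substring(p.lastIndexOf('/') + 1)

  def rename(src: String, dst: String): Boolean = {
    if (!files.contains(src)) return false
    // Rule 2: an existing directory destination redirects the rename into it.
    val finalDst = if (dirs.contains(dst)) dst + "/" + name(src) else dst
    // Rule 1: the (final) destination's parent must already exist.
    if (!dirs.contains(parent(finalDst))) return false
    files -= src
    files += finalDst
    true
  }
}
```

Under this model, deleting a stale partition directory but recreating only
its parent (as the patch does) is exactly what makes a later rename land at
the intended path: the parent exists (rule 1 is satisfied) while the
destination itself does not (rule 2 never fires).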
---