WinkerDu commented on a change in pull request #29000:
URL: https://github.com/apache/spark/pull/29000#discussion_r464095281
##########
File path:
core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala
##########
@@ -41,13 +41,17 @@ import org.apache.spark.mapred.SparkHadoopMapRedUtil
* @param jobId the job's or stage's id
* @param path the job's output path, or null if committer acts as a noop
* @param dynamicPartitionOverwrite If true, Spark will overwrite partition directories at runtime
- * dynamically, i.e., we first write files under a staging
- * directory with partition path, e.g.
- * /path/to/staging/a=1/b=1/xxx.parquet. When committing the job,
- * we first clean up the corresponding partition directories at
- * destination path, e.g. /path/to/destination/a=1/b=1, and move
- * files from staging directory to the corresponding partition
- * directories under destination path.
+ * dynamically, i.e., for speculative tasks, we first write files
+ * to task attempt paths under a staging directory, e.g.
+ * /path/to/staging/.spark-staging-{jobId}/_temporary/
+ * {appAttemptId}/_temporary/{taskAttemptId}/a=1/b=1/xxx.parquet.
+ * When committing the job, we first move files from task attempt
Review comment:
Actually, it depends on the committer algorithm version.
1) For version 1, what this annotation describes is accurate: the first
move happens during job commit.
2) For version 2, task commit moves files directly into the job's output
path, e.g., from
`/path/to/output/.spark-staging-{jobId}/{appAttemptId}/_temporary/{taskAttemptId}/a=1/b=1/xxx.parquet`
to `/path/to/output/.spark-staging-{jobId}/a=1/b=1/xxx.parquet`.
I'll add this detail to the annotation.
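
To make the difference concrete, here is a minimal sketch of the two behaviours described above. It uses plain string manipulation only and does not call Spark or Hadoop. The object `CommitAlgorithmSketch` and its members are hypothetical names introduced for illustration, and the placeholder paths are copied from the example in this comment. In practice the version is usually selected through the Hadoop property `mapreduce.fileoutputcommitter.algorithm.version`.

```scala
object CommitAlgorithmSketch {
  // Placeholder paths copied from the example above; {jobId} etc. are literal text here.
  val stagingRoot = "/path/to/output/.spark-staging-{jobId}"
  val taskAttemptDir = s"$stagingRoot/{appAttemptId}/_temporary/{taskAttemptId}"

  // Hypothetical model of the two commit phases.
  sealed trait Phase
  case object TaskCommit extends Phase
  case object JobCommit extends Phase

  /** Phase in which task output is first moved, as described in the comment. */
  def phaseOfFirstMove(algorithmVersion: Int): Phase = algorithmVersion match {
    case 1 => JobCommit   // version 1: the first move happens during job commit
    case 2 => TaskCommit  // version 2: task commit moves files directly
    case other =>
      throw new IllegalArgumentException(s"Unsupported algorithm version: $other")
  }

  /** Maps a file under the task attempt directory to its location under the staging root. */
  def moveToStagingRoot(file: String): String = {
    require(file.startsWith(taskAttemptDir + "/"), s"Not under task attempt dir: $file")
    stagingRoot + file.stripPrefix(taskAttemptDir)
  }

  def main(args: Array[String]): Unit = {
    val file = s"$taskAttemptDir/a=1/b=1/xxx.parquet"
    for (v <- Seq(1, 2)) {
      println(s"version $v: first move at ${phaseOfFirstMove(v)}, target ${moveToStagingRoot(file)}")
    }
  }
}
```

The sketch only models which phase performs the first move and the version 2 path rewrite; it says nothing about how the real committer handles failures or speculative duplicates.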