[GitHub] [spark] turboFei edited a comment on issue #26339: [SPARK-27194][SPARK-29302][SQL] Fix the issue that for dynamic partition overwrite a task would conflict with its speculative task

GitBox Tue, 07 Apr 2020 18:28:02 -0700

turboFei edited a comment on issue #26339: [SPARK-27194][SPARK-29302][SQL] Fix
the issue that for dynamic partition overwrite a task would conflict with its
speculative task
URL: https://github.com/apache/spark/pull/26339#issuecomment-610699108

> If the issue is caused by conflict file name, why it is only specific to
`dynamicPartitionOverwrite`?

Spark normally uses Hadoop's FileOutputCommitter as its OutputCommitter class.
There is a working directory; for Spark, it is `_temporary/0`.
Each task has a unique output directory, like `_temporary/0/taskAttempt_0**/`.
Each task also has a working directory, like
`_temporary/0/taskAttempt_0**/_temporary`.
Each task invokes commitTask to commit its output: with algorithm 1, it
commits the output to `_temporary/0/taskAttempt_**`, and with algorithm 2, it
commits the output to outputPath directly.
See the details in:

https://github.com/apache/hadoop/blob/20eec958674a9c343a80c9fccd1383ef7c1b57f5/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/FileOutputCommitter.java#L573
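
As a rough illustration of the layout described above, the sketch below builds
the commit destination for one task attempt under each algorithm. The function
names and the attempt ID string are hypothetical, and the directory names
follow this comment's description rather than Hadoop's actual code.

```python
from pathlib import PurePosixPath


def task_attempt_output(output_path: str, attempt_id: str) -> PurePosixPath:
    """Per-attempt output directory, as described in the comment above."""
    return PurePosixPath(output_path) / "_temporary" / "0" / attempt_id


def commit_destination(output_path: str, attempt_id: str, algorithm: int) -> PurePosixPath:
    """Where commitTask places a task's output (illustrative only)."""
    if algorithm == 1:
        # Algorithm 1: output stays under the job's _temporary/0 directory
        # until the job itself is committed.
        return task_attempt_output(output_path, attempt_id)
    if algorithm == 2:
        # Algorithm 2: output goes straight to the final output path.
        return PurePosixPath(output_path)
    raise ValueError(f"unknown commit algorithm: {algorithm}")


# Hypothetical table path and attempt ID, for illustration only.
v1_dest = commit_destination("/warehouse/t", "taskAttempt_0001", 1)
v2_dest = commit_destination("/warehouse/t", "taskAttempt_0001", 2)
print(v1_dest)
print(v2_dest)
```

In both cases, the attempt writes into a directory that carries its own attempt
ID before commitTask runs.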

But for dynamic partition overwrite, Spark sets a unique output directory named
`.spark-staging-${UUID}` under tablePath.
Each task's output files could be
`.spark-staging-${UUID}/partitionPath1/fileName1`,
`.spark-staging-${UUID}/partitionPath2/fileName2` ... and
`.spark-staging-${UUID}/partitionPathn/fileNamen`.

This means that all tasks share the same working directories, so a task and
its speculative attempt can end up writing to the same file path.
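
A minimal sketch of the difference, assuming a task and its speculative
attempt produce the same file name for the same partition. The helper names,
UUID, partition, and file name below are hypothetical and only mirror the
path shapes described in this comment.

```python
from pathlib import PurePosixPath


def per_attempt_file(output_path, attempt_id, partition, file_name):
    """File path when each attempt writes under its own directory."""
    return (PurePosixPath(output_path) / "_temporary" / "0"
            / attempt_id / partition / file_name)


def staging_file(table_path, job_uuid, partition, file_name):
    """File path under the shared dynamic-partition-overwrite staging dir."""
    return (PurePosixPath(table_path) / f".spark-staging-{job_uuid}"
            / partition / file_name)


# Hypothetical values: one task, two attempts (original and speculative).
attempts = ["taskAttempt_0001_0", "taskAttempt_0001_1"]
partition, file_name = "dt=2020-04-07", "part-00001.parquet"

isolated = {per_attempt_file("/warehouse/t", a, partition, file_name)
            for a in attempts}
shared = {staging_file("/warehouse/t", "1234-abcd", partition, file_name)
          for _ in attempts}

# Two distinct paths when the attempt ID is part of the path,
# but a single path when both attempts write into the staging directory.
print(len(isolated), len(shared))
```

Because the staging path does not include the attempt ID, nothing in the path
itself separates the two attempts.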


