turboFei commented on issue #26339: [SPARK-27194][SPARK-29302][SQL] Fix the issue that for dynamic partition overwrite a task would conflict with its speculative task
URL: https://github.com/apache/spark/pull/26339#issuecomment-610700346

https://github.com/apache/spark/blob/8ab2a0c5f23a59c00a9b4191afd976af50d913ba/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala#L104
https://github.com/apache/spark/blob/8ab2a0c5f23a59c00a9b4191afd976af50d913ba/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala#L127

In fact, there are three cases:

- dynamic partition overwrite
- withAbsOutputPath
- non-dynamic partition overwrite

As mentioned above, for non-dynamic partition overwrite, each task has a unique working directory.

For the case with an absolute output path, the task output file name is also unique.
https://github.com/apache/spark/blob/8ab2a0c5f23a59c00a9b4191afd976af50d913ba/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala#L134

So, this is an issue only for dynamic partition overwrite.

@Ngone51 @venkata91
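
To make the three cases concrete, here is a rough Python model of how the temporary file path might be built in each one. It is only a sketch, not the Scala code linked above: the function names, the `JOB_ID` value, and the exact `_temporary/...` layout of the working directory are simplifying assumptions made for illustration. The only point it tries to show is which path components vary between two attempts of the same task.

```python
import uuid
from posixpath import join

JOB_ID = "job-0001"  # hypothetical job id, for illustration only


def file_name(split: int, ext: str) -> str:
    # Depends on the task (split) id and the job id, not on the attempt number.
    return f"part-{split:05d}-{JOB_ID}{ext}"


def staging_dir(output: str) -> str:
    # Job-level staging directory, shared by every task attempt of the job.
    return join(output, ".spark-staging-" + JOB_ID)


def dynamic_overwrite_file(output, partition_dir, split, attempt, ext=".parquet"):
    # Dynamic partition overwrite: shared staging dir + partition dir + file name.
    # `attempt` is deliberately unused, which is where the collision comes from.
    return join(staging_dir(output), partition_dir, file_name(split, ext))


def abs_path_file(output, split, attempt, ext=".parquet"):
    # Absolute output path: a random UUID prefix keeps the temp file name unique.
    return join(staging_dir(output), f"{uuid.uuid4()}-{file_name(split, ext)}")


def non_dynamic_file(output, split, attempt, ext=".parquet"):
    # Non-dynamic overwrite: the committer's working directory is per attempt
    # (the directory layout here is a simplified assumption).
    work = join(output, "_temporary", "0", "_temporary", f"attempt_{split}_{attempt}")
    return join(work, file_name(split, ext))
```

In this model, attempts 0 and 1 of the same task get the same path only from `dynamic_overwrite_file`; the other two differ through the UUID prefix or the per-attempt working directory.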
