turboFei edited a comment on issue #26086: [SPARK-29302] Make the file name of a task for dynamic partition overwrite be unique
URL: https://github.com/apache/spark/pull/26086#issuecomment-546703904

@srowen Thanks for your reply.

I think the risks are:
- For dynamicPartitionOverwrite, before this PR, a task's filename would conflict with that of its speculative task.
- For the non-dynamicPartitionOverwrite, non-FileOutputCommitter case, if a task's filename is not the same as that of its retry attempt or speculative task, and a task aborts without cleaning up its output gracefully, the result would contain duplicates.

So, in this PR, I name a task file with taskId and attemptId only for dynamicPartitionOverwrite. For the non-dynamicPartitionOverwrite, non-FileOutputCommitter case above, a task's filename would therefore still conflict with that of its speculative task.

As shown in the code linked below, before this PR there are risks for both dynamicPartitionOverwrite and non-FileOutputCommitter. This PR fixes the issue for the dynamicPartitionOverwrite case. In practice, non-FileOutputCommitter cases are rare.

https://github.com/apache/spark/blob/077fb99a26a9e92104503fade25c0a095fec5e5d/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala#L104-L125
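
To make the naming trade-off concrete, here is a minimal sketch of the two schemes discussed above. It is not the code from the PR: `TaskAttempt`, `sharedName`, and `perAttemptName` are hypothetical names, the shared format is only loosely modeled on the `part-<task>-<job><ext>` pattern, and the per-attempt format is an assumption chosen for illustration.

```scala
object TaskFileNames {
  // Hypothetical stand-in for the (taskId, attemptId) pair of a running task attempt.
  final case class TaskAttempt(taskId: Int, attemptId: Int)

  // Shared scheme: the name depends only on the task id and job id, so the
  // original attempt and a speculative attempt of the same task get the same name.
  def sharedName(a: TaskAttempt, jobId: String, ext: String): String =
    f"part-${a.taskId}%05d-$jobId$ext"

  // Per-attempt scheme (assumed format): the attempt id is part of the name,
  // so two attempts of the same task never write to the same file name.
  def perAttemptName(a: TaskAttempt, jobId: String, ext: String): String =
    f"part-${a.taskId}%05d-${a.attemptId}-$jobId$ext"
}
```

With an original attempt `TaskAttempt(3, 0)` and a speculative attempt `TaskAttempt(3, 1)`, `sharedName` returns the same string for both, while `perAttemptName` returns two different strings. The second scheme avoids the conflict, but it also means that output left behind by an aborted attempt is not overwritten by the attempt that succeeds, which is the duplicate-result risk described in the second bullet.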
