turboFei commented on issue #26086: [SPARK-29302] Make the file name of a task for dynamic partition overwrite be unique
URL: https://github.com/apache/spark/pull/26086#issuecomment-546703904

@srowen Thanks for your reply.

I think the risks are:

- For dynamicPartitionOverwrite, before this PR, a task's filename would conflict with the filename of its speculative attempt.
- For the non-dynamicPartitionOverwrite, non-FileOutputCommitter case, if a task's filename is not the same as that of its retried or speculative attempt, and a task aborts without cleaning up its output gracefully, the leftover file would cause duplicate results.

So in this PR I name a task's file with taskId and attemptId only for dynamicPartitionOverwrite.

But in the non-dynamicPartitionOverwrite, non-FileOutputCommitter case above, a task's filename would still conflict with that of its speculative task.

https://github.com/apache/spark/blob/077fb99a26a9e92104503fade25c0a095fec5e5d/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala#L104-L125
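
The trade-off between the two naming schemes can be sketched roughly as below. This is a minimal Python illustration, not Spark's code: the function names, the `job-0001` id, and both filename formats are assumptions made for the example. `shared_name` assumes a `part-<taskId>-<jobId><ext>` pattern in which every attempt of a task gets the same name, and `unique_name` assumes the attempt id is simply added to it.

```python
def shared_name(task_id: int, job_id: str, ext: str) -> str:
    # Hypothetical scheme without the attempt id:
    # every attempt of the same task produces the same filename.
    return f"part-{task_id:05d}-{job_id}{ext}"


def unique_name(task_id: int, attempt_id: int, job_id: str, ext: str) -> str:
    # Hypothetical scheme with the attempt id:
    # each attempt of the same task produces its own filename.
    return f"part-{task_id:05d}-{attempt_id}-{job_id}{ext}"


job_id = "job-0001"   # made-up job id for illustration
attempts = (0, 1)     # original attempt and one speculative attempt of task 3

# Names written when both attempts target the same directory.
shared = {shared_name(3, job_id, ".parquet") for _ in attempts}
unique = {unique_name(3, a, job_id, ".parquet") for a in attempts}

# One shared name: the two attempts collide on the same file.
print(sorted(shared))
# Two distinct names: no collision, but a file left behind by an
# aborted attempt is not overwritten and would show up as duplicate data.
print(sorted(unique))
```

With the shared name, concurrent attempts conflict but a retry overwrites whatever an aborted attempt left behind. With the unique name, the conflict disappears but any leftover file must be cleaned up explicitly.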
