turboFei edited a comment on issue #26339: [SPARK-27194][SPARK-29302][SQL] Fix the issue that for dynamic partition overwrite a task would conflict with its speculative task
URL: https://github.com/apache/spark/pull/26339#issuecomment-610361734

> I may miss something here but can't we just delete the file if it exists at:
>
> https://github.com/apache/spark/blob/2ac6163a5d04027ef4dbdf7d031cddf9415ed25e/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala#L109
>
> ?

No. If speculation is not enabled, we can delete it directly, because two concurrent tasks can never have the same output file name. But two task attempts with the same task id and different attempt ids write to the same file:

https://github.com/apache/spark/blob/2ac6163a5d04027ef4dbdf7d031cddf9415ed25e/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala#L140-L146

So when speculation is enabled, deleting the file directly would cause the other attempt to fail (on HDFS the exception may be "no lease on this inode"; for a local file I cannot guarantee the data would not be duplicated; for other filesystems ...), and that failure would in turn launch yet another speculative task.
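To illustrate the collision: a minimal sketch (not the actual Spark code; the object and parameter names are hypothetical) of the naming scheme in the linked `getFilename` code, where the name depends on the job UUID and the task's split/partition id but not on the attempt id, so an original task and its speculative attempt target the same path:

```scala
// Hypothetical sketch of the file-naming idea in
// HadoopMapReduceCommitProtocol: the attempt id is NOT part of the name.
object FileNameSketch {
  def getFilename(jobUuid: String, splitId: Int, ext: String): String =
    f"part-$splitId%05d-$jobUuid$ext"

  def main(args: Array[String]): Unit = {
    val uuid = "c4f2-example" // stand-in for the job's UUID
    // Original task (attempt 0) and speculative task (attempt 1) share split id 3.
    val original    = getFilename(uuid, splitId = 3, ext = ".parquet")
    val speculative = getFilename(uuid, splitId = 3, ext = ".parquet")
    // Both attempts resolve to the same path, so deleting the existing file
    // would break the other attempt's open writer (e.g. an HDFS lease error).
    assert(original == speculative)
    println(original)
  }
}
```

This is why the fix has to distinguish attempts rather than blindly delete the existing file.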
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
