turboFei removed a comment on issue #24142: [SPARK-27194][core] Job failures when task attempts do not clean up spark-staging parquet files
URL: https://github.com/apache/spark/pull/24142#issuecomment-541376155

Hi @vanzin, @ajithme and @cloud-fan,

We have hit this problem too. Sorry, I did not notice this PR earlier and opened a new one: https://github.com/apache/spark/pull/26086. How about using that approach to name task output files, but only for dynamic partition overwrite?

Currently, for a dynamic partition overwrite, the file name of a task's output is determined only by splitId (taskId) and jobId. So if speculation is enabled, the file written by a task conflicts with the file written by its speculative attempt, because both attempts produce the same name.

We can make the file name of each task attempt unique for dynamic partition overwrite. The outputCommitCoordinator then decides which attempt is allowed to commit, and because dynamic partition overwrite keeps a filesToMove set, this would not produce duplicate results.
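The naming conflict and the proposed fix can be illustrated with a toy Python model. This is a hypothetical sketch, not Spark code: `current_filename` roughly mimics a `part-<split>-<jobId>` naming pattern, `unique_filename` adds the attempt number as one possible way to make names unique, `CommitCoordinator` is a simplified stand-in for the outputCommitCoordinator (the first attempt to ask wins), and `files_to_move` records only files from the authorized attempt. The staging store refusing to overwrite an existing file is an assumption used to model the collision; the real commit protocol is more involved.

```python
# Hypothetical toy model; names and behavior are illustrative, not Spark's API.


class FileAlreadyExists(Exception):
    pass


def current_filename(split_id: int, attempt_id: int, job_id: str, ext: str = ".parquet") -> str:
    # attempt_id is ignored: every attempt of the same task gets the same name.
    return f"part-{split_id:05d}-{job_id}{ext}"


def unique_filename(split_id: int, attempt_id: int, job_id: str, ext: str = ".parquet") -> str:
    # Assumed scheme: include the attempt number so concurrent attempts differ.
    return f"part-{split_id:05d}-{attempt_id}-{job_id}{ext}"


class CommitCoordinator:
    """Simplified stand-in: the first attempt to ask for a partition may commit."""

    def __init__(self):
        self._winners = {}  # (stage, partition) -> authorized attempt number

    def can_commit(self, stage: int, partition: int, attempt: int) -> bool:
        winner = self._winners.setdefault((stage, partition), attempt)
        return winner == attempt


def write_staging(staging: dict, name: str, attempt: int) -> None:
    # Model a writer opened without overwrite: never clobber an existing file.
    if name in staging:
        raise FileAlreadyExists(name)
    staging[name] = attempt


def run_attempts(job_id, attempts, naming):
    """attempts: list of (split_id, attempt_id). Returns (staging, files_to_move)."""
    staging = {}
    files_to_move = set()
    coordinator = CommitCoordinator()
    for split_id, attempt_id in attempts:
        name = naming(split_id, attempt_id, job_id)
        write_staging(staging, name, attempt_id)
        # Only the attempt authorized by the coordinator contributes output.
        if coordinator.can_commit(0, split_id, attempt_id):
            files_to_move.add(name)
    return staging, files_to_move


if __name__ == "__main__":
    attempts = [(3, 0), (3, 1)]  # split 3: original attempt 0 plus speculative attempt 1
    try:
        run_attempts("job42", attempts, current_filename)
    except FileAlreadyExists as e:
        print("conflict:", e)  # prints "conflict: part-00003-job42.parquet"
    staging, moved = run_attempts("job42", attempts, unique_filename)
    print(sorted(staging))  # ['part-00003-0-job42.parquet', 'part-00003-1-job42.parquet']
    print(moved)  # {'part-00003-0-job42.parquet'}
```

In this model both attempts can write their staging files without colliding, and only the coordinator-approved file ends up in `files_to_move`, so the output contains no duplicates.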
