> Here the files with the same name will be overwritten by the retry attempt
> and it will guarantee correct result from a successful job.
I think your patch might fix your problem, but it fails silently when two processes try to write the same file. That isn't supposed to happen, but the patch makes it possible and raises no error when it does.

MultipleOutputs should be safe to use without overwriting, because committing goes through a commitPending() -> canCommit() step. That step resolves races between speculative attempts of the same task. Unless you're using the broken S3 committer, I don't think that can happen. If it is causing trouble for some reason, explain what you're seeing and I can help with the MR job.

The directory renames go from Attempt -> Task -> Job, so a failed attempt should not be able to get a file into the final output in any way.

Cheers,
Gopal
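For illustration, here is a toy Java model of the canCommit() arbitration described above. `CommitArbiter`, `canCommit` and `commitTask` here are hypothetical names. They are not Hadoop's OutputCommitter API. The sketch assumes a single in-process arbiter per job and collapses the Attempt -> Task -> Job renames into one promotion step. It only shows why a losing speculative attempt cannot promote its files.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/**
 * Toy model (hypothetical, not Hadoop's API) of commit arbitration between
 * speculative attempts of the same task. The attempt -> task -> job renames
 * are collapsed into a single "promote into job output" step.
 */
public class CommitArbiter {
    // taskId -> the single attempt allowed to commit that task.
    private final Map<String, String> winners = new HashMap<>();
    // Final job output: file name -> attempt whose file was promoted.
    private final Map<String, String> jobOutput = new HashMap<>();

    /** An attempt that has finished writing its private directory asks for permission. */
    public synchronized boolean canCommit(String taskId, String attemptId) {
        String winner = winners.putIfAbsent(taskId, attemptId);
        // The first attempt to ask wins; later attempts of the same task are refused.
        return winner == null || winner.equals(attemptId);
    }

    /** Promote an attempt's files; only the attempt granted canCommit may do this. */
    public synchronized void commitTask(String taskId, String attemptId, Set<String> files) {
        if (!attemptId.equals(winners.get(taskId))) {
            throw new IllegalStateException(attemptId + " was not allowed to commit " + taskId);
        }
        for (String file : files) {
            // Toy choice: refuse a name clash loudly instead of overwriting silently.
            // This is not a claim about how Hadoop's committers handle clashes.
            if (jobOutput.putIfAbsent(file, attemptId) != null) {
                throw new IllegalStateException("duplicate output file " + file);
            }
        }
    }

    /** Snapshot of what made it into the final output. */
    public synchronized Map<String, String> jobOutput() {
        return new HashMap<>(jobOutput);
    }

    public static void main(String[] args) {
        CommitArbiter arbiter = new CommitArbiter();
        // Two speculative attempts of task_0 both finish and ask to commit.
        boolean first = arbiter.canCommit("task_0", "attempt_0_0");
        boolean second = arbiter.canCommit("task_0", "attempt_0_1");
        System.out.println(first + " " + second); // prints "true false"
        arbiter.commitTask("task_0", "attempt_0_0", Set.of("part-m-00000", "side-m-00000"));
        System.out.println(arbiter.jobOutput().size()); // prints "2"
    }
}
```

In this model only the winning attempt's files reach `jobOutput`. A `commitTask` call from the losing attempt throws rather than overwriting, which is the property the overwrite patch gives up.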
