> Here the files with the same name will be overwritten by the retry attempt 
> and it will guarantee correct result from a successful job.

I think your patch might fix your immediate problem, but it fails silently 
when two processes try to write the same file. That isn't supposed to happen, 
but with the overwrite you'd be introducing the possibility without any error 
to flag it.

MultipleOutputs should be safe to use without an overwrite, because the 
operations involve a commitPending() -> canCommit() step, which resolves race 
conditions between speculative task attempts.
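
To illustrate the idea only, here is a toy model of that arbitration. It is 
not Hadoop's code: the class CommitArbiter and the attempt ids are made up 
for this sketch, and the only assumption carried over is that the first 
attempt to ask for permission wins and every other attempt is refused.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Toy model of commit arbitration for one task; not Hadoop's implementation. */
public class CommitArbiter {
    // The attempt that has been granted the right to commit, or null if none yet.
    private final AtomicReference<String> winner = new AtomicReference<>();

    /**
     * Grants the commit to the first attempt that asks. The same attempt may
     * ask again and still gets true; any other attempt gets false.
     */
    public boolean canCommit(String attemptId) {
        if (winner.compareAndSet(null, attemptId)) {
            return true;
        }
        return attemptId.equals(winner.get());
    }

    public static void main(String[] args) {
        CommitArbiter task = new CommitArbiter();
        System.out.println(task.canCommit("attempt_0")); // first to ask
        System.out.println(task.canCommit("attempt_1")); // speculative duplicate
        System.out.println(task.canCommit("attempt_0")); // winner asks again
    }
}
```

Since only one attempt per task is ever allowed through, two attempts never 
race to publish the same file name, and no overwrite flag is needed.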

Unless you're using the broken S3 committer, I don't think that can happen. If 
it is causing trouble for some reason, please explain what you're seeing and I 
can help with the MR job.

The directory renames happen from Attempt -> Task -> Job, so a failed attempt 
should not be able to get a file into the final output in any way.
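
A rough sketch of that rename chain, using local directories and plain 
java.nio.file instead of HDFS. The directory and file names (_temporary, 
attempt_0, task_0, part-00000) are simplified stand-ins I picked for the 
example, not the exact layout the real committer uses.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Toy model of the attempt -> task -> job promotion; not Hadoop's committer. */
public class RenameChain {
    /** Writes one output file into the private directory of a task attempt. */
    static Path writeAttempt(Path jobTmp, String attemptId, String data) throws IOException {
        Path dir = Files.createDirectories(jobTmp.resolve(attemptId));
        Files.write(dir.resolve("part-00000"), data.getBytes(StandardCharsets.UTF_8));
        return dir;
    }

    /** Task commit: rename the winning attempt directory to the task directory. */
    static Path commitTask(Path attemptDir, Path jobTmp, String taskId) throws IOException {
        return Files.move(attemptDir, jobTmp.resolve(taskId));
    }

    /** Job commit: move the files of each committed task into the final output. */
    static void commitJob(Path jobTmp, Path output, String... taskIds) throws IOException {
        for (String taskId : taskIds) {
            try (DirectoryStream<Path> files = Files.newDirectoryStream(jobTmp.resolve(taskId))) {
                for (Path f : files) {
                    Files.move(f, output.resolve(f.getFileName()));
                }
            }
        }
    }

    /** Runs one failed and one successful attempt; returns the final file content. */
    static String demo() {
        try {
            Path output = Files.createTempDirectory("rename-chain");
            Path jobTmp = Files.createDirectories(output.resolve("_temporary"));
            writeAttempt(jobTmp, "attempt_0", "partial");  // fails, never committed
            Path good = writeAttempt(jobTmp, "attempt_1", "complete");
            commitTask(good, jobTmp, "task_0");
            commitJob(jobTmp, output, "task_0");
            return new String(Files.readAllBytes(output.resolve("part-00000")),
                    StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The failed attempt's file stays under its own attempt directory, which is 
never renamed, so it has no path into the final output.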

Cheers,
Gopal

