Hi,

Hadoop MapReduce version: 2.2.0
We are using MultipleOutputs to write multiple output files from the mapper (a map-only job, no reducer). The requirement is that MultipleOutputs write to a directory other than the job's default output directory, so we used the following MultipleOutputs method to write to a different directory:

    public <K, V> void write(String namedOutput, K key, V value, String baseOutputPath)

Now, if a map task runs for a long time, then (because speculative execution is enabled) Hadoop starts a parallel attempt of the same task to finish it early. Both attempts then try to write to the same file in the same directory; the second attempt fails with a "File already exists" exception, and so does the job.

After analyzing this, we found that, unlike the default context writer, *the MultipleOutputs API does not create any temporary directory*. It starts writing directly into the output directory. The reason is that the FileOutputCommitter used by the default context writer (and hence by the Application Master) is separate from the MultipleOutputs writer, so in the MultipleOutputs case none of the FileOutputCommitter methods ever get called for these files.

So is this a known issue or the default behavior? And what is the solution to this problem?
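For reference, here is a minimal sketch of the mapper-side usage described above (the class name, the named output "events", and the absolute output path are illustrative placeholders, not our actual job):

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

public class MultiOutputMapper extends Mapper<LongWritable, Text, NullWritable, Text> {

    private MultipleOutputs<NullWritable, Text> mos;

    @Override
    protected void setup(Context context) {
        mos = new MultipleOutputs<NullWritable, Text>(context);
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // The write(namedOutput, key, value, baseOutputPath) overload quoted
        // above. With an absolute baseOutputPath, the file ends up directly
        // in the target directory rather than in a task-attempt temporary
        // directory (per the behavior described above), so two speculative
        // attempts of the same task collide on the same file.
        mos.write("events", NullWritable.get(), value, "/data/other/dir/events");
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        mos.close();
    }
}

// In the driver, the named output has to be declared, e.g.:
// MultipleOutputs.addNamedOutput(job, "events", TextOutputFormat.class,
//                                NullWritable.class, Text.class);

Under speculative execution, two attempts of the same task both run this map() and race to create the identical file under /data/other/dir.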
Regards,
Ashish