Please share comments on the issue mentioned below. Regards, Ashish.
On Wed, Dec 21, 2016 at 6:28 PM, Ashish Paliwal <[email protected]> wrote:
> Hi,
>
> Hadoop MapReduce version: 2.2.0
>
> We are using MultipleOutputs to write multiple output files from the Mapper
> (no reducer). Per the requirement, MultipleOutputs should write to a
> directory other than the job's default output directory, so we used the
> following MultipleOutputs method to write to a different directory:
>
> public <K, V> void write(String namedOutput, K key, V value, String baseOutputPath)
>
> Now, if any map task runs for a long time, then (because speculative
> execution is enabled) Hadoop starts a parallel attempt to complete the task
> early. Both attempts then try to write to the same file in the same
> directory. The second attempt fails with a "File already exists" error, and
> so does the job.
>
> After analysis we found that, unlike the default context writer, *the
> MultipleOutputs API does not create any temporary directory*. It writes
> directly into the output directory. The reason is that the
> FileOutputCommitter used by the default context writer (and by the
> Application Master) is different from the MultipleOutputs writer, so in the
> case of MultipleOutputs none of the FileOutputCommitter methods are called.
>
> So is this a known issue or the default behavior? And what is the solution
> to this problem?
>
> Regards,
> Ashish.
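To make the collision concrete, here is a minimal, self-contained Java sketch of the path logic described above. This is not the real Hadoop API; the path formats and method names (`multipleOutputsPath`, `committerPath`) are simplified assumptions for illustration only. The point is that a writer which derives the file name directly from `baseOutputPath` plus the part suffix gives two speculative attempts of the same task the identical target path, while a committer that stages each attempt under its own temporary directory keeps them apart:

```java
public class SpeculativePathDemo {

    // Simplified illustration (NOT the real MultipleOutputs code): the file
    // name comes straight from baseOutputPath and the task's part suffix,
    // with no attempt-specific temporary directory in between.
    static String multipleOutputsPath(String outDir, String base, int partition) {
        return String.format("%s/%s-m-%05d", outDir, base, partition);
    }

    // Simplified illustration of a FileOutputCommitter-style layout: each
    // task attempt writes under its own temporary directory, so speculative
    // attempts of the same task never share a path.
    static String committerPath(String outDir, String base, int partition, int attempt) {
        return String.format("%s/_temporary/attempt_m_%06d_%d/%s-m-%05d",
                outDir, partition, attempt, base, partition);
    }

    public static void main(String[] args) {
        // Two speculative attempts of map task 0 writing via baseOutputPath:
        String a0 = multipleOutputsPath("/data/out", "events", 0);
        String a1 = multipleOutputsPath("/data/out", "events", 0);
        System.out.println(a0.equals(a1)); // prints "true" -> "File already exists"

        // The same two attempts under attempt-scoped temporary directories:
        String c0 = committerPath("/data/out", "events", 0, 0);
        String c1 = committerPath("/data/out", "events", 0, 1);
        System.out.println(c0.equals(c1)); // prints "false" -> no collision
    }
}
```

This matches the observation in the mail: because the MultipleOutputs writer bypasses the committer's temporary-directory staging, the second speculative attempt finds the first attempt's file already in place and fails.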
