Please share comments on the issue mentioned below. Regards, Ashish.
On Wed, Dec 21, 2016 at 6:28 PM, Ashish Paliwal <[email protected]> wrote:
> Hi,
>
> Hadoop MapReduce version: 2.2.0
>
> We are using MultipleOutputs to write multiple output files from the Mapper
> (no reducer). Per the requirement, MultipleOutputs should write to a
> directory other than the job's default output directory, so we used the
> following MultipleOutputs method to write to a different directory:
>
> public <K, V> void write(String namedOutput, K key, V value, String baseOutputPath)
>
> Now, if any map task runs for a long time, then (because speculative
> execution is enabled) Hadoop starts a parallel attempt to complete the task
> early. Both attempts then try to write to the same file in the same
> directory. The second attempt fails with a "File already exists" error, and
> so does the job.
>
> After analysis we found that, unlike the default context writer, *the
> MultipleOutputs API does not create any temporary directory*. It writes
> directly into the output directory. The reason is that the
> FileOutputCommitter used by the default context writer (and by the
> Application Master) is different from the MultipleOutputs writer, so in the
> case of MultipleOutputs none of the FileOutputCommitter methods are called.
>
> So is this a known issue or the default behavior? And what is the solution
> to this problem?
>
> Regards,
> Ashish.
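To make the collision concrete, here is a minimal, self-contained Java sketch of the path logic described above. This is not the real Hadoop API; the path formats and method names (`multipleOutputsPath`, `committerPath`) are simplified assumptions for illustration only. The point is that a writer which derives the file name directly from `baseOutputPath` plus the part suffix gives two speculative attempts of the same task the identical target path, while a committer that stages each attempt under its own temporary directory keeps them apart:

```java
public class SpeculativePathDemo {

    // Simplified illustration (NOT the real MultipleOutputs code): the file
    // name comes straight from baseOutputPath and the task's part suffix,
    // with no attempt-specific temporary directory in between.
    static String multipleOutputsPath(String outDir, String base, int partition) {
        return String.format("%s/%s-m-%05d", outDir, base, partition);
    }

    // Simplified illustration of a FileOutputCommitter-style layout: each
    // task attempt writes under its own temporary directory, so speculative
    // attempts of the same task never share a path.
    static String committerPath(String outDir, String base, int partition, int attempt) {
        return String.format("%s/_temporary/attempt_m_%06d_%d/%s-m-%05d",
                outDir, partition, attempt, base, partition);
    }

    public static void main(String[] args) {
        // Two speculative attempts of map task 0 writing via baseOutputPath:
        String a0 = multipleOutputsPath("/data/out", "events", 0);
        String a1 = multipleOutputsPath("/data/out", "events", 0);
        System.out.println(a0.equals(a1)); // prints "true" -> "File already exists"

        // The same two attempts under attempt-scoped temporary directories:
        String c0 = committerPath("/data/out", "events", 0, 0);
        String c1 = committerPath("/data/out", "events", 0, 1);
        System.out.println(c0.equals(c1)); // prints "false" -> no collision
    }
}
```

This matches the observation in the mail: because the MultipleOutputs writer bypasses the committer's temporary-directory staging, the second speculative attempt finds the first attempt's file already in place and fails.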
