Hi,

Yes, this is possible: just configure the first MR job's output path as the second job's input. The second job will run identity mappers (as opposed to no mappers at all), but they ship with Hadoop and are just a technical necessity. To avoid this overhead, Tez, Spark, Flink and other execution engines were built: they let you express your algorithm as a DAG and run it without materializing each intermediate step.

Kind regards,
Daniel.
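A minimal sketch of such a second job, assuming the first (map-only) job wrote its output as SequenceFiles and that `MyReducer` and the HDFS paths are placeholders you'd replace with your own:

```java
// Second job: identity map phase + your reduce phase, reading the
// map-only job's output from HDFS.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ReduceOnlyJob {
  public static void main(String[] args) throws Exception {
    Job job2 = Job.getInstance(new Configuration(), "reduce-only");
    job2.setJarByClass(ReduceOnlyJob.class);

    // The base Mapper class IS the identity mapper: it forwards each
    // (key, value) pair unchanged into the shuffle.
    job2.setMapperClass(Mapper.class);
    job2.setReducerClass(MyReducer.class);  // your reducer (placeholder)

    // Assumes job 1 used SequenceFileOutputFormat; use the matching
    // input format and key/value types here.
    job2.setInputFormatClass(SequenceFileInputFormat.class);
    job2.setOutputKeyClass(Text.class);
    job2.setOutputValueClass(IntWritable.class);

    // Input of job 2 = output path of the map-only job 1.
    FileInputFormat.addInputPath(job2, new Path(args[0]));
    FileOutputFormat.setOutputPath(job2, new Path(args[1]));

    System.exit(job2.waitForCompletion(true) ? 0 : 1);
  }
}
```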
> To: [email protected] > From: [email protected] > Subject: Job that just runs the reduce tasks > Date: Fri, 9 Oct 2015 10:46:49 +0100 > > Hi, > > If we run a job without reduce tasks, the map output is going to be > saved into HDFS. Now, I would like to launch another job that reads the > map output and compute the reduce phase. Is it possible to execute a job > that reads the map output from HDFS and just runs the reduce phase? > > Thanks, >
