Hi,

Yes, this is possible: just configure the first MR job's output path as the second job's input. The second job will run identity mappers (as opposed to no mappers at all), but they ship with Hadoop and are just a technical necessity. To avoid this overhead, Tez, Spark, Flink and other execution engines were built: they let you express your algorithm as a DAG and run it without materializing each intermediate step.

Kind regards,
Daniel.
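A minimal sketch of such a second job, assuming the first (map-only) job wrote its output as SequenceFiles and that `MyReducer` and the HDFS paths are placeholders you'd replace with your own:

```java
// Second job: identity map phase + your reduce phase, reading the
// map-only job's output from HDFS.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ReduceOnlyJob {
  public static void main(String[] args) throws Exception {
    Job job2 = Job.getInstance(new Configuration(), "reduce-only");
    job2.setJarByClass(ReduceOnlyJob.class);

    // The base Mapper class IS the identity mapper: it forwards each
    // (key, value) pair unchanged into the shuffle.
    job2.setMapperClass(Mapper.class);
    job2.setReducerClass(MyReducer.class);  // your reducer (placeholder)

    // Assumes job 1 used SequenceFileOutputFormat; use the matching
    // input format and key/value types here.
    job2.setInputFormatClass(SequenceFileInputFormat.class);
    job2.setOutputKeyClass(Text.class);
    job2.setOutputValueClass(IntWritable.class);

    // Input of job 2 = output path of the map-only job 1.
    FileInputFormat.addInputPath(job2, new Path(args[0]));
    FileOutputFormat.setOutputPath(job2, new Path(args[1]));

    System.exit(job2.waitForCompletion(true) ? 0 : 1);
  }
}
```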
> To: [email protected] > From: [email protected] > Subject: Job that just runs the reduce tasks > Date: Fri, 9 Oct 2015 10:46:49 +0100 > > Hi, > > If we run a job without reduce tasks, the map output is going to be > saved into HDFS. Now, I would like to launch another job that reads the > map output and compute the reduce phase. Is it possible to execute a job > that reads the map output from HDFS and just runs the reduce phase? > > Thanks, >
