Hello, I'd like to use more than one reduce task with Hadoop Streaming, but still end up with a single result. Is that possible, or do I have to run one more job to merge the results? And is it the same with non-streaming jobs? Below you can see that I get 5 results for mapred.reduce.tasks=5.
$ hadoop jar /packages/run.64/hadoop-0.20.2-cdh3u1/contrib/streaming/hadoop-streaming-0.20.2-cdh3u1.jar \
    -D mapred.reduce.tasks=5 \
    -mapper /bin/cat \
    -reducer /tmp/wcc \
    -file /tmp/wcc \
    -file /bin/cat \
    -input /user/hadoopnlp/1gb \
    -output 1gb.wc
. . .
13/01/03 22:00:03 INFO streaming.StreamJob:  map 100%  reduce 100%
13/01/03 22:00:07 INFO streaming.StreamJob: Job complete: job_201301021717_0038
13/01/03 22:00:07 INFO streaming.StreamJob: Output: 1gb.wc
$ hadoop dfs -cat 1gb.wc/part-*
472173052
165736187
201719914
184376668
163872819
$

where /tmp/wcc contains

#!/bin/bash
wc -c

Thanks for any answer,
Pavel Hančar
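In case it helps to see what I mean by "merge the result": since each of the 5 part files holds one partial byte count, the merge step I'm asking about would just sum them. This is a local sketch only (the /tmp/1gb.wc.demo directory and its contents are made up to simulate the five reducer outputs above, not read from HDFS):

```shell
# Simulate the five reducer output files (hypothetical local copies
# of the part-* files shown above, one partial byte count each).
mkdir -p /tmp/1gb.wc.demo
printf '472173052\n' > /tmp/1gb.wc.demo/part-00000
printf '165736187\n' > /tmp/1gb.wc.demo/part-00001
printf '201719914\n' > /tmp/1gb.wc.demo/part-00002
printf '184376668\n' > /tmp/1gb.wc.demo/part-00003
printf '163872819\n' > /tmp/1gb.wc.demo/part-00004

# Sum all partial counts into one total; on the cluster the input
# would come from: hadoop dfs -cat 1gb.wc/part-*
total=$(cat /tmp/1gb.wc.demo/part-* | awk '{s += $1} END {print s}')
echo "$total"
```

So the question is whether Hadoop can produce this single number itself with 5 reduce tasks, or whether a second pass like this is unavoidable.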
