Dear all,

Is it possible to access the number of reducer tasks from Crunch (something
equivalent to context.getNumReduceTasks() in Hadoop)?

Context: I'm porting Nutch to Crunch. One operation (in GeneratorJob.java,
GeneratorMapper.java and GeneratorReducer.java -
https://github.com/apache/nutch/blob/2.x/src/java/org/apache/nutch/crawl/GeneratorReducer.java)
takes the top n URLs according to a score. If I understand correctly, each
reduce task selects "n / number of reduce tasks" URLs (GeneratorReducer,
line 102). As long as the shuffle spreads the URLs evenly across the reduce
tasks, the result is good enough.

Thanks in advance!

Vincent
