Do you actually care about the number of reducers, or do you just want the top n from a table? The latter is built into the framework.
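
To make the difference concrete, here is a minimal plain-Java sketch (no Hadoop or Crunch dependency) comparing an exact top n with the "n / number of reduce tasks per reducer" approach described in the quoted message. Everything in it is hypothetical: the class TopNSketch, its methods, and the sample URLs and scores are made up for illustration and are not taken from Nutch or Crunch. In Crunch itself, I believe PTable.top(int) or Aggregate.top in org.apache.crunch.lib is the built-in to look at, but please check the API docs for your version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: this is not Nutch or Crunch code.
class TopNSketch {

    // Orders entries by descending score; ties are broken by URL so results are deterministic.
    static final Comparator<Map.Entry<String, Double>> BY_SCORE_DESC = (a, b) -> {
        int c = Double.compare(b.getValue(), a.getValue());
        return c != 0 ? c : a.getKey().compareTo(b.getKey());
    };

    // Exact top n over a single collection of (url, score) pairs.
    static List<String> topN(Map<String, Double> scores, int n) {
        List<Map.Entry<String, Double>> entries = new ArrayList<>(scores.entrySet());
        entries.sort(BY_SCORE_DESC);
        List<String> urls = new ArrayList<>();
        for (Map.Entry<String, Double> e : entries.subList(0, Math.min(n, entries.size()))) {
            urls.add(e.getKey());
        }
        return urls;
    }

    // Approximation: each partition (standing in for a reduce task) keeps
    // n / numPartitions entries, using integer division.
    static List<String> perPartitionTopN(List<Map<String, Double>> partitions, int n) {
        int quota = n / partitions.size();
        List<String> urls = new ArrayList<>();
        for (Map<String, Double> partition : partitions) {
            urls.addAll(topN(partition, quota));
        }
        return urls;
    }

    // Small helper to build a (url, score) map from parallel arrays.
    static Map<String, Double> scores(String[] urls, double[] values) {
        Map<String, Double> m = new HashMap<>();
        for (int i = 0; i < urls.length; i++) {
            m.put(urls[i], values[i]);
        }
        return m;
    }

    public static void main(String[] args) {
        // A deliberately bad shuffle: all high scores land in the first partition.
        Map<String, Double> p0 = scores(new String[] {"a", "b", "c"}, new double[] {0.9, 0.8, 0.7});
        Map<String, Double> p1 = scores(new String[] {"d", "e", "f"}, new double[] {0.1, 0.2, 0.3});
        Map<String, Double> all = new HashMap<>(p0);
        all.putAll(p1);

        System.out.println("exact:         " + topN(all, 4));
        System.out.println("per partition: " + perPartitionTopN(Arrays.asList(p0, p1), 4));
    }
}
```

With this skewed input the exact top 4 is [a, b, c, f], while the per-partition quota returns [a, b, f, e]: it drops c and keeps the low-scoring e. With an even shuffle the two results would coincide, which is why the per-reducer quota is only "good enough" when scores are well distributed across reducers.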

On Sat, May 2, 2015, 6:12 PM Vincent Fabro <[email protected]> wrote:

> Dear all
>
> Is it possible to access the number of reducer tasks from Crunch
> (something equivalent to context.getNumReduceTasks() in Hadoop)?
>
> Context: I'm porting Nutch to Crunch. One operation (in
> GeneratorJob.java, GeneratorMapper.java and GeneratorReducer.java -
> https://github.com/apache/nutch/blob/2.x/src/java/org/apache/nutch/crawl/GeneratorReducer.java)
> takes the n top urls according to a score. If I understand well, "n/num of
> reduce tasks" urls are selected for each reduce task (GeneratorReducer,
> line 102). If there's a good shuffle, the result is good enough.
>
> Thanks in advance!
>
> Vincent
