Do you actually care about the number of reducers, or do you just want the top n
from a table?  The latter is built into the framework.
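
For reference, here is a rough plain-Java sketch of the two approaches, with
no Crunch or Hadoop dependency. TopNSketch, topN and perReducerLimit are
hypothetical names made up for illustration, and the integer division is an
assumption about how "n / number of reduce tasks" is computed. In Crunch
itself I believe the built-in is Aggregate.top(table, limit, maximize) in
org.apache.crunch.lib, but please check the Javadoc for your version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper for illustration only; not part of Crunch or Nutch.
class TopNSketch {

    // Returns the keys of the n highest-scoring entries, best first.
    static List<String> topN(Map<String, Double> scores, int n) {
        List<String> result = new ArrayList<>();
        if (n <= 0) {
            return result;
        }
        // Min-heap on score: the head is the weakest entry kept so far.
        PriorityQueue<Map.Entry<String, Double>> heap =
            new PriorityQueue<>((a, b) -> Double.compare(a.getValue(), b.getValue()));
        for (Map.Entry<String, Double> entry : scores.entrySet()) {
            heap.add(entry);
            if (heap.size() > n) {
                heap.poll(); // drop the current lowest score
            }
        }
        // Entries leave the heap in ascending order, so prepend each one.
        while (!heap.isEmpty()) {
            result.add(0, heap.poll().getKey());
        }
        return result;
    }

    // Per-reducer quota as described in the question (assumed integer division).
    static int perReducerLimit(int n, int numReduceTasks) {
        return n / numReduceTasks;
    }
}
```

Note that with the per-reducer quota the total can come out below n when n is
not a multiple of the reducer count, and the result only approximates a true
global top n when the shuffle spreads the scores evenly.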

On Sat, May 2, 2015, 6:12 PM Vincent Fabro <[email protected]>
wrote:

> Dear all
>
> Is it possible to access the number of reducer tasks from Crunch
> (something equivalent to context.getNumReduceTasks() in Hadoop)?
>
> Context: I'm porting Nutch to Crunch. One operation (in
> GeneratorJob.java, GeneratorMapper.java and GeneratorReducer.java -
> https://github.com/apache/nutch/blob/2.x/src/java/org/apache/nutch/crawl/GeneratorReducer.java)
> takes the top n URLs according to a score. If I understand correctly, "n / number of
> reduce tasks" URLs are selected for each reduce task (GeneratorReducer,
> line 102). If there's a good shuffle, the result is good enough.
>
> Thanks in advance!
>
> Vincent
>
