Re: Access number of reducer tasks from Crunch

Vincent Fabro Sun, 03 May 2015 17:20:06 -0700

Ok, I missed Aggregate.top() (guess my research wasn't thorough).
I'll go with the framework's built-in function; it seems cleaner than using
Context.
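
For reference, the per-reducer quota from the original question can be
sketched in plain Java with no Hadoop or Crunch dependency. This is only an
illustration under stated assumptions: the class TopNQuota, its method names,
and the round-robin partitioning that stands in for the shuffle are all
hypothetical and are not taken from Nutch or Crunch. It shows why the
"n / number of reduce tasks" shortcut matches the exact top n only when the
high scores are spread evenly across reducers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical illustration; not Nutch or Crunch code. */
class TopNQuota {

    /** Quota per reducer. Integer division can drop up to numReduceTasks - 1 slots. */
    static int perReducerLimit(int n, int numReduceTasks) {
        return n / numReduceTasks;
    }

    /** Approximate top n: partition the scores, then keep each partition's local top entries. */
    static List<Integer> approximateTopN(List<Integer> scores, int n, int numReduceTasks) {
        int limit = perReducerLimit(n, numReduceTasks);
        List<List<Integer>> partitions = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            partitions.add(new ArrayList<>());
        }
        // Assumption: round-robin assignment stands in for the real shuffle.
        for (int i = 0; i < scores.size(); i++) {
            partitions.get(i % numReduceTasks).add(scores.get(i));
        }
        List<Integer> selected = new ArrayList<>();
        for (List<Integer> partition : partitions) {
            partition.sort(Comparator.reverseOrder());
            selected.addAll(partition.subList(0, Math.min(limit, partition.size())));
        }
        selected.sort(Comparator.reverseOrder());
        return selected;
    }

    /** Exact global top n, for comparison. */
    static List<Integer> exactTopN(List<Integer> scores, int n) {
        List<Integer> sorted = new ArrayList<>(scores);
        sorted.sort(Comparator.reverseOrder());
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }

    public static void main(String[] args) {
        // Evenly spread scores: both approaches agree.
        List<Integer> even = Arrays.asList(90, 80, 70, 60, 50, 40, 30, 20);
        System.out.println(approximateTopN(even, 4, 2)); // [90, 80, 70, 60]
        System.out.println(exactTopN(even, 4));          // [90, 80, 70, 60]

        // Skewed input: all high scores land in one partition.
        List<Integer> skewed = Arrays.asList(90, 10, 80, 20, 70, 30, 60, 40);
        System.out.println(approximateTopN(skewed, 4, 2)); // [90, 80, 40, 30]
        System.out.println(exactTopN(skewed, 4));          // [90, 80, 70, 60]
    }
}
```

With the skewed input the quota approach returns 40 and 30 in place of 70 and
60, which is the case a built-in global top-n avoids.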


Thanks a lot for your answers!

Vincent

On Sun, May 3, 2015 at 8:11 AM, Josh Wills <[email protected]> wrote:

> Hey Vincent,
>
> Yeah, you can get at it. Each DoFn inherits a protected getContext()
> method that has the getNumReduceTasks() method defined on it, just like it
> does in the Nutch code you cited. We try (with varying degrees of success)
> to make the underlying MR framework as accessible as possible.
>
> J
>
> On Sun, May 3, 2015 at 2:16 AM, David Ortiz <[email protected]> wrote:
>
>> Do you actually care about the number of reducers, or do you just want the
>> top n from a table?  The latter is built into the framework.
>>
>> On Sat, May 2, 2015, 6:12 PM Vincent Fabro <[email protected]>
>> wrote:
>>
>>> Dear all
>>>
>>> Is it possible to access the number of reducer tasks from Crunch
>>> (something equivalent to context.getNumReduceTasks() in Hadoop)?
>>>
>>> Context: I'm porting Nutch to Crunch. One operation (in
>>> GeneratorJob.java, GeneratorMapper.java and GeneratorReducer.java -
>>> https://github.com/apache/nutch/blob/2.x/src/java/org/apache/nutch/crawl/GeneratorReducer.java)
>>> takes the n top urls according to a score. If I understand correctly, "n/num of
>>> reduce tasks" urls are selected for each reduce task (GeneratorReducer,
>>> line 102). If there's a good shuffle, the result is good enough.
>>>
>>> Thanks in advance!
>>>
>>> Vincent
>>>
>>
>
>
> --
> Director of Data Science
> Cloudera <http://www.cloudera.com>
> Twitter: @josh_wills <http://twitter.com/josh_wills>
>
