How about adding a combiner step first, then reducing?

On Feb 8, 2013, at 11:18 PM, Harsh J <[email protected]> wrote:

> Hey David,
> 
> There's no readily available way to do this today (you may be
> interested in MAPREDUCE-199 though) but if your Job scheduler's not
> doing multiple-assignments on reduce tasks, then only one is assigned
> per TT heartbeat, which gives you almost what you're looking for: 1
> reduce task per node, round-robin'd (roughly).
> 
> On Sat, Feb 9, 2013 at 9:24 AM, David Parks <[email protected]> wrote:
>> I have a cluster of boxes with 3 reducers per node. I want to limit a
>> particular job to only run 1 reducer per node.
>> 
>> This job is network IO bound, gathering images from a set of webservers.
>> 
>> My job has certain parameters set to meet “web politeness” standards (e.g.
>> limits on connections and connection frequency).
>> 
>> If this job runs from multiple reducers on the same node, those per-host
>> limits will be violated.  Also, this is a shared environment and I don’t
>> want long-running, network-bound jobs uselessly taking up all reduce slots.
> 
> 
> 
> --
> Harsh J
> 
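
For illustration only, the one-assignment-per-heartbeat behaviour described
in the quoted reply can be sketched as a toy simulation. This is not Hadoop
code: the function name assign_reduce_tasks, the node names, and the
slots_per_node default are assumptions made up for the example, and the model
ignores heartbeat timing and tasks finishing.

```python
from collections import Counter


def assign_reduce_tasks(nodes, num_tasks, slots_per_node=3):
    """Toy model (hypothetical, not a Hadoop API): in each heartbeat round,
    every node with a free reduce slot is handed at most one task."""
    assigned = Counter()
    remaining = num_tasks
    while remaining > 0:
        progressed = False
        for node in nodes:  # one heartbeat per node per round
            if remaining == 0:
                break
            if assigned[node] < slots_per_node:
                assigned[node] += 1
                remaining -= 1
                progressed = True
        if not progressed:
            break  # every slot is full; leftover tasks would have to wait
    return dict(assigned)
```

In this toy model, a job with as many reduce tasks as nodes ends up with one
task per node. A job with more reduce tasks than nodes starts doubling up,
which is presumably why the quoted reply says "almost" and "roughly".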

Michael Segel  | (m) 312.755.9623

Segel and Associates

