Jeremy, I suppose that's doable; please file a MAPREDUCE JIRA so you can discuss this with others on the development side as well.
I am guessing that the MAX operations of most of the user-oriented data flow front-ends such as Hive and Pig already do this efficiently, so perhaps there hasn't been a very strong need for this.

On Fri, Oct 5, 2012 at 9:18 PM, Jeremy Lewi <[email protected]> wrote:
> Hi Harsh,
>
> Thank you very much; that will work.
>
> How come we can't simply create a modification of a regular mapreduce
> counter which does this behind the scenes? It seems like we should just be
> able to replace "+" with "max" and everything else should work?
>
> J
>
>
> On Wed, Oct 3, 2012 at 9:52 AM, Harsh J <[email protected]> wrote:
>>
>> Jeremy,
>>
>> Here's my shot at it (pardon the quick crappy code):
>> https://gist.github.com/3828246
>>
>> Basically, you can achieve it in two ways:
>>
>> Requirement: All tasks must increment the "max" designated counter
>> only AFTER the max has been computed (i.e. in cleanup).
>>
>> 1. All tasks may use the same counter name. Later, we pull per-task
>> counters and determine the max at the client. (This is my quick and
>> dirty implementation.)
>> 2. All tasks may use their own task ID (number part) in the counter
>> name, but use the same group. Later, we fetch all counters for that
>> group and iterate over it to find the max. This is cleaner, and
>> doesn't end up using deprecated APIs such as the above.
>>
>> Does this help?
>>
>> On Wed, Oct 3, 2012 at 8:47 PM, Jeremy Lewi <[email protected]> wrote:
>> > Hi hadoop-users,
>> >
>> > I'm curious if there is an implementation somewhere of a counter which
>> > tracks the maximum of some value across all mappers or reducers?
>> >
>> > Thanks
>> > J
>>
>>
>>
>> --
>> Harsh J
>
>

--
Harsh J
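For reference, approach (2) from the quoted message can be sketched roughly as follows. This is a minimal simulation using plain Java collections in place of the real Hadoop Counters API, so the aggregation logic is runnable on its own; the group and counter names ("MaxGroup", "task_0", ...) are illustrative assumptions, not names from the gist. In an actual job, each task would instead call something like context.getCounter("MaxGroup", taskId).increment(localMax) in cleanup(), and the client would iterate the corresponding counter group from the finished job's counters.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of approach 2: each task publishes its local max under a
// per-task counter name in a shared group; the client then scans the
// group and takes the largest value. Plain maps stand in for Hadoop's
// Counters API so this compiles and runs standalone. All names here
// ("MaxGroup", "task_" + id) are made up for illustration.
public class MaxCounterSketch {

    // Simulated counter store: group name -> (counter name -> value).
    static final Map<String, Map<String, Long>> counters = new HashMap<>();

    // What each task would do in cleanup(): record its own local max
    // under a counter name unique to that task, in the shared group.
    static void reportTaskMax(String taskId, long localMax) {
        counters.computeIfAbsent("MaxGroup", g -> new HashMap<>())
                .put("task_" + taskId, localMax);
    }

    // What the client does after the job completes: fetch all counters
    // in the group and iterate over them to find the global max.
    static long globalMax() {
        return counters.getOrDefault("MaxGroup", new HashMap<>())
                .values().stream()
                .mapToLong(Long::longValue)
                .max()
                .orElseThrow(() -> new IllegalStateException("no counters reported"));
    }

    public static void main(String[] args) {
        // Three simulated tasks, each reporting its local max.
        reportTaskMax("0", 42L);
        reportTaskMax("1", 7L);
        reportTaskMax("2", 99L);
        System.out.println(globalMax()); // prints 99
    }
}
```

The key property, as noted in the thread, is that each task writes its counter exactly once, after its local max is final; counters are summed per-task by the framework, so incrementing the same counter repeatedly mid-task would corrupt the result.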
