Some examples of when you'd want a reducer: http://static.usenix.org/event/osdi04/tech/full_papers/dean/dean.pdf
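
To make that concrete: one of the examples in that paper is counting URL access frequency, where the map step emits (URL, 1) for each log record and the reduce step sums the values per URL. The sum needs every value for a key in one place, which is exactly what the shuffle and reduce phases provide; a map-only job cannot do it. Below is a minimal sketch in plain Python, not Hadoop or HBase API code. The function names and the "URL status" log format are assumptions made up for illustration.

```python
from collections import defaultdict


def map_phase(record):
    """Emit one (url, 1) pair per log record (assumed format: 'URL status')."""
    url = record.split()[0]
    yield url, 1


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(url, counts):
    """Sum all counts seen for one URL; this needs every value for the key."""
    return url, sum(counts)


def run_job(records):
    """Run map, shuffle, and reduce in-process over a list of log records."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    log = ["/index.html 200", "/about.html 200", "/index.html 304"]
    print(run_job(log))
```

Whether the totals end up in HBase or anywhere else does not change the need for the reduce step; only the output stage differs.
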

On Thu, May 10, 2012 at 11:30 AM, Michael Segel <[email protected]> wrote:

> Dave, do you really want to go there?
>
> OP has a couple of issues and he was going down a rabbit hole.
> (You can choose if that's a reference to 'the Matrix, Jefferson Starship,
> Alice in Wonderland... or all of the above)
>
> So to put him on the correct path, I recommended the following, not in any
> order...
>
> 1) Increase his region size for this table only.
> 2) Look to decreasing the number of regions managed by a RS (which is why
> you increase region size)
> 3) Up the dfs.balance.bandwidthPerSec. (How often does HBase move regions
> and how exactly do they move regions ?)
> 4) Look at implementing MSLABS and GC tuning. This cuts down on the
> overhead.
> 5) Refactoring his job....
>
> Oops.
> Ok I didn't put that in the list.
> But that was the last thing I wrote as a separate statement.
> Clearly you didn't take my advice and think about the problem....
>
> To prove a point.... you wrote:
> 'Many mapreduce algorithms require a reduce phase (e.g. sorting)'
>
> Ok. So tell me why you would want to sort your input in to HBase and if
> that's really a good thing?
> Oops!... :-)
>
>
> On May 10, 2012, at 12:31 PM, Dave Revell wrote:
>
> > This "you don't need a reducer" conversation is distracting from the real
> > problem and is false.
> >
> > Many mapreduce algorithms require a reduce phase (e.g. sorting). The fact
> > that the output is written to HBase or somewhere else is irrelevant.
> >
> > -Dave
> >
> > On Thu, May 10, 2012 at 6:26 AM, Michael Segel <[email protected]> wrote:
> >
> > [SNIP]
