Sigh. Dave, I really think you need to think more about the problem.
Think about what a reduce does and then think about what happens inside of HBase. Then think about which runs faster... a job with two mappers writing the intermediate and final results in HBase, or an M/R job that writes its output to HBase. If you really truly think about the problem, you will start to understand why I say you really don't want to use a reducer when you're working with HBase.

On May 10, 2012, at 1:41 PM, Dave Revell wrote:

> Some examples of when you'd want a reducer:
> http://static.usenix.org/event/osdi04/tech/full_papers/dean/dean.pdf
>
> On Thu, May 10, 2012 at 11:30 AM, Michael Segel
> <[email protected]>wrote:
>
>> Dave, do you really want to go there?
>>
>> OP has a couple of issues and he was going down a rabbit hole.
>> (You can choose if that's a reference to 'the Matrix, Jefferson Starship,
>> Alice in Wonderland... or all of the above)
>>
>> So to put him on the correct path, I recommended the following, not in any
>> order...
>>
>> 1) Increase his region size for this table only.
>> 2) Look to decreasing the number of regions managed by a RS (which is why
>> you increase region size)
>> 3) Up the dfs.balance.bandwidthPerSec. (How often does HBase move regions
>> and how exactly do they move regions ?)
>> 4) Look at implementing MSLABS and GC tuning. This cuts down on the
>> overhead.
>> 5) Refactoring his job....
>>
>> Oops.
>> Ok I didn't put that in the list.
>> But that was the last thing I wrote as a separate statement.
>> Clearly you didn't take my advice and think about the problem....
>>
>> To prove a point.... you wrote:
>> 'Many mapreduce algorithms require a reduce phase (e.g. sorting)'
>>
>> Ok. So tell me why you would want to sort your input in to HBase and if
>> that's really a good thing?
>> Oops!... :-)
>>
>> On May 10, 2012, at 12:31 PM, Dave Revell wrote:
>>> This "you don't need a reducer" conversation is distracting from the real
>>> problem and is false.
>>>
>>> Many mapreduce algorithms require a reduce phase (e.g. sorting). The fact
>>> that the output is written to HBase or somewhere else is irrelevant.
>>>
>>> -Dave
>>>
>>> On Thu, May 10, 2012 at 6:26 AM, Michael Segel <
>> [email protected]>wrote:
>>> [SNIP]
>>
>>
