I would enjoy seeing this: "Maybe I should submit this as an HBaseCon topic for a presentation?"
Thanks,
Ben

On Wed, Feb 29, 2012 at 8:18 AM, Michel Segel <[email protected]> wrote:

> There is nothing wrong in writing the output from a reducer to HBase.
>
> The question you have to ask yourself is why you are using a reducer in
> the first place. ;-)
>
> Look, you have a database. Why do you need a reducer?
>
> It's a simple question... Right? ;-)
>
> Look, I apologize for being cryptic. This is one of those philosophical
> design questions where you, the developer/architect, have to figure out
> the answer for yourself. Maybe I should submit this as an HBaseCon topic
> for a presentation?
>
> Sort of like how to do an efficient table join in HBase...
>
> HTH
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On Feb 28, 2012, at 11:16 PM, Jacques <[email protected]> wrote:
>
>> I see nothing wrong with writing the output of the reducer into HBase.
>> You just need to make sure duplicated operations wouldn't cause
>> problems. If using TableOutputFormat, don't use randomly seeded keys.
>> If working straight against HTable, don't use increment. We do this in
>> some situations and either don't care about overwrites or use
>> checkAndPut with a skip option in the application code.
>>
>> On Feb 28, 2012 9:40 AM, "Ben Snively" <[email protected]> wrote:
>>
>>> Is there an assertion that you would never need to run a reducer when
>>> writing to the DB?
>>>
>>> It seems that there are cases when you would not need one, but the
>>> general statement doesn't apply to all use cases.
>>>
>>> If you were processing data where you may have two map tasks (or a set
>>> of map tasks) output the same key, you could have a case where you
>>> need to reduce the data for that key prior to inserting the result
>>> into HBase.
>>>
>>> Am I missing something? To me, that would be the deciding factor:
>>> whether the key/values output by the map tasks are the exact values
>>> that need to be inserted into HBase, versus multiple values that must
>>> be aggregated together before the result is put into the HBase entry.
>>>
>>> Thanks,
>>> Ben
>>>
>>> On Tue, Feb 28, 2012 at 11:20 AM, Michael Segel
>>> <[email protected]> wrote:
>>>
>>>> The better question is why would you need a reducer?
>>>>
>>>> That's a bit cryptic, I understand, but you have to ask yourself when
>>>> you need to use a reducer when you are writing to a database... ;-)
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On Feb 28, 2012, at 10:14 AM, "T Vinod Gupta" <[email protected]>
>>>> wrote:
>>>>
>>>>> Mike,
>>>>> I didn't understand - why would I not need a reducer in an HBase M/R
>>>>> job? There can be cases, right?
>>>>> My use case is very similar to Sujee's blog on frequency counting -
>>>>> http://sujee.net/tech/articles/hadoop/hbase-map-reduce-freq-counter/
>>>>> So in the reducer, I can do all the aggregations. Is there a better
>>>>> way? I can think of another way - to use increments in the map job
>>>>> itself. I have to figure out if that's possible though.
>>>>>
>>>>> thanks
>>>>>
>>>>> On Tue, Feb 28, 2012 at 7:44 AM, Michel Segel
>>>>> <[email protected]> wrote:
>>>>>
>>>>>> Yes, you can do it.
>>>>>> But why do you have a reducer when running an M/R job against HBase?
>>>>>>
>>>>>> The trick is in writing multiple rows... You do it independently of
>>>>>> the output from the map() method.
>>>>>>
>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>
>>>>>> Mike Segel
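A minimal sketch of Mike's point above - one reduce() call can emit any number of rows, independently of what map() produced - assuming the 0.92-era Java client and TableOutputFormat. The class name, the column family "cf", and the qualifier and row-key names are invented for illustration, not taken from the thread:

    import java.io.IOException;

    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableReducer;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;

    public class MultiPutReducer
        extends TableReducer<Text, IntWritable, ImmutableBytesWritable> {

      // Illustrative column family; not from the thread.
      private static final byte[] CF = Bytes.toBytes("cf");

      @Override
      protected void reduce(Text key, Iterable<IntWritable> values,
          Context context) throws IOException, InterruptedException {
        long sum = 0;
        for (IntWritable v : values) {
          sum += v.get();
        }

        // First row: the aggregated total for this key. Each
        // context.write() becomes its own row via TableOutputFormat.
        Put total = new Put(Bytes.toBytes(key.toString()));
        total.add(CF, Bytes.toBytes("total"), Bytes.toBytes(sum));
        context.write(new ImmutableBytesWritable(total.getRow()), total);

        // Second row: a separate metadata row. Nothing ties the number
        // of Puts emitted here to the number of map() outputs.
        Put meta = new Put(Bytes.toBytes(key.toString() + "_meta"));
        meta.add(CF, Bytes.toBytes("updatedAt"),
            Bytes.toBytes(System.currentTimeMillis()));
        context.write(new ImmutableBytesWritable(meta.getRow()), meta);
      }
    }

The job would be wired up with the usual TableMapReduceUtil.initTableReducerJob("mytable", MultiPutReducer.class, job).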
>>>>>>
>>>>>> On Feb 28, 2012, at 8:34 AM, T Vinod Gupta <[email protected]> wrote:
>>>>>>
>>>>>>> While doing map reduce on HBase tables, is it possible to do
>>>>>>> multiple Puts in the reducer? What I want is a way to be able to
>>>>>>> write multiple rows. If it's not possible, then what are the other
>>>>>>> alternatives? I mean like creating a wider table in that case.
>>>>>>>
>>>>>>> thanks
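For reference, Jacques's "checkAndPut with a skip option" idea might look roughly like this with the 0.92-era client; the table "events", the row key, and the "done" marker column are made-up names. The point is that checkAndPut is atomic, so a replayed task attempt becomes a no-op:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class IdempotentWrite {

      public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "events");  // illustrative table
        try {
          byte[] row = Bytes.toBytes("event-42");   // illustrative key
          byte[] cf = Bytes.toBytes("cf");
          byte[] marker = Bytes.toBytes("done");

          Put put = new Put(row);
          put.add(cf, Bytes.toBytes("payload"), Bytes.toBytes("some value"));
          put.add(cf, marker, Bytes.toBytes(true));

          // Apply the Put only if the "done" marker cell does not exist
          // yet (a null expected value means "check for absence"). A
          // duplicated operation on the same row simply skips the write.
          boolean applied = table.checkAndPut(row, cf, marker, null, put);
          if (!applied) {
            System.out.println("Already written by an earlier attempt; skipped.");
          }
        } finally {
          table.close();
        }
      }
    }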

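And T Vinod's alternative - doing the increments from the map job itself, as in the frequency-counting use case - could be sketched as below. Note that incrementColumnValue is exactly the non-idempotent operation Jacques warns about, so a re-run or speculatively executed map task would double-count. The "freq" table and column names are again illustrative:

    import java.io.IOException;

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class FreqCountMapper
        extends Mapper<LongWritable, Text, NullWritable, NullWritable> {

      private static final byte[] CF = Bytes.toBytes("cf");
      private static final byte[] COUNT = Bytes.toBytes("count");

      private HTable table;

      @Override
      protected void setup(Context context) throws IOException {
        // One client per task; writes go straight to the "freq" table.
        table = new HTable(
            HBaseConfiguration.create(context.getConfiguration()), "freq");
      }

      @Override
      protected void map(LongWritable offset, Text line, Context context)
          throws IOException {
        for (String word : line.toString().split("\\s+")) {
          if (!word.isEmpty()) {
            // Atomic server-side increment: one row per word, no
            // shuffle, no reducer. NOT idempotent - see caveat above.
            table.incrementColumnValue(Bytes.toBytes(word), CF, COUNT, 1L);
          }
        }
      }

      @Override
      protected void cleanup(Context context) throws IOException {
        table.close();
      }
    }

A job like this would set the number of reduce tasks to zero and use NullOutputFormat, and should disable speculative execution (mapred.map.tasks.speculative.execution=false) so a duplicated map attempt can't double-count.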