Well there goes my weekend. :-P
----------------------------------------
> From: [email protected]
> To: [email protected]
> Date: Fri, 25 Mar 2011 10:00:26 -0700
> Subject: RE: How could I re-calculate every entries in hbase efficiently
> through mapreduce?
>
> I would certainly find it useful if you wrote such a blog post.
> Dave
>
> -----Original Message-----
> From: Michael Segel [mailto:[email protected]]
> Sent: Friday, March 25, 2011 8:55 AM
> To: [email protected]
> Subject: RE: How could I re-calculate every entries in hbase efficiently
> through mapreduce?
>
>
> "During inserts into the table, there was one field that was populated
> from hand-crafted HTML that should only have a small range of values
> (e.g. a primary color). We wanted to keep a log of all of the unique
> values that were found here, and so the values were the map job output
> and then sorted and counted in the reduce phase."
>
> Ahhh, have you heard about dynamic counters?
> You don't need a reducer and all you have to do is dump the counters in your
> main job after your mappers run.
>
> Maybe I should write a blog entry showing how to do the word-count app
> using just dynamic counters and no reducers?
>
> HTH
>
> -Mike
>
>
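The counters-only word count Mike proposes can be sketched as follows. This is a plain-Java simulation of the pattern, not real Hadoop code: the `counters` map stands in for the job's dynamic counters. In an actual mapper you would call `context.getCounter("WORDS", word).increment(1)` and read `job.getCounters()` from the driver after the job completes; the class and method names below are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

// Simulation of a map-only word count that uses dynamic counters instead
// of emitting (word, 1) pairs to a reducer. Each input line plays the role
// of one map() call; the driver "dumps the counters" when the job is done.
public class CounterWordCount {

    // Stand-in for the job's dynamic counter group, keyed by counter name.
    static Map<String, Long> count(String[] lines) {
        Map<String, Long> counters = new HashMap<>();
        for (String line : lines) {                      // one map() call per line
            for (String word : line.toLowerCase().split("\\s+")) {
                if (word.isEmpty()) continue;
                counters.merge(word, 1L, Long::sum);     // counter.increment(1)
            }
        }
        return counters;                                 // what the driver dumps
    }

    public static void main(String[] args) {
        String[] input = { "the quick brown fox", "the lazy dog" };
        count(input).forEach((w, c) -> System.out.println(w + "\t" + c));
    }
}
```

One caveat worth noting: Hadoop caps the number of distinct counters a job may create (a configurable limit), so this approach only suits fields with a small value domain, as in the primary-color example.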
> ----------------------------------------
> > From: [email protected]
> > To: [email protected]
> > Date: Fri, 25 Mar 2011 08:44:12 -0700
> > Subject: RE: How could I re-calculate every entries in hbase efficiently
> > through mapreduce?
> >
> > We ran across a use-case this week. During inserts into the table, there
> > was one field that was populated from hand-crafted HTML that should only
> > have a small range of values (e.g. a primary color). We wanted to keep a
> > log of all of the unique values that were found here, and so the values
> > were the map job output and then sorted and counted in the reduce phase. It
> > was a handy way for us to capture the debugging output in a persistent file
> > (we could have just used counters, but those disappear after a while unless
> > you manually copy them somewhere).
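The audit pattern Dave describes — emit the suspect field's value as the map output key, then sort and count in the reduce phase — is essentially word count over one column. Here is a plain-Java sketch of the reduce side; the class name and the in-memory "shuffle" are illustrative stand-ins for what the framework does between map and reduce.

```java
import java.util.Arrays;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Simulation of the reduce phase of the audit job: the mappers have
// emitted one value per record (the suspect field), and the reducer
// counts occurrences of each distinct value. Unexpected values stand
// out immediately in the sorted output.
public class FieldValueAudit {

    static SortedMap<String, Long> reduce(List<String> mappedValues) {
        SortedMap<String, Long> counts = new TreeMap<>();   // shuffle sorts keys
        for (String v : mappedValues) {
            counts.merge(v, 1L, Long::sum);                 // sum the ones
        }
        return counts;                                      // persistent record
    }

    public static void main(String[] args) {
        List<String> mapped = Arrays.asList("red", "blue", "red", "mauve?!");
        System.out.println(reduce(mapped));
    }
}
```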
> >
> > -----Original Message-----
> > From: Michael Segel [mailto:[email protected]]
> > Sent: Friday, March 25, 2011 8:26 AM
> > To: [email protected]
> > Subject: RE: How could I re-calculate every entries in hbase efficiently
> > through mapreduce?
> >
> >
> >
> > Yeah...
> > Uhm I don't know of many use cases where you would want or need a reducer
> > step when dealing with HBase.
> > I'm sure one may exist, but from past practical experience... you shouldn't
> > need one.
> >
> > ----------------------------------------
> > > From: [email protected]
> > > To: [email protected]
> > > Date: Fri, 25 Mar 2011 08:20:45 -0700
> > > Subject: RE: How could I re-calculate every entries in hbase efficiently
> > > through mapreduce?
> > >
> > > There is no reason to use a reducer in this scenario. I frequently do
> > > map-only update jobs. Skipping the reduce step saves a lot of unnecessary
> > > work.
> > >
> > > Dave
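Following Dave's advice, a map-only version of the daily decay job from the original question could be sketched like this. It is a plain-Java simulation over an in-memory "table" (the names `MapOnlyDecay` and `decayAll` are made up for illustration); in a real job you would configure the mapper with `TableMapReduceUtil.initTableMapperJob(...)`, call `job.setNumReduceTasks(0)`, and issue the `Put` directly from the mapper, so no shuffle or reduce ever runs.

```java
import java.util.HashMap;
import java.util.Map;

// Simulation of a map-only decay job: each map() call reads one row,
// recomputes x -> 0.9 * x, and writes the result straight back to the
// same row. With zero reduce tasks there is no partitioner and no
// shuffling of data between regions.
public class MapOnlyDecay {
    static final double DECAY = 0.9;

    // One map() call per row: read, recompute, write back in place.
    static void decayAll(Map<String, Double> table) {
        table.replaceAll((rowKey, x) -> x * DECAY);
    }

    public static void main(String[] args) {
        Map<String, Double> table = new HashMap<>();
        table.put("row1", 100.0);
        table.put("row2", 50.0);
        decayAll(table);                 // every value scaled by 0.9
        System.out.println(table);
    }
}
```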
> > >
> > > -----Original Message-----
> > > From: Stanley Xu [mailto:[email protected]]
> > > Sent: Thursday, March 24, 2011 7:37 PM
> > > To: [email protected]
> > > Subject: How could I re-calculate every entries in hbase efficiently
> > > through mapreduce?
> > >
> > > Dear Buddies,
> > >
> > > I need to re-calculate the entries in an HBase table every day, e.g. let
> > > x = 0.9x daily, so that time has an impact on the entry values.
> > >
> > > So I wrote a TableMapper to fetch each entry and recalculate the result,
> > > use Context.write(key, put) to emit the update operation, and then use an
> > > IdentityTableReducer to write it directly back to HBase. To get the job
> > > done in a short time, I use the HRegionPartitioner to increase the
> > > reducer count to 50.
> > >
> > > But I have two doubts here:
> > > 1. It looks like the partitioner will do a lot of shuffling. I am
> > > wondering why it couldn't just do the put on the local region, since the
> > > read and the write for the same entry should hit the same region,
> > > shouldn't they?
> > >
> > > 2. If the job fails for any reason (like a timeout), HBase might be left
> > > in a partially-updated state, mightn't it?
> > >
> > > Is there any suggestion for how I could avoid these two problems?
> > >
> > >
> > > Thanks.
> > >
> > > Best wishes,
> > > Stanley Xu
> >
>