Yeah... I don't know of many use cases where you would want or need a reduce 
step when dealing with HBase.
I'm sure one may exist, but from past practical experience, you shouldn't 
need one.
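
For the archive, here is a minimal map-only sketch of the kind of job Dave 
describes, matching the scenario in the quoted mail below. It is only a 
sketch against the 0.90-era API: the table name "scores", family "cf", and 
qualifier "score" are made-up names. The mapper scans each row, applies the 
0.9 decay, and emits a Put that TableOutputFormat writes straight back to 
the table; initTableReducerJob is still called to wire up the output format, 
but with a null reducer class and zero reduce tasks, so there is no shuffle 
at all.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;

public class DecayJob {

  // Hypothetical table layout: double values in scores:cf:score.
  static final byte[] FAMILY = Bytes.toBytes("cf");
  static final byte[] QUALIFIER = Bytes.toBytes("score");

  // Map-only: read each row, decay the value, emit a Put for the same row.
  static class DecayMapper extends TableMapper<ImmutableBytesWritable, Put> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value,
        Context context) throws IOException, InterruptedException {
      byte[] raw = value.getValue(FAMILY, QUALIFIER);
      if (raw == null) {
        return; // nothing to decay for this row
      }
      double decayed = Bytes.toDouble(raw) * 0.9;
      Put put = new Put(row.get());
      put.add(FAMILY, QUALIFIER, Bytes.toBytes(decayed));
      // With zero reducers, TableOutputFormat sends the Put straight to HBase.
      context.write(row, put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "daily-decay");
    job.setJarByClass(DecayJob.class);

    Scan scan = new Scan();
    scan.addColumn(FAMILY, QUALIFIER);
    scan.setCaching(500);        // larger scanner caching for a full-table pass
    scan.setCacheBlocks(false);  // don't pollute the block cache from MR scans

    TableMapReduceUtil.initTableMapperJob("scores", scan, DecayMapper.class,
        ImmutableBytesWritable.class, Put.class, job);
    // Null reducer: this only configures TableOutputFormat for the writes.
    TableMapReduceUtil.initTableReducerJob("scores", null, job);
    job.setNumReduceTasks(0);    // map-only: no partitioner, no shuffle

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}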

----------------------------------------
> From: [email protected]
> To: [email protected]
> Date: Fri, 25 Mar 2011 08:20:45 -0700
> Subject: RE: How could I recalculate every entry in HBase efficiently 
> through MapReduce?
>
> There is no reason to use a reducer in this scenario. I frequently do 
> map-only update jobs. Skipping the reduce step saves a lot of unnecessary 
> work.
>
> Dave
>
> -----Original Message-----
> From: Stanley Xu [mailto:[email protected]]
> Sent: Thursday, March 24, 2011 7:37 PM
> To: [email protected]
> Subject: How could I recalculate every entry in HBase efficiently through 
> MapReduce?
>
> Dear Buddies,
>
> I need to recalculate the entries in an HBase table every day, e.g. setting
> x = 0.9x daily, so that time has an impact on the entry values.
>
> So I wrote a TableMapper to fetch each entry and recalculate the result,
> used context.write(key, put) to emit the update operation, and then used an
> IdentityTableReducer to write it directly back to HBase. To get the job done
> in a short time, I used the HRegionPartitioner to increase the reducer count
> to 50.
>
> But I have two doubts here:
> 1. It looks like the partitioner does a lot of shuffling. I am wondering why
> it couldn't just do the Put on the local region, since the read and the
> write for the same entry should hit the same region, shouldn't they?
>
> 2. If the job fails for any reason (like a timeout), could HBase be left in
> a partially-updated state?
>
> Are there any suggestions for how I could avoid these two problems?
>
>
> Thanks.
>
> Best wishes,
> Stanley Xu