We ran across a use case this week. During inserts into the table, one field was populated from hand-crafted HTML and should contain only a small range of values (e.g. a primary color). We wanted to keep a log of all of the unique values found there, so the values were the map job output and were then sorted and counted in the reduce phase. It was a handy way for us to debug the HTML into a persistent file (we could have just used counters, but those disappear after a while unless you manually copy them).
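The pattern above (map emits the field value, reduce sorts and counts the uniques) can be sketched in plain Python; this is a simulation of the logic only, not Hadoop API code, and the `color` field name and sample rows are made up for illustration:

```python
from collections import defaultdict

def map_phase(rows):
    """Emit (value, 1) for the field of interest in each row.
    'color' is a hypothetical field name extracted from the HTML."""
    for row in rows:
        yield (row["color"], 1)

def reduce_phase(pairs):
    """Group by key and count occurrences, as the shuffle/sort/reduce
    would, yielding one count per unique value."""
    counts = defaultdict(int)
    for key, one in pairs:
        counts[key] += one
    return dict(sorted(counts.items()))

rows = [{"color": "red"}, {"color": "blue"},
        {"color": "red"}, {"color": "REd"}]
print(reduce_phase(map_phase(rows)))
```

Note how the malformed value `"REd"` shows up as its own key in the output, which is exactly the kind of HTML bug this log surfaces.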
-----Original Message-----
From: Michael Segel [mailto:[email protected]]
Sent: Friday, March 25, 2011 8:26 AM
To: [email protected]
Subject: RE: How could I re-calculate every entry in hbase efficiently through mapreduce?

Yeah... Uhm, I don't know of many use cases where you would want or need a reducer step when dealing with HBase. I'm sure one may exist, but from past practical experience... you shouldn't need one.

----------------------------------------
> From: [email protected]
> To: [email protected]
> Date: Fri, 25 Mar 2011 08:20:45 -0700
> Subject: RE: How could I re-calculate every entry in hbase efficiently
> through mapreduce?
>
> There is no reason to use a reducer in this scenario. I frequently do
> map-only update jobs. Skipping the reduce step saves a lot of unnecessary
> work.
>
> Dave
>
> -----Original Message-----
> From: Stanley Xu [mailto:[email protected]]
> Sent: Thursday, March 24, 2011 7:37 PM
> To: [email protected]
> Subject: How could I re-calculate every entry in hbase efficiently through
> mapreduce?
>
> Dear Buddies,
>
> I need to re-calculate the entries in an HBase table every day, e.g. let
> x = 0.9x daily, so that time has an impact on the entry values.
>
> So I wrote a TableMapper to get each entry, recalculate the result, and
> use Context.write(key, put) to put the update operation in the context, and
> then use an IdentityTableReducer to write it directly back to HBase. In
> order to finish the job in a short time, I use the HRegionPartitioner to
> increase the reducer count to 50.
>
> But I have two doubts here:
> 1. It looks like the partitioner does a lot of shuffling. I am wondering why
> it couldn't just do the put on the local region, since the read and write on
> the same entry should be on the same region, shouldn't they?
>
> 2. If the job fails for any reason (like a timeout), HBase might be left in
> a partially updated state, mightn't it?
>
> Is there any suggestion for how I could avoid these two problems?
>
> Thanks.
>
> Best wishes,
> Stanley Xu
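The map-only approach Dave and Michael describe can be sketched as a simulation in Python; in the real job each mapper would scan rows from its local region with a TableMapper and issue a Put back to the table from inside map(), with no reducer configured at all. The table contents and decay factor here are illustrative:

```python
DECAY = 0.9  # the daily x = 0.9x factor from Stanley's question

def map_only_update(table):
    """Simulate a map-only update pass: each 'mapper' reads a row and
    writes the decayed value straight back. Because nothing is emitted
    to a reducer, there is no shuffle or partitioning at all."""
    for key, value in table.items():
        # In HBase this would be a Put issued from inside map(),
        # landing on the same region the row was read from.
        table[key] = value * DECAY
    return table

scores = {"row1": 100.0, "row2": 50.0}
print(map_only_update(scores))
```

This also makes doubt #1 moot: with no reduce phase there is nothing to shuffle, and each write naturally stays on the region that served the read.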
