There is no reason to use a reducer in this scenario.  I frequently do map-only 
update jobs. Skipping the reduce step saves a lot of unnecessary work.
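[Editor's note] The map-only job Dave describes would set the reducer count to zero so each mapper's Puts go straight back to the table (in a real job, via TableMapReduceUtil and job.setNumReduceTasks(0)). That wiring needs a running cluster, so the sketch below only simulates the per-row transformation on an in-memory map; the class and method names are illustrative, not from this thread.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the per-row work a map-only decay job would do.
// In the real job this logic lives in a TableMapper whose Puts are
// written directly back to the table; job.setNumReduceTasks(0) skips
// the shuffle and reduce phases entirely.
public class DecayJobSketch {

    // The daily update Stanley wants: x -> 0.9 * x
    static double decay(double x) {
        return 0.9 * x;
    }

    // Stand-in for one mapper's pass over its region's rows.
    static void mapOnlyPass(Map<String, Double> table) {
        for (Map.Entry<String, Double> row : table.entrySet()) {
            // In HBase this would be: context.write(rowKey, new Put(...))
            row.setValue(decay(row.getValue()));
        }
    }

    public static void main(String[] args) {
        Map<String, Double> table = new HashMap<>();
        table.put("row1", 100.0);
        table.put("row2", 50.0);
        mapOnlyPass(table);
        System.out.println(table.get("row1")); // 90.0
        System.out.println(table.get("row2")); // 45.0
    }
}
```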

Dave

-----Original Message-----
From: Stanley Xu [mailto:[email protected]] 
Sent: Thursday, March 24, 2011 7:37 PM
To: [email protected]
Subject: How could I re-calculate every entry in HBase efficiently through 
mapreduce?

Dear Buddies,

I need to re-calculate the entries in an HBase table every day (e.g. set x =
0.9x daily), so that time has an impact on the entry values.

So I write a TableMapper to read each entry and recalculate its value, use
context.write(key, put) to emit the update, and then use an
IdentityTableReducer to write it directly back to HBase. To finish the job in
a short time, I use the HRegionPartitioner to increase the reducer count to 50.

But I have two doubts here:
1. It looks like the partitioner does a lot of shuffling. I am wondering why
it can't just do the put on the local region, since the read and the write of
the same entry should hit the same region, shouldn't they?

2. If the job fails for any reason (like a timeout), HBase might be left in a
partially-updated state, mightn't it?

Is there any suggestion on how I could avoid these two problems?
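[Editor's note] On doubt 2: one common way to tolerate a partially-applied run (a suggestion added here, not from this thread) is to make the update idempotent by stamping each row with the date of its last decay; a rerun after a failure then skips rows already decayed today instead of decaying them twice. A minimal in-memory sketch, with all names invented:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: store (value, lastDecayDate) per row so the daily x -> 0.9x
// update can be re-run safely after a partial failure. Rows already
// stamped with today's date are skipped, so no row is decayed twice.
// In HBase the date stamp would simply be another column in the row.
public class IdempotentDecaySketch {

    static final class Row {
        double value;
        String lastDecayDate; // e.g. "2011-03-24"
        Row(double value, String lastDecayDate) {
            this.value = value;
            this.lastDecayDate = lastDecayDate;
        }
    }

    static void decayOnce(Map<String, Row> table, String today) {
        for (Row row : table.values()) {
            if (today.equals(row.lastDecayDate)) {
                continue; // already decayed in an earlier (partial) run
            }
            row.value *= 0.9;
            row.lastDecayDate = today;
        }
    }

    public static void main(String[] args) {
        Map<String, Row> table = new HashMap<>();
        table.put("a", new Row(100.0, "2011-03-23"));
        // this row was already hit by a run that later failed:
        table.put("b", new Row(100.0, "2011-03-24"));
        decayOnce(table, "2011-03-24");
        decayOnce(table, "2011-03-24"); // rerun after failure: no double decay
        System.out.println(table.get("a").value); // 90.0
        System.out.println(table.get("b").value); // 100.0
    }
}
```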


Thanks.

Best wishes,
Stanley Xu
