Hi
The deletes go back to the same region you are scanning from, so I wonder
whether this creates hotspots when many rows have to be deleted from a
single region. Can you check that?
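One rough way to check this is to bucket the row keys that get deleted by region start key and look at how evenly they spread. The sketch below is plain Java with no HBase dependency; the class name, the region start keys, and the row keys are all made up for illustration, and in practice the boundaries would come from the table's real regions. It compares keys as strings, which matches HBase's byte-wise ordering only for plain ASCII keys.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RegionDeleteHistogram {

    // Counts how many of the given row keys fall into each region.
    // A region is identified by its start key; the first region uses "".
    static Map<String, Integer> countPerRegion(List<String> regionStartKeys,
                                               List<String> rowsToDelete) {
        TreeMap<String, Integer> counts = new TreeMap<>();
        for (String start : regionStartKeys) {
            counts.put(start, 0);
        }
        for (String row : rowsToDelete) {
            // The owning region is the one with the greatest start key <= row.
            String region = counts.floorKey(row);
            if (region != null) {
                counts.merge(region, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical region boundaries and deleted row keys.
        List<String> starts = Arrays.asList("", "g", "p");
        List<String> deletes = Arrays.asList("alice", "bob", "carol", "henry", "zoe");
        System.out.println(countPerRegion(starts, deletes));
    }
}
```

If one region ends up with most of the counts, the deletes are likely concentrated on a single region server.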
Daniel
On 02/21/2012 10:52 PM, Haijia Zhou wrote:
Hi, All
I'm new to this mailing list and hope I can get some help here.
My task is to come up with an M/R job in HBase that scans the whole table,
finds certain data and deletes it (deleting the whole row). This job will be
executed on a daily basis.
Basically, I have a mapper class whose map() looks as follows:
public void map(ImmutableBytesWritable row, Result columns,
                Context context) throws IOException
{
    // ... do some check
    byte[] rowKey = ...
    if (needs to delete user) {
        // delete the whole row
        Delete delete = new Delete(rowKey);
        table.delete(delete);
    }
}
There's no reducer needed for this task.
Now, we are observing that this job takes a long time to finish (around 3-4
hours) for 49,565,000 delete operations and 191,838,114 total records
across 7 region servers.
We know that a full table scan on the corresponding column/column family
takes around 40 minutes, so all the remaining time is spent on the delete
operations.
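As a back-of-the-envelope check of those numbers, the sketch below turns them into a delete rate. It assumes the scan and the deletes do not overlap in time and that the deletes are spread evenly over the 7 region servers; the class and method names are only illustrative.

```java
public class DeleteRateEstimate {

    // Deletes per second once the scan time is subtracted from the total run time.
    static double deletesPerSecond(long deletes, double totalMinutes, double scanMinutes) {
        double deleteSeconds = (totalMinutes - scanMinutes) * 60.0;
        return deletes / deleteSeconds;
    }

    public static void main(String[] args) {
        long deletes = 49_565_000L;   // delete operations reported for the job
        double scanMinutes = 40.0;    // full table scan time
        int regionServers = 7;

        // Total run time is given as 3-4 hours, so evaluate both ends of the range.
        for (double totalMinutes : new double[] {180.0, 240.0}) {
            double rate = deletesPerSecond(deletes, totalMinutes, scanMinutes);
            System.out.printf("%.0f min total: %.0f deletes/s, %.0f per region server%n",
                    totalMinutes, rate, rate / regionServers);
        }
    }
}
```

Under those assumptions the job sustains roughly 4,100 to 5,900 deletes per second overall, or about 590 to 840 per region server.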
I wonder if there is any way or tool to profile a Hadoop M/R job?
Thanks
Haijia
--
Daniel Iancu
Java Developer,Big Data Solutions Romania
1&1 Internet Development srl.
18 Mircea Eliade St
Sect 1, Bucharest
RO Bucharest, 012015
www.1and1.ro
Phone:+40-031-223-9081