Hi all,
I'm new to this mailing list and hope I can get some help here.
My task is to write an M/R job against HBase that scans the whole table,
finds certain rows, and deletes them (the whole row). This job will be
executed on a daily basis.
Basically I have a mapper class whose map() looks like this:

public void map(ImmutableBytesWritable row, Result columns, Context context)
        throws IOException {
    // ... do some checks
    if (/* user needs to be deleted */) {
        Delete delete = new Delete(row.get());
        table.delete(delete);
    }
}
No reducer is needed for this task.
Now, we are observing that this job takes a long time to finish (around 3-4
hours) for 49,565,000 delete operations out of 191,838,114 total records
across 7 region servers.
We know that a full table scan on the corresponding column/column family
takes around 40 minutes, so the rest of the time is spent on the delete
operations.
I wonder if there's any way or tool to profile a Hadoop M/R job?
Thanks,
Haijia