Hi there, you probably want to see this...
http://hbase.apache.org/book.html#perf.deleting .. that particular method doesn't use the write buffer and submits the deletes one-by-one to the RegionServers.

On 2/21/12 3:52 PM, "Haijia Zhou" <[email protected]> wrote:

>Hi, all,
>I'm new to this mailing list and hope I can get help here.
>My task is to come up with an M/R job in HBase that scans the whole
>table, finds certain data, and deletes it (the whole row); this job
>will be executed on a daily basis.
>Basically I have a mapper class whose map() looks like this:
>
>public void map(ImmutableBytesWritable row, Result columns,
>                Context context)
>{
>    // ... do some check
>    byte[] rowKey = ...;
>    if (/* needs to delete user */) {
>        Delete delete = new Delete(rowKey);
>        table.delete(delete);
>    }
>}
>
>There's no reducer needed for this task.
>
>Now, we are observing that this job takes a long time to finish
>(around 3-4 hours) for 49,565,000 delete operations out of
>191,838,114 total records across 7 region servers.
>We know that a full table scan on the corresponding column/column
>family takes around 40 minutes, so all of the remaining time went to
>the delete operations.
>
>I wonder if there's any way or tool to profile the Hadoop M/R job?
>
>Thanks
>
>Haijia
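For reference, here's a minimal sketch of the batched approach that section describes: collect the Deletes in a list and hand them to HTable.delete(List<Delete>), so the client can group the RPCs per RegionServer instead of making one round trip per row. The table name, BATCH_SIZE, and shouldDelete() below are placeholders standing in for the details of the original job:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;

public class BatchDeleteMapper extends TableMapper<NullWritable, NullWritable> {

    private static final int BATCH_SIZE = 1000; // placeholder; tune for your cluster
    private HTable table;
    private final List<Delete> pending = new ArrayList<Delete>();

    @Override
    protected void setup(Context context) throws IOException {
        // "mytable" is a placeholder for the real table name
        table = new HTable(context.getConfiguration(), "mytable");
    }

    @Override
    public void map(ImmutableBytesWritable row, Result columns, Context context)
            throws IOException {
        if (shouldDelete(columns)) { // stand-in for the original per-row check
            pending.add(new Delete(row.get()));
            if (pending.size() >= BATCH_SIZE) {
                // One batched call; delete(List) removes the applied entries
                // from the list, so clear() just makes the reset explicit.
                table.delete(pending);
                pending.clear();
            }
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException {
        if (!pending.isEmpty()) {
            table.delete(pending); // flush whatever is left over
        }
        table.close();
    }

    private boolean shouldDelete(Result columns) {
        return true; // placeholder for the real per-row check
    }
}

With roughly 50M deletes, batching like this cuts the RPC count by a factor of BATCH_SIZE, which is usually where most of those 3-4 hours go.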
