Hi,

The simple way to do this as a map/reduce job is the following:

Use the HTable input format and scan the records you want to delete. 
Inside Mapper.setup(), create a connection to the HTable you want to 
delete the records from. 
Inside Mapper.map(), each call hands you a row that matched the 
scan you set up in your ToolRunner.  If the record matches the criteria 
you want to delete on, you just issue a delete command, passing in that row 
key. 

And voila! You are done. 

No muss, no fuss, and no reducer. 

It's that easy. 

There is no output to return to your client job, except perhaps a count 
of the records you deleted, and that's an easy thing to do 
using dynamic counters. 
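A minimal sketch of that mapper, assuming the 0.94-era HBase client API; the class name and the "delete.table" job property are made up for illustration, and this is of course untestable without a live cluster:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;

public class DeleteMapper extends TableMapper<NullWritable, NullWritable> {

    private HTable table;

    @Override
    protected void setup(Context context) throws IOException {
        // Open a connection to the table we are deleting from.
        // "delete.table" is a hypothetical job property holding the table name.
        Configuration conf = context.getConfiguration();
        table = new HTable(conf, conf.get("delete.table"));
    }

    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context context)
            throws IOException, InterruptedException {
        // Every row handed to map() already matched the scan's filter,
        // so just issue a delete for its row key.
        table.delete(new Delete(row.get()));
        // Keep count of what we removed via a dynamic counter.
        context.getCounter("delete", "rows").increment(1);
    }

    @Override
    protected void cleanup(Context context) throws IOException {
        table.close(); // flushes any buffered deletes
    }
}
```

Wire it up with TableMapReduceUtil.initTableMapperJob(tableName, scan, DeleteMapper.class, null, null, job) and job.setNumReduceTasks(0); no reducer is needed, since the deletes are issued directly from the mapper.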

HTH
-Mike

On Jun 20, 2012, at 3:38 AM, Anoop Sam John wrote:

> Hi
>      Has anyone tried an Endpoint (coprocessor) implementation with which 
> the delete can be done directly, with the scan, on the server side?
> In the samples below I can see
> Client -> Server : scan for certain rows (we want the row keys satisfying 
> our criteria)
> Client <- Server : returns the Results
> Client -> Server : delete calls 
> 
> Using Endpoints instead, we could make one call from client to server in 
> which both the scan and the delete happen...
> 
> -Anoop-
> ________________________________________
> From: Oleg Ruchovets [[email protected]]
> Sent: Tuesday, June 19, 2012 9:47 PM
> To: [email protected]
> Subject: Re: delete rows from hbase
> 
> Thank you all for the answers. I am trying to speed up my solution and
> use map/reduce over HBase.
> 
> Here is the code:
> I want to emit a Delete from the map function to delete the row, and I
> pass the same tableName to TableMapReduceUtil.initTableMapperJob
> and TableMapReduceUtil.initTableReducerJob.
> 
> Question: is it possible to pass Delete as I did in map function?
> 
> 
> 
> 
> import java.io.IOException;
> 
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.client.Delete;
> import org.apache.hadoop.hbase.client.Result;
> import org.apache.hadoop.hbase.client.Scan;
> import org.apache.hadoop.hbase.filter.Filter;
> import org.apache.hadoop.hbase.filter.PrefixFilter;
> import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
> import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
> import org.apache.hadoop.hbase.mapreduce.TableMapper;
> import org.apache.hadoop.hbase.util.Bytes;
> import org.apache.hadoop.mapreduce.Job;
> import org.slf4j.Logger;
> import org.slf4j.LoggerFactory;
> 
> public class DeleteRowByCriteria {
> 
>     final static Logger LOG = LoggerFactory.getLogger(DeleteRowByCriteria.class);
> 
>     public static class MyMapper extends TableMapper<ImmutableBytesWritable, Delete> {
> 
>         @Override
>         public void map(ImmutableBytesWritable row, Result value, Context context)
>                 throws IOException, InterruptedException {
>             // Emit a Delete for every row the scan matched.
>             context.write(row, new Delete(row.get()));
>         }
>     }
> 
>     public static void main(String[] args) throws ClassNotFoundException,
>             IOException, InterruptedException {
> 
>         String tableName = args[0];
>         String filterCriteria = args[1];
> 
>         Configuration config = HBaseConfiguration.create();
>         Job job = new Job(config, "DeleteRowByCriteria");
>         job.setJarByClass(DeleteRowByCriteria.class);
> 
>         try {
>             Filter campaignIdFilter = new PrefixFilter(Bytes.toBytes(filterCriteria));
>             Scan scan = new Scan();
>             scan.setFilter(campaignIdFilter);
>             scan.setCaching(500);
>             scan.setCacheBlocks(false);
> 
>             TableMapReduceUtil.initTableMapperJob(
>                     tableName,
>                     scan,
>                     MyMapper.class,
>                     null,
>                     null,
>                     job);
> 
>             // Sets up TableOutputFormat so the emitted Deletes are applied.
>             TableMapReduceUtil.initTableReducerJob(
>                     tableName,
>                     null,
>                     job);
>             job.setNumReduceTasks(0);
> 
>             boolean b = job.waitForCompletion(true);
>             if (!b) {
>                 throw new IOException("error with job!");
>             }
>         } catch (Exception e) {
>             LOG.error(e.getMessage(), e);
>         }
>     }
> }
> 
> 
> 
> On Tue, Jun 19, 2012 at 9:26 AM, Kevin O'dell <[email protected]> wrote:
> 
>> Oleg,
>> 
>> Here is some code that we used for deleting all rows with user name
>> foo.  It should be fairly portable to your situation:
>> 
>> import java.io.IOException;
>> 
>> import org.apache.hadoop.conf.Configuration;
>> import org.apache.hadoop.hbase.HBaseConfiguration;
>> import org.apache.hadoop.hbase.client.Delete;
>> import org.apache.hadoop.hbase.client.HTable;
>> import org.apache.hadoop.hbase.client.Result;
>> import org.apache.hadoop.hbase.client.ResultScanner;
>> import org.apache.hadoop.hbase.client.Scan;
>> import org.apache.hadoop.hbase.util.Bytes;
>> 
>> public class HBaseDelete {
>>     public static void main(String[] args) throws IOException {
>>         Configuration conf = HBaseConfiguration.create();
>>         HTable t = new HTable(conf, "t");
>> 
>>         String user = "foo";
>> 
>>         byte[] startRow = Bytes.toBytes(user);
>>         byte[] stopRow = Bytes.toBytes(user);
>>         stopRow[stopRow.length - 1]++; // 'fop'
>>         Scan scan = new Scan(startRow, stopRow);
>>         ResultScanner sc = t.getScanner(scan);
>>         for (Result r : sc) {
>>             t.delete(new Delete(r.getRow()));
>>         }
>>         sc.close();
>>         t.close();
>>     }
>> }
>> /**
>> * Start row: foo
>> * HBase begins matching from this byte, one after another.
>> * End row: foo
>> * HBase would stop matching at the first match, because start == stop.
>> * End row: fo[p] (p being o + 1)
>> * HBase stops matching at the first thing that is not "foo".
>> */
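The stop-row arithmetic in that comment can be checked with plain Java, no HBase on the classpath (the class and method names below are just for illustration):

```java
public class PrefixStopRow {
    // For a prefix scan, the exclusive stop row is the prefix with its
    // last byte incremented: every key beginning with "foo" sorts below
    // "fop". (Caveat: this overflows if the last byte is already 0xFF.)
    static byte[] stopRowForPrefix(byte[] prefix) {
        byte[] stop = prefix.clone();
        stop[stop.length - 1]++;
        return stop;
    }

    public static void main(String[] args) {
        byte[] stop = stopRowForPrefix("foo".getBytes());
        System.out.println(new String(stop)); // prints "fop"
    }
}
```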
>> 
>> 
>> On Tue, Jun 19, 2012 at 6:46 AM, Mohammad Tariq <[email protected]>
>> wrote:
>>> You can use HBase's RowFilter to do that.
>>> 
>>> Regards,
>>>    Mohammad Tariq
>>> 
>>> 
>>> On Tue, Jun 19, 2012 at 1:13 PM, shashwat shriparv
>>> <[email protected]> wrote:
>>>> Try to implement something like this:
>>>> 
>>>> Class RegexStringComparator
>>>> 
>>>> 
>>>> 
>>>> On Tue, Jun 19, 2012 at 5:06 AM, Amitanand Aiyer <[email protected]> wrote:
>>>> 
>>>>> You could set up a scan with the criteria you want (start row, end row,
>>>>> KeyOnlyFilter, etc.), and do a delete for
>>>>> the rows you get.
>>>>> 
>>>>> On 6/18/12 3:08 PM, "Oleg Ruchovets" <[email protected]> wrote:
>>>>> 
>>>>>> Hi,
>>>>>> I need to delete rows from an HBase table by criteria.
>>>>>> For example, I need to delete all rows starting with "12345".
>>>>>> I didn't find a way to set a row prefix for the delete operation.
>>>>>> What is the best way (practice) to delete rows by criteria from an
>>>>>> HBase table?
>>>>>> 
>>>>>> Thanks in advance.
>>>>>> Oleg.
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> --
>>>> 
>>>> 
>>>> ∞
>>>> Shashwat Shriparv
>> 
>> 
>> 
>> --
>> Kevin O'Dell
>> Customer Operations Engineer, Cloudera
