Well, I changed my previous solution a bit. It works, but it is very slow!
I think this is because I emit a SINGLE Delete object per row rather than a LIST of Deletes.
Is it possible to pass a List of Deletes through map() instead of a single Delete?
import org.apache.commons.cli.*;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.PrefixFilter;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.IOException;

public class DeleteRowByCriteria {

    final static Logger LOG = LoggerFactory.getLogger(DeleteRowByCriteria.class);

    public static class MyMapper extends TableMapper<ImmutableBytesWritable, Delete> {
        @Override
        public void map(ImmutableBytesWritable row, Result value, Context context)
                throws IOException, InterruptedException {
            context.getCounter("amobee", "DeleteRowByCriteria.RowCounter").increment(1);
            context.write(row, new Delete(row.get()));
        }
    }

    public static void main(String[] args) throws ClassNotFoundException,
            IOException, InterruptedException {
        Configuration config = HBaseConfiguration.create();
        config.setBoolean("mapred.map.tasks.speculative.execution", false);
        Job job = new Job(config, "DeleteRowByCriteria");
        job.setJarByClass(DeleteRowByCriteria.class);
        Options options = getOptions();
        try {
            AggregationContext aggregationContext = getAggregationContext(args, options);
            Filter campaignIdFilter = new PrefixFilter(Bytes.toBytes(aggregationContext.getCampaignId()));
            Scan scan = new Scan();
            scan.setFilter(campaignIdFilter);
            scan.setCaching(20000);
            scan.setCacheBlocks(false);
            TableMapReduceUtil.initTableMapperJob(
                    aggregationContext.getCmltTableName(),
                    scan,
                    MyMapper.class,
                    null,
                    null,
                    job);
            job.setOutputFormatClass(TableOutputFormat.class);
            job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE,
                    aggregationContext.getCmltTableName());
            job.setNumReduceTasks(0);
            boolean b = job.waitForCompletion(true);
            if (!b) {
                throw new IOException("error with job!");
            }
        } catch (Exception e) {
            LOG.error(e.getMessage(), e);
        }
    }
}
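TableOutputFormat writes one mutation per context.write() call, so there is no way to emit a List of Deletes directly. A common workaround is to buffer Delete objects in the mapper and flush them in batches through an HTable connection opened in setup(). The sketch below assumes the HBase 0.92-era client API used elsewhere in this thread; the table name "myTable" and the batch size are placeholders to adapt:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;

// Mapper that batches deletes instead of writing one Delete per row
// through TableOutputFormat.
public class BatchDeleteMapper
        extends TableMapper<ImmutableBytesWritable, ImmutableBytesWritable> {

    private static final int BATCH_SIZE = 10000;      // tune for your cluster
    private HTable table;
    private final List<Delete> buffer = new ArrayList<Delete>();

    @Override
    protected void setup(Context context) throws IOException {
        // Open a connection to the table we are deleting from.
        table = new HTable(context.getConfiguration(), "myTable");
    }

    @Override
    public void map(ImmutableBytesWritable row, Result value, Context context)
            throws IOException, InterruptedException {
        buffer.add(new Delete(row.get()));
        if (buffer.size() >= BATCH_SIZE) {
            table.delete(buffer);   // one batched RPC instead of one per row
            buffer.clear();
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException {
        if (!buffer.isEmpty()) {
            table.delete(buffer);   // flush whatever is left
        }
        table.close();
    }
}
```

With this mapper the job no longer needs TableOutputFormat at all; you can use NullOutputFormat and keep job.setNumReduceTasks(0).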
On Wed, Jun 20, 2012 at 7:41 AM, Michael Segel <[email protected]> wrote:
> Hi,
>
> The simple way to do this as a map/reduce job is the following:
>
> Use the HTable input format and scan the records you want to delete.
> Inside Mapper.setup(), create a connection to the HTable where you want
> to delete the records.
> Inside Mapper.map(), each iteration gives you a row that matched the
> scan you set up in your ToolRunner. If the record matches the criteria
> you want to delete, just issue a delete command passing in that row key.
>
> And voila! You are done.
>
> No muss, no fuss, and no reducer.
>
> It's that easy.
>
> There is no output to return to your client job, except perhaps a count
> of the records you deleted, which is an easy thing to do using dynamic
> counters.
>
> HTH
> -Mike
>
> On Jun 20, 2012, at 3:38 AM, Anoop Sam John wrote:
>
> > Hi
> > Has anyone tried an Endpoint implementation with which the delete can
> > be done directly, with the scan running on the server side?
> > In the samples below I can see:
> > Client -> Server - Scan for certain rows (we want the rowkeys
> > satisfying our criteria)
> > Client <- Server - returns the Results
> > Client -> Server - Delete calls
> >
> > Instead, using Endpoints we could make one call from client to server
> > in which both the scan and the delete happen.
> >
> > -Anoop-
> > ________________________________________
> > From: Oleg Ruchovets [[email protected]]
> > Sent: Tuesday, June 19, 2012 9:47 PM
> > To: [email protected]
> > Subject: Re: delete rows from hbase
> >
> > Thank you all for the answers. I am trying to speed up my solution
> > using map/reduce over HBase.
> >
> > Here is the code:
> > I want to use Delete (the map function deletes the row), and I pass the
> > same tableName to TableMapReduceUtil.initTableMapperJob
> > and TableMapReduceUtil.initTableReducerJob.
> >
> > Question: is it possible to emit a Delete from the map function as I did?
> >
> >
> >
> >
> > public class DeleteRowByCriteria {
> >
> >     final static Logger LOG = LoggerFactory.getLogger(DeleteRowByCriteria.class);
> >
> >     public static class MyMapper extends TableMapper<ImmutableBytesWritable, Delete> {
> >
> >         public String account;
> >         public String lifeDate;
> >
> >         @Override
> >         public void map(ImmutableBytesWritable row, Result value, Context context)
> >                 throws IOException, InterruptedException {
> >             context.write(row, new Delete(row.get()));
> >         }
> >     }
> >
> >     public static void main(String[] args) throws ClassNotFoundException,
> >             IOException, InterruptedException {
> >
> >         String tableName = args[0];
> >         String filterCriteria = args[1];
> >
> >         Configuration config = HBaseConfiguration.create();
> >         Job job = new Job(config, "DeleteRowByCriteria");
> >         job.setJarByClass(DeleteRowByCriteria.class);
> >
> >         try {
> >             Filter campaignIdFilter = new PrefixFilter(Bytes.toBytes(filterCriteria));
> >             Scan scan = new Scan();
> >             scan.setFilter(campaignIdFilter);
> >             scan.setCaching(500);
> >             scan.setCacheBlocks(false);
> >
> >             TableMapReduceUtil.initTableMapperJob(
> >                     tableName,
> >                     scan,
> >                     MyMapper.class,
> >                     null,
> >                     null,
> >                     job);
> >
> >             TableMapReduceUtil.initTableReducerJob(
> >                     tableName,
> >                     null,
> >                     job);
> >             job.setNumReduceTasks(0);
> >
> >             boolean b = job.waitForCompletion(true);
> >             if (!b) {
> >                 throw new IOException("error with job!");
> >             }
> >         } catch (Exception e) {
> >             LOG.error(e.getMessage(), e);
> >         }
> >     }
> > }
> >
> >
> >
> > On Tue, Jun 19, 2012 at 9:26 AM, Kevin O'Dell <[email protected]> wrote:
> >
> >> Oleg,
> >>
> >> Here is some code that we used for deleting all rows with user name
> >> foo. It should be fairly portable to your situation:
> >>
> >> import java.io.IOException;
> >>
> >> import org.apache.hadoop.conf.Configuration;
> >> import org.apache.hadoop.hbase.HBaseConfiguration;
> >> import org.apache.hadoop.hbase.client.Delete;
> >> import org.apache.hadoop.hbase.client.HTable;
> >> import org.apache.hadoop.hbase.client.Result;
> >> import org.apache.hadoop.hbase.client.ResultScanner;
> >> import org.apache.hadoop.hbase.client.Scan;
> >> import org.apache.hadoop.hbase.util.Bytes;
> >>
> >> public class HBaseDelete {
> >>   public static void main(String[] args) throws IOException {
> >>     Configuration conf = HBaseConfiguration.create();
> >>     HTable t = new HTable(conf, "t");
> >>
> >>     String user = "foo";
> >>
> >>     byte[] startRow = Bytes.toBytes(user);
> >>     byte[] stopRow = Bytes.toBytes(user);
> >>     stopRow[stopRow.length - 1]++; // 'fop'
> >>     Scan scan = new Scan(startRow, stopRow);
> >>     ResultScanner sc = t.getScanner(scan);
> >>     for (Result r : sc) {
> >>       t.delete(new Delete(r.getRow()));
> >>     }
> >>     t.close();
> >>   }
> >> }
> >> /**
> >>  * Start row: foo
> >>  * HBase matches rows byte by byte starting from this key.
> >>  * End row: foo
> >>  * HBase would stop at the first match, because start == stop.
> >>  * End row: fo[p] (p being o + 1)
> >>  * HBase stops matching at the first row that is not prefixed "foo".
> >>  */
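The stop-row trick above can be checked without a cluster. Here is a self-contained, hypothetical snippet (no HBase dependency) showing that incrementing the last byte of the prefix "foo" yields "fop", the first key that no longer shares the prefix:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class StopRowDemo {
    // Compute the exclusive stop row for a prefix scan by incrementing
    // the last byte of the prefix, as in the code above.
    static byte[] stopRowForPrefix(byte[] prefix) {
        byte[] stop = Arrays.copyOf(prefix, prefix.length);
        stop[stop.length - 1]++;
        return stop;
    }

    public static void main(String[] args) {
        byte[] start = "foo".getBytes(StandardCharsets.UTF_8);
        byte[] stop = stopRowForPrefix(start);
        System.out.println(new String(stop, StandardCharsets.UTF_8)); // prints "fop"
    }
}
```

Note that this simple increment overflows if the last byte is 0xFF; a production version would need to carry the increment into earlier bytes.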
> >>
> >>
> >> On Tue, Jun 19, 2012 at 6:46 AM, Mohammad Tariq <[email protected]>
> >> wrote:
> >>> You can use HBase's RowFilter to do that.
> >>>
> >>> Regards,
> >>> Mohammad Tariq
> >>>
> >>>
> >>> On Tue, Jun 19, 2012 at 1:13 PM, shashwat shriparv
> >>> <[email protected]> wrote:
> >>>> Try to implement something using the RegexStringComparator class.
> >>>>
> >>>>
> >>>>
> >>>> On Tue, Jun 19, 2012 at 5:06 AM, Amitanand Aiyer <[email protected]>
> >> wrote:
> >>>>
> >>>>> You could set up a scan with the criteria you want (start row, end
> >>>>> row, KeyOnlyFilter, etc.) and issue a delete for the rows you get.
> >>>>>
> >>>>> On 6/18/12 3:08 PM, "Oleg Ruchovets" <[email protected]> wrote:
> >>>>>
> >>>>>> Hi,
> >>>>>> I need to delete rows from an HBase table by criteria.
> >>>>>> For example, I need to delete all rows starting with "12345".
> >>>>>> I didn't find a way to set a row prefix for the delete operation.
> >>>>>> What is the best way (practice) to delete rows by criteria from an
> >>>>>> HBase table?
> >>>>>>
> >>>>>> Thanks in advance.
> >>>>>> Oleg.
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>>
> >>>>
> >>>> ∞
> >>>> Shashwat Shriparv
> >>
> >>
> >>
> >> --
> >> Kevin O'Dell
> >> Customer Operations Engineer, Cloudera
>
>