Well, I changed my previous solution a bit. It works, but it is very slow!
I think this is because I emit a SINGLE Delete object per row rather than a LIST of Deletes.
Is it possible to pass a List of Deletes through map() instead of a single Delete?
import org.apache.commons.cli.*;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.PrefixFilter;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.IOException;

public class DeleteRowByCriteria {

    final static Logger LOG = LoggerFactory.getLogger(DeleteRowByCriteria.class);

    public static class MyMapper extends TableMapper<ImmutableBytesWritable, Delete> {
        @Override
        public void map(ImmutableBytesWritable row, Result value, Context context)
                throws IOException, InterruptedException {
            context.getCounter("amobee", "DeleteRowByCriteria.RowCounter").increment(1);
            context.write(row, new Delete(row.get()));
        }
    }

    public static void main(String[] args) throws ClassNotFoundException,
            IOException, InterruptedException {
        Configuration config = HBaseConfiguration.create();
        config.setBoolean("mapred.map.tasks.speculative.execution", false);
        Job job = new Job(config, "DeleteRowByCriteria");
        job.setJarByClass(DeleteRowByCriteria.class);
        Options options = getOptions();
        try {
            AggregationContext aggregationContext = getAggregationContext(args, options);
            Filter campaignIdFilter = new PrefixFilter(Bytes.toBytes(aggregationContext.getCampaignId()));
            Scan scan = new Scan();
            scan.setFilter(campaignIdFilter);
            scan.setCaching(20000);
            scan.setCacheBlocks(false);
            TableMapReduceUtil.initTableMapperJob(
                    aggregationContext.getCmltTableName(),
                    scan,
                    MyMapper.class,
                    null,
                    null,
                    job);
            job.setOutputFormatClass(TableOutputFormat.class);
            job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE,
                    aggregationContext.getCmltTableName());
            job.setNumReduceTasks(0);
            boolean b = job.waitForCompletion(true);
            if (!b) {
                throw new IOException("error with job!");
            }
        } catch (Exception e) {
            LOG.error(e.getMessage(), e);
        }
    }
}
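TableOutputFormat writes one mutation per context.write() call, so there is no way to emit a List of Deletes directly. A common workaround is to buffer Delete objects in the mapper and flush them in batches through an HTable connection opened in setup(). The sketch below assumes the HBase 0.92-era client API used elsewhere in this thread; the table name "myTable" and the batch size are placeholders to adapt:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;

// Mapper that batches deletes instead of writing one Delete per row
// through TableOutputFormat.
public class BatchDeleteMapper
        extends TableMapper<ImmutableBytesWritable, ImmutableBytesWritable> {

    private static final int BATCH_SIZE = 10000;      // tune for your cluster
    private HTable table;
    private final List<Delete> buffer = new ArrayList<Delete>();

    @Override
    protected void setup(Context context) throws IOException {
        // Open a connection to the table we are deleting from.
        table = new HTable(context.getConfiguration(), "myTable");
    }

    @Override
    public void map(ImmutableBytesWritable row, Result value, Context context)
            throws IOException, InterruptedException {
        buffer.add(new Delete(row.get()));
        if (buffer.size() >= BATCH_SIZE) {
            table.delete(buffer);   // one batched RPC instead of one per row
            buffer.clear();
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException {
        if (!buffer.isEmpty()) {
            table.delete(buffer);   // flush whatever is left
        }
        table.close();
    }
}
```

With this mapper the job no longer needs TableOutputFormat at all; you can use NullOutputFormat and keep job.setNumReduceTasks(0).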
On Wed, Jun 20, 2012 at 7:41 AM, Michael Segel <[email protected]> wrote:
> Hi,
>
> The simple way to do this as a map/reduce job is the following:
>
> Use the HTable input format and scan the records you want to delete.
> Inside Mapper.setup(), create a connection to the HTable where you want
> to delete the records.
> Inside Mapper.map(), each iteration gives you a row that matched the
> scan you set up in your ToolRunner. If the record matches the criteria
> you want to delete, just issue a delete command passing in that row key.
>
> And voila! You are done.
>
> No muss, no fuss, and no reducer.
>
> It's that easy.
>
> There is no output to return to your client job, except perhaps a count
> of the records you deleted, which is an easy thing to do using dynamic
> counters.
>
> HTH
> -Mike
>
> On Jun 20, 2012, at 3:38 AM, Anoop Sam John wrote:
>
> > Hi
> > Has anyone tried an Endpoint implementation with which the delete can
> > be done directly, with the scan running on the server side?
> > In the samples below I can see:
> > Client -> Server - Scan for certain rows (we want the rowkeys
> > satisfying our criteria)
> > Client <- Server - returns the Results
> > Client -> Server - Delete calls
> >
> > Instead, using Endpoints we could make one call from client to server
> > in which both the scan and the delete happen.
> >
> > -Anoop-
> > ________________________________________
> > From: Oleg Ruchovets [[email protected]]
> > Sent: Tuesday, June 19, 2012 9:47 PM
> > To: [email protected]
> > Subject: Re: delete rows from hbase
> >
> > Thank you all for the answers. I am trying to speed up my solution
> > using map/reduce over HBase.
> >
> > Here is the code:
> > I want to use Delete (the map function deletes the row), and I pass the
> > same tableName to TableMapReduceUtil.initTableMapperJob
> > and TableMapReduceUtil.initTableReducerJob.
> >
> > Question: is it possible to emit a Delete from the map function as I did?
> >
> >
> >
> >
> > public class DeleteRowByCriteria {
> >
> >     final static Logger LOG = LoggerFactory.getLogger(DeleteRowByCriteria.class);
> >
> >     public static class MyMapper extends TableMapper<ImmutableBytesWritable, Delete> {
> >
> >         public String account;
> >         public String lifeDate;
> >
> >         @Override
> >         public void map(ImmutableBytesWritable row, Result value, Context context)
> >                 throws IOException, InterruptedException {
> >             context.write(row, new Delete(row.get()));
> >         }
> >     }
> >
> >     public static void main(String[] args) throws ClassNotFoundException,
> >             IOException, InterruptedException {
> >
> >         String tableName = args[0];
> >         String filterCriteria = args[1];
> >
> >         Configuration config = HBaseConfiguration.create();
> >         Job job = new Job(config, "DeleteRowByCriteria");
> >         job.setJarByClass(DeleteRowByCriteria.class);
> >
> >         try {
> >             Filter campaignIdFilter = new PrefixFilter(Bytes.toBytes(filterCriteria));
> >             Scan scan = new Scan();
> >             scan.setFilter(campaignIdFilter);
> >             scan.setCaching(500);
> >             scan.setCacheBlocks(false);
> >
> >             TableMapReduceUtil.initTableMapperJob(
> >                     tableName,
> >                     scan,
> >                     MyMapper.class,
> >                     null,
> >                     null,
> >                     job);
> >
> >             TableMapReduceUtil.initTableReducerJob(
> >                     tableName,
> >                     null,
> >                     job);
> >             job.setNumReduceTasks(0);
> >
> >             boolean b = job.waitForCompletion(true);
> >             if (!b) {
> >                 throw new IOException("error with job!");
> >             }
> >         } catch (Exception e) {
> >             LOG.error(e.getMessage(), e);
> >         }
> >     }
> > }
> >
> >
> >
> > On Tue, Jun 19, 2012 at 9:26 AM, Kevin O'Dell <[email protected]> wrote:
> >
> >> Oleg,
> >>
> >> Here is some code that we used for deleting all rows with user name
> >> foo. It should be fairly portable to your situation:
> >>
> >> import java.io.IOException;
> >>
> >> import org.apache.hadoop.conf.Configuration;
> >> import org.apache.hadoop.hbase.HBaseConfiguration;
> >> import org.apache.hadoop.hbase.client.Delete;
> >> import org.apache.hadoop.hbase.client.HTable;
> >> import org.apache.hadoop.hbase.client.Result;
> >> import org.apache.hadoop.hbase.client.ResultScanner;
> >> import org.apache.hadoop.hbase.client.Scan;
> >> import org.apache.hadoop.hbase.util.Bytes;
> >>
> >> public class HBaseDelete {
> >>   public static void main(String[] args) throws IOException {
> >>     Configuration conf = HBaseConfiguration.create();
> >>     HTable t = new HTable(conf, "t");
> >>
> >>     String user = "foo";
> >>
> >>     byte[] startRow = Bytes.toBytes(user);
> >>     byte[] stopRow = Bytes.toBytes(user);
> >>     stopRow[stopRow.length - 1]++; // 'fop'
> >>     Scan scan = new Scan(startRow, stopRow);
> >>     ResultScanner sc = t.getScanner(scan);
> >>     for (Result r : sc) {
> >>       t.delete(new Delete(r.getRow()));
> >>     }
> >>     t.close();
> >>   }
> >> }
> >> /**
> >>  * Start row: foo
> >>  * HBase matches rows byte by byte starting from this key.
> >>  * End row: foo
> >>  * HBase would stop at the first match, because start == stop.
> >>  * End row: fo[p] (p being o + 1)
> >>  * HBase stops matching at the first row that is not prefixed "foo".
> >>  */
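The stop-row trick above can be checked without a cluster. Here is a self-contained, hypothetical snippet (no HBase dependency) showing that incrementing the last byte of the prefix "foo" yields "fop", the first key that no longer shares the prefix:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class StopRowDemo {
    // Compute the exclusive stop row for a prefix scan by incrementing
    // the last byte of the prefix, as in the code above.
    static byte[] stopRowForPrefix(byte[] prefix) {
        byte[] stop = Arrays.copyOf(prefix, prefix.length);
        stop[stop.length - 1]++;
        return stop;
    }

    public static void main(String[] args) {
        byte[] start = "foo".getBytes(StandardCharsets.UTF_8);
        byte[] stop = stopRowForPrefix(start);
        System.out.println(new String(stop, StandardCharsets.UTF_8)); // prints "fop"
    }
}
```

Note that this simple increment overflows if the last byte is 0xFF; a production version would need to carry the increment into earlier bytes.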
> >>
> >>
> >> On Tue, Jun 19, 2012 at 6:46 AM, Mohammad Tariq <[email protected]>
> >> wrote:
> >>> You can use HBase's RowFilter to do that.
> >>>
> >>> Regards,
> >>> Mohammad Tariq
> >>>
> >>>
> >>> On Tue, Jun 19, 2012 at 1:13 PM, shashwat shriparv
> >>> <[email protected]> wrote:
> >>>> Try to implement something using the RegexStringComparator class.
> >>>>
> >>>>
> >>>>
> >>>> On Tue, Jun 19, 2012 at 5:06 AM, Amitanand Aiyer <[email protected]>
> >> wrote:
> >>>>
> >>>>> You could set up a scan with the criteria you want (start row, end
> >>>>> row, KeyOnlyFilter, etc.) and issue a delete for the rows you get.
> >>>>>
> >>>>> On 6/18/12 3:08 PM, "Oleg Ruchovets" <[email protected]> wrote:
> >>>>>
> >>>>>> Hi,
> >>>>>> I need to delete rows from an HBase table by criteria.
> >>>>>> For example, I need to delete all rows starting with "12345".
> >>>>>> I didn't find a way to set a row prefix for the delete operation.
> >>>>>> What is the best way (practice) to delete rows by criteria from an
> >>>>>> HBase table?
> >>>>>>
> >>>>>> Thanks in advance.
> >>>>>> Oleg.
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>>
> >>>>
> >>>> ∞
> >>>> Shashwat Shriparv
> >>
> >>
> >>
> >> --
> >> Kevin O'Dell
> >> Customer Operations Engineer, Cloudera
>
>