Thank you all for the answers. I am trying to speed up my solution by using
map/reduce over HBase.
Here is the code:
I want to use Delete (the map function emits a Delete for each row), and I pass
the same tableName to TableMapReduceUtil.initTableMapperJob
and TableMapReduceUtil.initTableReducerJob.
Question: is it possible to pass Delete as I did in the map function?
public class DeleteRowByCriteria {

    final static Logger LOG = LoggerFactory.getLogger(DeleteRowByCriteria.class);

    public static class MyMapper extends TableMapper<ImmutableBytesWritable, Delete> {

        @Override
        public void map(ImmutableBytesWritable row, Result value, Context context)
                throws IOException, InterruptedException {
            // Emit a Delete for every row the filtered scan hands us.
            context.write(row, new Delete(row.get()));
        }
    }

    public static void main(String[] args) throws ClassNotFoundException,
            IOException, InterruptedException {
        String tableName = args[0];
        String filterCriteria = args[1];

        Configuration config = HBaseConfiguration.create();
        Job job = new Job(config, "DeleteRowByCriteria");
        job.setJarByClass(DeleteRowByCriteria.class);
        try {
            // Only rows whose key starts with filterCriteria reach the mapper.
            Filter campaignIdFilter = new PrefixFilter(Bytes.toBytes(filterCriteria));

            Scan scan = new Scan();
            scan.setFilter(campaignIdFilter);
            scan.setCaching(500);
            scan.setCacheBlocks(false); // recommended for MapReduce scans

            TableMapReduceUtil.initTableMapperJob(
                    tableName,
                    scan,
                    MyMapper.class,
                    null, // mapper output key class: not needed for a map-only job
                    null, // mapper output value class: not needed for a map-only job
                    job);
            TableMapReduceUtil.initTableReducerJob(
                    tableName,
                    null, // no reducer class; this wires up TableOutputFormat
                    job);
            job.setNumReduceTasks(0); // map-only: Deletes go straight to the table

            boolean b = job.waitForCompletion(true);
            if (!b) {
                throw new IOException("error with job!");
            }
        } catch (Exception e) {
            LOG.error(e.getMessage(), e);
        }
    }
}
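For what it's worth, the map-only flow above can be sketched as a toy, HBase-free simulation: a TreeMap stands in for the table, and the prefix check stands in for the PrefixFilter plus the emitted Deletes. Class and method names here are made up for illustration only.

```java
import java.util.TreeMap;

public class DeleteByPrefixSimulation {
    // Toy stand-in for the map-only job above: every row key matching
    // the prefix is "scanned" and a delete is applied for it, just as
    // the mapper emits a Delete per matching row.
    static void deleteByPrefix(TreeMap<String, String> table, String prefix) {
        table.keySet().removeIf(rowKey -> rowKey.startsWith(prefix));
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        table.put("12345-row1", "v1");
        table.put("12345-row2", "v2");
        table.put("67890-row3", "v3");
        deleteByPrefix(table, "12345");
        System.out.println(table.keySet()); // [67890-row3]
    }
}
```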
On Tue, Jun 19, 2012 at 9:26 AM, Kevin O'dell <[email protected]> wrote:
> Oleg,
>
> Here is some code that we used for deleting all rows with user name
> foo. It should be fairly portable to your situation:
>
> import java.io.IOException;
>
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.client.Delete;
> import org.apache.hadoop.hbase.client.HTable;
> import org.apache.hadoop.hbase.client.Result;
> import org.apache.hadoop.hbase.client.ResultScanner;
> import org.apache.hadoop.hbase.client.Scan;
> import org.apache.hadoop.hbase.util.Bytes;
>
> public class HBaseDelete {
> public static void main(String[] args){
> Configuration conf = HBaseConfiguration.create();
> HTable t = new HTable(conf, "t");
>
> String user = "foo";
>
> byte[] startRow = Bytes.toBytes(user);
> byte[] stopRow = Bytes.toBytes(user);
> stopRow[stopRow.length - 1]++; //'fop'
> Scan scan = new Scan(startRow, stopRow);
> ResultScanner sc = t.getScanner(scan);
> for(Result r : sc) {
> t.delete(new Delete(r.getRow()));
> }
> }
> }
> /**
> * Start row: foo
> * HBase matches rows byte by byte starting from this key.
> * Stop row: foo
> * With stop == start the scan returns nothing, because the stop row is exclusive.
> * Stop row: fo[p] (p being o + 1)
> * HBase stops at the first row that no longer starts with "foo".
> */
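The stop-row trick in the quoted code can be shown in plain Java with no HBase dependency. The class name is made up for illustration, and note that this simple version does not handle a prefix whose last byte is already 0xFF (that case would need a carry into the preceding byte):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class StopRowDemo {
    // Compute an exclusive stop row for a prefix scan by incrementing
    // the last byte of the prefix ("foo" -> "fop"). Every key with the
    // prefix sorts before this stop row, so the scan covers exactly
    // the prefixed keys.
    static byte[] stopRowForPrefix(byte[] prefix) {
        byte[] stop = Arrays.copyOf(prefix, prefix.length);
        stop[stop.length - 1]++;
        return stop;
    }

    public static void main(String[] args) {
        byte[] start = "foo".getBytes(StandardCharsets.UTF_8);
        byte[] stop = stopRowForPrefix(start);
        System.out.println(new String(stop, StandardCharsets.UTF_8)); // fop
    }
}
```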
>
>
> On Tue, Jun 19, 2012 at 6:46 AM, Mohammad Tariq <[email protected]>
> wrote:
> > You can use HBase's RowFilter to do that.
> >
> > Regards,
> > Mohammad Tariq
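As a rough, HBase-free illustration of the RowFilter suggestion above, plain java.util.regex can stand in for what a RegexStringComparator-backed RowFilter would keep during a scan. The class and method names below are invented for this sketch:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class RowRegexDemo {
    // Keep only row keys matching the regex, mimicking a scan whose
    // RowFilter uses a regex comparator on the row key.
    static List<String> matchingRows(List<String> rowKeys, String regex) {
        Pattern p = Pattern.compile(regex);
        List<String> out = new ArrayList<>();
        for (String key : rowKeys) {
            if (p.matcher(key).find()) {
                out.add(key);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("12345-a", "12345-b", "99999-c");
        System.out.println(matchingRows(rows, "^12345")); // [12345-a, 12345-b]
    }
}
```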
> >
> >
> > On Tue, Jun 19, 2012 at 1:13 PM, shashwat shriparv
> > <[email protected]> wrote:
> >> Try to implement something like this:
> >>
> >> Class RegexStringComparator
> >>
> >>
> >>
> >> On Tue, Jun 19, 2012 at 5:06 AM, Amitanand Aiyer <[email protected]>
> wrote:
> >>
> >>> You could set up a scan with the criteria you want (start row, end row,
> >>> KeyOnlyFilter, etc.), and do a delete for
> >>> the rows you get.
> >>>
> >>> On 6/18/12 3:08 PM, "Oleg Ruchovets" <[email protected]> wrote:
> >>>
> >>> >Hi,
> >>> >I need to delete rows from an HBase table by criteria.
> >>> >For example, I need to delete all rows starting with "12345".
> >>> >I didn't find a way to set a row prefix for a delete operation.
> >>> >What is the best way (best practice) to delete rows by criteria from an
> >>> >HBase table?
> >>> >
> >>> >Thanks in advance.
> >>> >Oleg.
> >>>
> >>>
> >>
> >>
> >> --
> >>
> >>
> >> ∞
> >> Shashwat Shriparv
>
>
>
> --
> Kevin O'Dell
> Customer Operations Engineer, Cloudera
>