On Tue, 2010-11-02 at 09:29 -0700, Jean-Daniel Cryans wrote:

> Like you said, a column is required for a row to exist, else if it
> doesn't exist you don't need to delete it right? :)


Well ... of course there is some column, but not necessarily always the
same one, and I don't care about its values during the delete.


> 
> What we do for fast "almost row key only" scanning is using the
> FirstKeyOnlyFilter on the Scan. See the RowCounter job's code:
> 
> Scan scan = new Scan();
> scan.setFilter(new FirstKeyOnlyFilter());
> 


I changed the code accordingly and, as far as the tests can tell, it
works. Great!
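
To make the difference concrete, here is a toy in-memory sketch in plain
Java. It is not HBase client code: the class KeyOnlyScanDemo, its methods
and the sample rows are all made up for illustration, and a sorted map
merely stands in for a table. It shows why a scan restricted to one column
can miss rows, while a scan that only needs the first cell of each row (the
idea behind FirstKeyOnlyFilter) still reports every row key below the end
key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: sorted maps stand in for a table. Nothing here is HBase API.
class KeyOnlyScanDemo {

    // Builds a small table: row key -> (column -> value).
    // The rows deliberately do not share a common column.
    static TreeMap<String, TreeMap<String, String>> sampleTable() {
        TreeMap<String, TreeMap<String, String>> table = new TreeMap<>();
        table.computeIfAbsent("row1", k -> new TreeMap<>()).put("a", "1");
        table.computeIfAbsent("row2", k -> new TreeMap<>()).put("b", "2");
        table.computeIfAbsent("row3", k -> new TreeMap<>()).put("a", "3");
        return table;
    }

    // Mimics a scan restricted to one column: rows lacking that column are skipped.
    // The end key is exclusive in this model.
    static List<String> scanColumn(TreeMap<String, TreeMap<String, String>> table,
                                   String endKey, String column) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, TreeMap<String, String>> e : table.headMap(endKey).entrySet()) {
            if (e.getValue().containsKey(column)) {
                rows.add(e.getKey());
            }
        }
        return rows;
    }

    // Mimics the first-key-only idea: any row with at least one cell is reported,
    // no matter which column that cell belongs to.
    static List<String> scanFirstKeyOnly(TreeMap<String, TreeMap<String, String>> table,
                                         String endKey) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, TreeMap<String, String>> e : table.headMap(endKey).entrySet()) {
            if (!e.getValue().isEmpty()) {
                rows.add(e.getKey());
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        TreeMap<String, TreeMap<String, String>> table = sampleTable();
        System.out.println("column 'a' scan: " + scanColumn(table, "row3", "a"));
        System.out.println("first-key-only:  " + scanFirstKeyOnly(table, "row3"));

        // Delete every row the key-only scan found, mirroring the scan-then-delete loop.
        for (String row : scanFirstKeyOnly(table, "row3")) {
            table.remove(row);
        }
        System.out.println("remaining rows:  " + table.keySet());
    }
}
```

With the end key "row3", the column-restricted scan skips row2 because it
has no column "a", while the key-only variant returns both rows below the
end key, so the delete loop reaches all of them.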


> Then you can even setCaching to some high number for really fast
> scanning, although deleting will still be the bottleneck.
> 
> J-D


Thanks!
Henning


> 
> On Tue, Nov 2, 2010 at 3:14 AM, Henning Blohm <[email protected]> 
> wrote:
> > Hi,
> >
> >  I need to delete a range of rows from an HBase table. A time-to-live
> > setting as proposed in
> >
> > http://www.mail-archive.com/[email protected]/msg09492.html
> >
> > will not do as there will be no clear point in time when that clean up
> > will be required / advised.
> >
> > The way it is implemented now essentially looks like this:
> >
> >                        HTable c = _table(<table>);
> >                        Scan s = new Scan("".getBytes(),endKey.getBytes());
> >                        s.addColumn(<family>.getBytes());
> >                        ResultScanner rs = c.getScanner(s);
> >                        try {
> >                                Result r;
> >                                while ((r=rs.next())!=null) {
> >                                        c.delete(new Delete(r.getRow()));
> >                                }
> >                        } finally {
> >                                rs.close();
> >                        }
> >                        c.flushCommits();
> >
> > While that works, it is suboptimal, as it seems to require specifying a
> > column or column family to retrieve data for. Worse: it seems that a
> > column that is always present is required to really hit all relevant
> > rows.
> >
> > However, all that is required is the keys!
> >
> > I found https://issues.apache.org/jira/browse/HBASE-1481 and was
> > wondering whether there has been any progress on that.
> >
> > What is the best way to accomplish something like key-only scanning?
> >
> > Thanks,
> >  Henning
> >

