Like you said, a row only exists if it has at least one column, and if it doesn't exist you don't need to delete it, right? :)
What we do for fast "almost row key only" scanning is to use the
FirstKeyOnlyFilter on the Scan. See the RowCounter job's code:

  Scan scan = new Scan();
  scan.setFilter(new FirstKeyOnlyFilter());

Then you can even setCaching to some high number for really fast
scanning, although deleting will still be the bottleneck.

J-D

On Tue, Nov 2, 2010 at 3:14 AM, Henning Blohm <[email protected]> wrote:
> Hi,
>
> I need to delete a range of rows from an HBase table. A time-to-live
> setting as proposed in
>
> http://www.mail-archive.com/[email protected]/msg09492.html
>
> will not do as there will be no clear point in time when that clean up
> will be required / advised.
>
> The way it is implemented now essentially looks like this:
>
>   HTable c = _table(<table>);
>   Scan s = new Scan("".getBytes(),endKey.getBytes());
>   s.addColumn(<family>.getBytes());
>   ResultScanner rs = c.getScanner(s);
>   try {
>     Result r;
>     while ((r=rs.next())!=null) {
>       c.delete(new Delete(r.getRow()));
>     }
>   } finally {
>     rs.close();
>   }
>   c.flushCommits();
>
> While that works, it is suboptimal as it seems to require to define a
> column or column family
> to retrieve data for. Worse: It seems that a column is required that is
> always present to
> really hit all relevant rows.
>
> However, all that is required is the keys!
>
> I found https://issues.apache.org/jira/browse/HBASE-1481 and was
> wondering whether
> there has been any progress on that.
>
> What is the best way to accomplish something like key-only scanning?
>
> Thanks,
> Henning
>
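Since deleting stays the bottleneck even with a key-only scan, one option is to hand the row keys over in batches rather than one delete per row. Below is a minimal sketch of just the batching step in plain Java. BatchedRowDeleter, deleteInBatches and the batch size are hypothetical names for illustration, not part of HBase. In a real job the keys would come from the ResultScanner (r.getRow()) and the sink would wrap each key in a Delete and pass the list to the table's list-based delete call; that wiring is an assumption and is only described in comments here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not part of HBase: groups row keys into fixed-size batches.
final class BatchedRowDeleter {

    /**
     * Passes the keys to the sink in batches of at most batchSize.
     * Returns the total number of keys handed to the sink.
     *
     * Assumed HBase wiring (not shown): rowKeys would be the row keys read
     * from a scan using FirstKeyOnlyFilter, and the sink would turn each key
     * into a Delete and submit the whole list to the table in one call.
     */
    static <K> int deleteInBatches(Iterable<K> rowKeys, int batchSize, Consumer<List<K>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<K> batch = new ArrayList<>(batchSize);
        int total = 0;
        for (K key : rowKeys) {
            batch.add(key);
            total++;
            if (batch.size() == batchSize) {
                sink.accept(batch);                  // flush a full batch
                batch = new ArrayList<>(batchSize);  // start a fresh list
            }
        }
        if (!batch.isEmpty()) {
            sink.accept(batch);                      // flush the remainder
        }
        return total;
    }

    public static void main(String[] args) {
        // Stand-in keys; in HBase these would be byte[] row keys from the scanner.
        List<String> keys = Arrays.asList("row1", "row2", "row3", "row4", "row5");
        int handled = deleteInBatches(keys, 2,
                batch -> System.out.println("would delete batch: " + batch));
        System.out.println("keys handled: " + handled);
    }
}
```

Whether batching helps, and which batch size works, depends on the cluster and the client version, so treat the numbers as placeholders to measure against.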
