On Wednesday, May 9, 2012 10:31:46 AM, "Sean Pines" <[email protected]> wrote:
> I have a use case that involves removing a record from Accumulo
> based on the Row ID and the Column Family.
> 
> In the shell, I noticed the command "deletemany" which allows you to
> specify column family/column qualifier. Is there an equivalent of this
> in the Java API?
> 
> In the Java API, I noticed the method:
> deleteRows(String tableName, org.apache.hadoop.io.Text start,
> org.apache.hadoop.io.Text end)
> Delete rows between (start, end]
> 
> However, that only seems to work for deleting a range of RowIDs.
> 
> I would also imagine that deleting rows is costly; is there a better
> way to approach something like this?
> The workaround I have for now is to just overwrite the row with an
> empty string in the value field and ignore any entries with an empty
> value. However, this just leaves a lingering row for each "delete",
> and I'd like to avoid that if at all possible.
> 
> Thanks!

Connector provides a createBatchDeleter method.  You can set the range and
columns for a BatchDeleter just like you would with a Scanner.  This is not an
efficient operation (despite the current javadocs for BatchDeleter), but it
works well if you're deleting a small number of entries.  It scans for the
affected key/value pairs, pulls them back to the client, then inserts a
deletion entry for each.
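
For example, something like this should do it (a minimal sketch, assuming
the 1.4-style API; the instance, table, row, and column family names are
all placeholders):

  import java.util.Collections;

  import org.apache.accumulo.core.client.BatchDeleter;
  import org.apache.accumulo.core.client.Connector;
  import org.apache.accumulo.core.client.ZooKeeperInstance;
  import org.apache.accumulo.core.data.Range;
  import org.apache.accumulo.core.security.Authorizations;
  import org.apache.hadoop.io.Text;

  public class DeleteFamilyExample {
    public static void main(String[] args) throws Exception {
      Connector conn = new ZooKeeperInstance("myinstance", "zkhost:2181")
          .getConnector("user", "password".getBytes());

      // 4 query threads, 10MB write buffer, 60s max latency, 2 write threads
      BatchDeleter deleter = conn.createBatchDeleter("mytable",
          new Authorizations(), 4, 10000000L, 60000L, 2);

      // restrict the scan to a single row and column family
      deleter.setRanges(Collections.singleton(new Range(new Text("myrow"))));
      deleter.fetchColumnFamily(new Text("mycf"));

      deleter.delete();  // scans, then inserts a delete entry per key found
      deleter.close();
    }
  }
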
The deleteRows method, on the other hand, is efficient because large ranges
can just be dropped.  If you want to delete a lot of things and deleteRows
won't work for you, consider attaching a majc-scope Filter that filters out
what you don't want, compacting the table, and then removing the filter.
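
A rough sketch of that approach (the filter class and its "family" option
are made up for this example, and the jar containing the filter needs to
be on the tablet servers' classpath):

  import java.io.IOException;
  import java.util.Map;

  import org.apache.accumulo.core.data.Key;
  import org.apache.accumulo.core.data.Value;
  import org.apache.accumulo.core.iterators.Filter;
  import org.apache.accumulo.core.iterators.IteratorEnvironment;
  import org.apache.accumulo.core.iterators.SortedKeyValueIterator;
  import org.apache.hadoop.io.Text;

  public class DropFamilyFilter extends Filter {
    private Text doomed;

    @Override
    public void init(SortedKeyValueIterator<Key,Value> source,
        Map<String,String> options, IteratorEnvironment env)
        throws IOException {
      super.init(source, options, env);
      doomed = new Text(options.get("family"));
    }

    @Override
    public SortedKeyValueIterator<Key,Value> deepCopy(IteratorEnvironment env) {
      DropFamilyFilter copy = (DropFamilyFilter) super.deepCopy(env);
      copy.doomed = doomed;
      return copy;
    }

    // keep everything except the column family being deleted
    @Override
    public boolean accept(Key k, Value v) {
      return !k.getColumnFamily().equals(doomed);
    }
  }

Then attach it at majc scope, compact, and remove it (reusing the conn from
the previous example; IteratorScope here is
org.apache.accumulo.core.iterators.IteratorUtil.IteratorScope):

  IteratorSetting setting = new IteratorSetting(30, "dropfam",
      DropFamilyFilter.class);
  setting.addOption("family", "mycf");
  conn.tableOperations().attachIterator("mytable", setting,
      EnumSet.of(IteratorScope.majc));
  // a full major compaction rewrites all files, applying the filter
  conn.tableOperations().compact("mytable", null, null, true, true);
  conn.tableOperations().removeIterator("mytable", "dropfam",
      EnumSet.of(IteratorScope.majc));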

Billie
