Thanks for the feedback, Aru and Keith. I've had some more time to play around with this, and here are some additional observations.
My existing process is very slow. I think this is because each deletemany command starts up a new scanner and batch writer, creating a lot of RPC overhead. I didn't initially think it would be a significant amount of data, but maybe I just had the wrong idea of what "significant" means in this case.

I'm not sure the RowDeletingIterator would work here, because I use empty rows for other purposes. The RowFilter at compaction is a great option, except that I had hoped to avoid writing actual Java code. Looking back at this, I might have to bite that bullet.

Again, thanks both for the suggestions!

Mike

On Tue, Oct 22, 2013 at 12:04 PM, Keith Turner <[email protected]> wrote:

> If its a significant amount of data, you could create a class that extends
> row filter and set it as a compaction iterator.
>
>
> On Tue, Oct 22, 2013 at 11:45 AM, Mike Drob <[email protected]> wrote:
>
>> I'm attempting to delete all rows from a table that contain a specific
>> word in the value of a specified column. My current process looks like:
>>
>> accumulo shell -e 'egrep .*EXPRESSION.* -np -t tab -c col' | awk 'BEGIN
>> {print "table tab"}; {print "deletemany -f -np -r" $1}; END {print "exit"}'
>> > rows.out
>> accumulo shell -f rows.out
>>
>> I tried playing around with scan iterators and various options on
>> deletemany and deleterows but wasn't able to find a more straightforward
>> way to do this. Does anybody have any suggestions?
>>
>> Mike
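For anyone following along, the awk stage of the pipeline above can be exercised on its own with mock scan output (the two sample rows below are made-up stand-ins for what the `egrep` scan would emit, not real data). One small caveat with the original command: awk's `"deletemany -f -np -r" $1` concatenates with no space, so the flag and the row ID run together; the sketch below adds the space.

```shell
# Feed two mock scan-output lines through the awk stage to show the
# generated delete script. Row IDs "row1"/"row2" are hypothetical.
printf 'row1 colf:colq []\tvalue1\nrow2 colf:colq []\tvalue2\n' |
  awk 'BEGIN {print "table tab"}
       {print "deletemany -f -np -r " $1}
       END {print "exit"}'
```

This prints `table tab`, one `deletemany -f -np -r rowN` line per input row, and `exit`, which is the script the real pipeline would save to rows.out and replay with `accumulo shell -f rows.out`.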
