Thanks for the feedback, Aru and Keith. I've had some more time to play around with this, and here are some additional observations.
My existing process is very slow. I think this is because each deletemany command starts up a new scanner and batch writer, creating a lot of RPC overhead. I didn't initially think it would be a significant amount of data, but maybe I just had the wrong idea of what "significant" means in this case.

I'm not sure the RowDeletingIterator would work here, because I use empty rows for other purposes. The RowFilter at compaction is a great option, except that I had hoped to avoid writing actual Java code. Looking back at this, I might have to bite that bullet.

Again, thanks both for the suggestions!

Mike

On Tue, Oct 22, 2013 at 12:04 PM, Keith Turner <[email protected]> wrote:

> If its a significant amount of data, you could create a class that extends
> row filter and set it as a compaction iterator.
>
>
> On Tue, Oct 22, 2013 at 11:45 AM, Mike Drob <[email protected]> wrote:
>
>> I'm attempting to delete all rows from a table that contain a specific
>> word in the value of a specified column. My current process looks like:
>>
>> accumulo shell -e 'egrep .*EXPRESSION.* -np -t tab -c col' | awk 'BEGIN
>> {print "table tab"}; {print "deletemany -f -np -r" $1}; END {print "exit"}'
>> > rows.out
>> accumulo shell -f rows.out
>>
>> I tried playing around with scan iterators and various options on
>> deletemany and deleterows but wasn't able to find a more straightforward
>> way to do this. Does anybody have any suggestions?
>>
>> Mike
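For anyone following along, the awk stage of the pipeline above can be exercised on its own with mock scan output (the two sample rows below are made-up stand-ins for what the `egrep` scan would emit, not real data). One small caveat with the original command: awk's `"deletemany -f -np -r" $1` concatenates with no space, so the flag and the row ID run together; the sketch below adds the space.

```shell
# Feed two mock scan-output lines through the awk stage to show the
# generated delete script. Row IDs "row1"/"row2" are hypothetical.
printf 'row1 colf:colq []\tvalue1\nrow2 colf:colq []\tvalue2\n' |
  awk 'BEGIN {print "table tab"}
       {print "deletemany -f -np -r " $1}
       END {print "exit"}'
```

This prints `table tab`, one `deletemany -f -np -r rowN` line per input row, and `exit`, which is the script the real pipeline would save to rows.out and replay with `accumulo shell -f rows.out`.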
