Hi Mike,

Did you wind up writing Java code to do this? Did you go with a RowFilter?
I have a similar circumstance where I need to delete millions of rows daily,
and the criteria for deletion is not in the rowkey.

Thanks in advance,
Terry

On Wed, Oct 23, 2013 at 4:21 PM, Mike Drob <[email protected]> wrote:

> Thanks for the feedback, Aru and Keith.
>
> I've had some more time to play around with this, and here are some
> additional observations.
>
> My existing process is very slow. I think this is due to each deletemany
> command starting up a new scanner and batchwriter, and creating a lot of
> RPC overhead. I didn't initially think that it would be a significant
> amount of data, but maybe I just had the wrong idea of what "significant"
> is in this case.
>
> I'm not sure the RowDeletingIterator would work in this case because I do
> use empty rows for other purposes. The RowFilter at compaction is a great
> option, except I had hoped to avoid writing actual Java code. Looking back
> at this, I might have to bite that bullet.
>
> Again, thanks both for the suggestions!
>
> Mike
>
> On Tue, Oct 22, 2013 at 12:04 PM, Keith Turner <[email protected]> wrote:
>
>> If it's a significant amount of data, you could create a class that
>> extends RowFilter and set it as a compaction iterator.
>>
>> On Tue, Oct 22, 2013 at 11:45 AM, Mike Drob <[email protected]> wrote:
>>
>>> I'm attempting to delete all rows from a table that contain a specific
>>> word in the value of a specified column. My current process looks like:
>>>
>>> accumulo shell -e 'egrep .*EXPRESSION.* -np -t tab -c col' | awk 'BEGIN
>>> {print "table tab"}; {print "deletemany -f -np -r" $1}; END {print "exit"}'
>>> > rows.out
>>> accumulo shell -f rows.out
>>>
>>> I tried playing around with scan iterators and various options on
>>> deletemany and deleterows but wasn't able to find a more straightforward
>>> way to do this. Does anybody have any suggestions?
>>>
>>> Mike
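
A minimal sketch of the RowFilter-at-compaction approach Keith described might
look something like the class below. The class name and the option keys
("family", "qualifier", "word") are illustrative assumptions for this sketch,
not code from the thread; the idea is simply that returning false from
acceptRow drops the entire row when the table's files are rewritten.

import java.io.IOException;
import java.util.Map;

import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.iterators.IteratorEnvironment;
import org.apache.accumulo.core.iterators.SortedKeyValueIterator;
import org.apache.accumulo.core.iterators.user.RowFilter;

/**
 * Illustrative sketch: drop any row whose value in a configured column
 * contains a configured word. Option keys are hypothetical.
 */
public class WordMatchRowFilter extends RowFilter {

  private String family;
  private String qualifier;
  private String word;

  @Override
  public void init(SortedKeyValueIterator<Key,Value> source,
      Map<String,String> options, IteratorEnvironment env) throws IOException {
    super.init(source, options, env);
    family = options.get("family");
    qualifier = options.get("qualifier");
    word = options.get("word");
  }

  @Override
  public boolean acceptRow(SortedKeyValueIterator<Key,Value> rowIterator)
      throws IOException {
    // Walk the row's key/value pairs; returning false filters out the whole row.
    while (rowIterator.hasTop()) {
      Key k = rowIterator.getTopKey();
      if (k.getColumnFamily().toString().equals(family)
          && k.getColumnQualifier().toString().equals(qualifier)
          && rowIterator.getTopValue().toString().contains(word)) {
        return false; // matched: drop this row
      }
      rowIterator.next();
    }
    return true; // no match: keep the row
  }
}

The class would then be attached to the table at the minor/major compaction
scopes (the shell's setiter command can do this, passing the options above),
followed by forcing a compaction so the filtered rows are physically removed.
Depending on the Accumulo version, a production version may also need to
override deepCopy so the configured options carry over to copies of the
iterator.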
