Hi Billie,

Many thanks for your help. I added those two methods, but had to remove the
@Override annotations because the RowFilter class I'm extending doesn't
declare them. Even with these methods in place, I still get the same error
when trying to add the iterator in the shell.
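Billie's point, in Java terms: setiter appears to check the class's type, not just whether methods with the right names exist. The error text ("Unable to load ... as type org.apache.accumulo.core.iterators.OptionDescriber") suggests exactly this. A class only counts as an OptionDescriber if it declares it, e.g. `public class ExpirationTimestampPurgeFilter extends RowFilter implements OptionDescriber`. Once that clause is present, the @Override annotations are legal again, since Java 6 and later allow @Override on methods that implement an interface method. Accumulo's Filter base class already implements OptionDescriber, as far as I know, but it decides one entry at a time rather than one row at a time.

The stdlib-only sketch below shows the difference. `Describer`, `MethodsOnly` and `DeclaresInterface` are hypothetical stand-ins. The `isAssignableFrom` check is an assumption about the kind of test the shell performs, not its actual code.

```java
import java.util.Map;

// Hypothetical stand-in for org.apache.accumulo.core.iterators.OptionDescriber,
// trimmed to one method so the example runs without Accumulo on the classpath.
interface Describer {
    boolean validateOptions(Map<String, String> options);
}

// Has a method with the right signature, but never declares the interface.
class MethodsOnly {
    public boolean validateOptions(Map<String, String> options) {
        return true;
    }
}

// Same method, plus an explicit "implements" clause.
class DeclaresInterface implements Describer {
    @Override // legal here because the method implements an interface method
    public boolean validateOptions(Map<String, String> options) {
        return true;
    }
}

final class OptionDescriberCheck {
    // Assumed shape of the check: is the loaded class usable as the interface type?
    static boolean loadableAs(Class<?> iface, Class<?> candidate) {
        return iface.isAssignableFrom(candidate);
    }

    public static void main(String[] args) {
        System.out.println(loadableAs(Describer.class, MethodsOnly.class));       // false
        System.out.println(loadableAs(Describer.class, DeclaresInterface.class)); // true
    }
}
```

Matching method names alone never makes a class an instance of an interface. Only the declared `implements` clause does that.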
I notice that the RowFilter class extends WrappingIterator, which also doesn't
appear to have the describeOptions and validateOptions methods. Should I try
extending just the Filter class instead? I didn't understand the benefits
William listed for extending the RowFilter class. I just know that once I
identify that a RowKey should be purged based on the value of its expTs
ColFam, I want to remove all entries for that RowKey.

On Wed, Nov 6, 2013 at 3:29 PM, Billie Rinaldi <[email protected]> wrote:

> To use setiter in the shell, your iterator must implement
> OptionDescriber. It has two methods, and something like the following
> should work for your iterator. If you implement passing options to the
> iterator, you'll want to change the null parameters to the constructor of
> IteratorOptions below, and probably also do some validation in
> validateOptions.
>
>   @Override
>   public IteratorOptions describeOptions() {
>     return new IteratorOptions("expTs",
>         "Removes rows based on the column designated as the expiration timestamp column family",
>         null, null);
>   }
>
>   @Override
>   public boolean validateOptions(Map<String,String> options) {
>     return true;
>   }
>
>
> On Wed, Nov 6, 2013 at 12:49 PM, Terry P. <[email protected]> wrote:
>
>> Eyes of an eagle, Billie! com is correct, but after viewing
>> "org.apache.accumulo" so many times, my brain was stuck on org and I goofed
>> in my setiter syntax.
>>
>> With THAT corrected, here is the new error:
>>
>> root@meta> setiter -class com.esa.accumulo.iterators.ExpirationTimestampPurgeFilter -n expTsFilter -p 20 -scan -t itertest
>> 2013-11-06 14:46:28,280 [shell.Shell] ERROR: org.apache.accumulo.core.util.shell.ShellCommandException: Command could not be initialized (Unable to load com.esa.accumulo.iterators.ExpirationTimestampPurgeFilter as type org.apache.accumulo.core.iterators.OptionDescriber; configure with 'config' instead)
>>
>>
>> On Wed, Nov 6, 2013 at 2:43 PM, Billie Rinaldi <[email protected]> wrote:
>>
>>> Is there a typo in the package name? One place says "com" and the other
>>> "org".
>>>
>>>
>>> On Wed, Nov 6, 2013 at 12:37 PM, Terry P. <[email protected]> wrote:
>>>
>>>> Hi William, many thanks for the explanation of scan time versus
>>>> compaction time. I'll look through the classes again, note where the
>>>> "remove" versus "suppress" wording is used, and open a ticket.
>>>>
>>>> As mentioned, I only dabble in Java, but regardless of that fact, at
>>>> this point I'm the one who has to get this done. I've cobbled together my
>>>> first attempt, but I get the following error when I try to add it as a
>>>> scan iterator for testing:
>>>>
>>>> root@meta> setiter -class org.esa.accumulo.iterators.ExpirationTimestampPurgeFilter -n expTsFilter -p 20 -scan -t itertest
>>>> 2013-11-06 14:06:34,914 [shell.Shell] ERROR: org.apache.accumulo.core.util.shell.ShellCommandException: Command could not be initialized (Servers are unable to load org.esa.accumulo.iterators.ExpirationTimestampPurgeFilter as type org.apache.accumulo.core.iterators.SortedKeyValueIterator)
>>>>
>>>> Here's my source. Note that the value stored in the expTs ColFam is in
>>>> the format "yyyyMMddHHmmssS", which I convert to a long for a direct
>>>> comparison to System.currentTimeMillis().
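The conversion step can be made concrete. Turning the stored expTs string into epoch milliseconds is what allows the comparison with System.currentTimeMillis().

This is a minimal sketch under a few assumptions. Values are assumed to be written in UTC, because the thread doesn't say which time zone the ingest app uses. A malformed value is assumed to keep the row rather than delete it. `ExpTsParser` and its method names are hypothetical.

It uses SimpleDateFormat, as the posted iterator does, because that also works on pre-Java 8 JVMs. SimpleDateFormat is not thread-safe, so the sketch creates one instance per call. It treats "expired" as expTs <= now, per the stated requirement; the posted code uses a strict <.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

final class ExpTsParser {
    // Pattern from the thread; the trailing "S" consumes the remaining millisecond digits.
    static final String PATTERN = "yyyyMMddHHmmssS";

    // Returns epoch milliseconds, or null if the whole string is not a valid timestamp.
    static Long toEpochMillis(String expTs, TimeZone zone) {
        SimpleDateFormat df = new SimpleDateFormat(PATTERN); // not thread-safe, so one per call
        df.setTimeZone(zone);
        df.setLenient(false); // reject out-of-range fields such as month 13
        ParsePosition pos = new ParsePosition(0);
        Date parsed = df.parse(expTs, pos);
        if (parsed == null || pos.getIndex() != expTs.length()) {
            return null; // unparseable, or trailing characters left over
        }
        return parsed.getTime();
    }

    // "Expired" means expTs <= now; malformed values never expire the row.
    static boolean isExpired(String expTs, TimeZone zone, long nowMillis) {
        Long t = toEpochMillis(expTs, zone);
        return t != null && t <= nowMillis;
    }

    public static void main(String[] args) {
        TimeZone utc = TimeZone.getTimeZone("UTC"); // assumption: values stored in UTC
        System.out.println(toEpochMillis("20131104171412445", utc));
        System.out.println(isExpired("20131104171412445", utc, System.currentTimeMillis())); // true
        System.out.println(isExpired("not-a-date", utc, System.currentTimeMillis()));        // false
    }
}
```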
>>>> I only overrode the init and acceptRow methods, hoping the others would
>>>> work as-is from the base class.
>>>>
>>>> One clarification: it turns out expTs is the ColumnFamily, and the ingest
>>>> app does not assign a ColumnQualifier for expTs. So, to amend my prior
>>>> table layout (including the datetime format):
>>>>
>>>> Format: Key:CF:CQ:Value
>>>> abc:data:title:"My fantastic data"
>>>> abc:data:content:<bytedata>
>>>> abc:creTs::20130804171412445
>>>> abc:*expTs*::20131104171412445
>>>> ... 6-8 more columns of data per row ...
>>>>
>>>> where *expTs* is the ColumnFamily that determines whether the entire row
>>>> should be removed, based on whether its value is <= NOW. If a row has not
>>>> yet been assigned an expiration date, expTs will not be set and the
>>>> ColumnFamily will not yet be present. It seems like an odd choice to use
>>>> distinct Column Families without Column Qualifiers, but that's how the
>>>> ingest app was done.
>>>>
>>>> I greatly appreciate any advice you can provide.
>>>>
>>>> package com.esa.accumulo.iterators;
>>>>
>>>> import java.io.IOException;
>>>> import java.text.ParseException;
>>>> import java.text.SimpleDateFormat;
>>>> import java.util.Date;
>>>> import java.util.Map;
>>>>
>>>> import org.apache.accumulo.core.data.Key;
>>>> import org.apache.accumulo.core.data.Value;
>>>> import org.apache.accumulo.core.iterators.IteratorEnvironment;
>>>> import org.apache.accumulo.core.iterators.SortedKeyValueIterator;
>>>> import org.apache.accumulo.core.iterators.user.RowFilter;
>>>>
>>>> /**
>>>>  * A filter that removes rows based on the column designated as the
>>>>  * "expiration timestamp" column family.
>>>>  *
>>>>  * It removes the row if the value in the expirationTimestamp column is
>>>>  * less than currentTime.
>>>>  *
>>>>  * TODO: The designation of the expirationTimestamp ColumnFamily and its
>>>>  * DateFormat is set in the iterator options when the iterator is applied
>>>>  * to the table.
>>>>  * (For now it is hardcoded to match the format used in the Solr-Accumulo
>>>>  * plugin.)
>>>>  */
>>>> public class ExpirationTimestampPurgeFilter extends RowFilter {
>>>>   private long currentTime;
>>>>   // TODO: make accumuloDateFormat settable via Iterator Options
>>>>   // Date Format for Expiration Timestamp ColumnFamily stored in Accumulo
>>>>   private String expTsDateFormat = "yyyyMMddHHmmssS";
>>>>   SimpleDateFormat df = new SimpleDateFormat(expTsDateFormat);
>>>>
>>>>   // TODO: make expTs settable via Iterator Options
>>>>   // ColumnFamily containing Expiration Timestamp value (note ingest app
>>>>   // did NOT assign a ColumnQualifier, only a ColumnFamily)
>>>>   private String expTsColFam = "expTs";
>>>>
>>>>   @Override
>>>>   public boolean acceptRow(SortedKeyValueIterator<Key, Value> rowIterator)
>>>>       throws IOException {
>>>>
>>>>     if (rowIterator.getTopKey().getColumnFamily().toString().equals(expTsColFam)) {
>>>>       Date expTsDate = null;
>>>>       try {
>>>>         expTsDate = df.parse(rowIterator.getTopValue().toString());
>>>>         if (expTsDate.getTime() < currentTime)
>>>>           return false;
>>>>       } catch (ParseException e) {
>>>>         // TODO Auto-generated catch block
>>>>         e.printStackTrace();
>>>>       }
>>>>     }
>>>>     return true;
>>>>   }
>>>>
>>>>   @Override
>>>>   public void init(SortedKeyValueIterator<Key, Value> source,
>>>>       Map<String, String> options, IteratorEnvironment env) throws IOException {
>>>>     super.init(source, options, env);
>>>>     currentTime = System.currentTimeMillis();
>>>>   }
>>>>
>>>> }
>>>>
>>>>
>>>> On Tue, Nov 5, 2013 at 8:48 PM, William Slacum <[email protected]> wrote:
>>>>
>>>>> If an iterator is only set at scan time, then its logic will only be
>>>>> applied when a client scans the table. The data will persist through
>>>>> major and minor compaction and be visible if you scanned the RFile(s)
>>>>> backing the table. "Suppress" is the better word in this case.
>>>>> Would you please open a ticket pointing us to where the documentation
>>>>> should be updated?
>>>>>
>>>>> It looks like you'd want to implement a RowFilter for your use case.
>>>>> It has the necessary hooks to avoid reading a whole row into memory and
>>>>> to handle the logic of determining whether or not to write keys that
>>>>> occur before the column you're filtering on (at the cost of reading those
>>>>> keys twice).
>>>>>
>>>>>
>>>>> On Tue, Nov 5, 2013 at 6:20 PM, Terry P. <[email protected]> wrote:
>>>>>
>>>>>> Greetings everyone,
>>>>>>
>>>>>> I'm looking at the AgeOffFilter as a base from which to write a
>>>>>> server-side filter / iterator to purge rows when they have aged off,
>>>>>> based on the value of a specific column in the row (expiry datetime <=
>>>>>> now). This differs from the AgeOffFilter in that the criterion for
>>>>>> removal comes from the same column in every row (not the Accumulo
>>>>>> timestamp of an individual entry), and we need to remove the entire row,
>>>>>> not just individual entries. For example:
>>>>>>
>>>>>> Format: Key:CF:CQ:Value
>>>>>> abc:data:title:"My fantastic data"
>>>>>> abc:data:content:<bytedata>
>>>>>> abc:data:creTs:2013-08-04T17:14:12Z
>>>>>> abc:data:*expTs*:2013-11-04T17:14:12Z
>>>>>> ... 6-8 more columns of data per row ...
>>>>>>
>>>>>> where *expTs* is the column that determines whether the entire row
>>>>>> should be removed, based on whether its value is <= NOW.
>>>>>>
>>>>>> This task seemed easy enough as a client program (and it is, really),
>>>>>> but a server-side iterator would be far more efficient than sending
>>>>>> millions of rowkeys across the network just to delete them (we'll be
>>>>>> deleting more than a million every hour). But I'm struggling to get
>>>>>> there.
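There is a likely reason the acceptRow posted later in the thread never rejects a row. It only inspects rowIterator.getTopKey(), which is the first entry of the row. Column families sort lexicographically, so in the amended sample layout creTs and data both sort before expTs, and the expTs entry is never on top.

In the real iterator, the fix would presumably be to walk the row with rowIterator.hasTop() / rowIterator.next(), checking getTopKey().getColumnFamily() until the expTs family turns up.

The stdlib-only sketch below models that difference on plain lists of entries. `RowPurgeSketch`, `Cell` and the method names are hypothetical, and timestamps are assumed to be UTC. It imitates only the one-decision-per-row behavior William describes, not Accumulo's actual seek/reread mechanics.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Date;
import java.util.List;
import java.util.TimeZone;

final class RowPurgeSketch {
    // One entry: row, column family, column qualifier, value (sorted in that order).
    static final class Cell {
        final String row, cf, cq, value;
        Cell(String row, String cf, String cq, String value) {
            this.row = row; this.cf = cf; this.cq = cq; this.value = value;
        }
    }

    static final String EXP_TS_CF = "expTs";

    // True if expTs parses (assumed UTC) and is <= now; malformed values keep the row.
    static boolean expired(String expTs, long nowMillis) {
        SimpleDateFormat df = new SimpleDateFormat("yyyyMMddHHmmssS");
        df.setTimeZone(TimeZone.getTimeZone("UTC"));
        df.setLenient(false);
        ParsePosition pos = new ParsePosition(0);
        Date d = df.parse(expTs, pos);
        return d != null && pos.getIndex() == expTs.length() && d.getTime() <= nowMillis;
    }

    // Mirrors the posted code: looks only at the row's first entry (one getTopKey() call).
    static boolean acceptRowTopKeyOnly(List<Cell> row, long now) {
        Cell top = row.get(0);
        return !(top.cf.equals(EXP_TS_CF) && expired(top.value, now));
    }

    // Walks every entry of the row, as a hasTop()/next() loop would, before deciding.
    static boolean acceptRow(List<Cell> row, long now) {
        for (Cell c : row) {
            if (c.cf.equals(EXP_TS_CF) && expired(c.value, now)) {
                return false; // drop every entry of this row
            }
        }
        return true; // no expTs yet, or not yet expired
    }

    // Splits row-sorted entries into rows and keeps the rows acceptRow() approves.
    static List<Cell> filterRows(List<Cell> sorted, long now) {
        List<Cell> kept = new ArrayList<Cell>();
        int start = 0;
        while (start < sorted.size()) {
            int end = start;
            while (end < sorted.size() && sorted.get(end).row.equals(sorted.get(start).row)) {
                end++;
            }
            List<Cell> row = sorted.subList(start, end);
            if (acceptRow(row, now)) {
                kept.addAll(row);
            }
            start = end;
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Cell> cells = Arrays.asList(
            new Cell("abc", "creTs", "", "20130804171412445"),
            new Cell("abc", "data", "content", "<bytedata>"),
            new Cell("abc", "data", "title", "My fantastic data"),
            new Cell("abc", "expTs", "", "20131104171412445"),
            new Cell("abd", "data", "title", "no expTs yet"));
        long now = System.currentTimeMillis();
        System.out.println(acceptRowTopKeyOnly(cells.subList(0, 4), now)); // true: expTs missed
        System.out.println(acceptRow(cells.subList(0, 4), now));           // false
        System.out.println(filterRows(cells, now).size());                 // 1 (only row abd)
    }
}
```

With the sample row abc, the top-key-only check keeps the row, while the full walk rejects it. Row abd has no expTs yet, so it survives either way.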
>>>>>>
>>>>>> In looking at AgeOffFilter.java, is the "magic" in the AgeOffFilter
>>>>>> class that removes (deletes) an entry from a table the fact that the
>>>>>> accept method returns false, combined with the fact that the iterator
>>>>>> would be set to run at -majc or -minc time, and it is the compaction code
>>>>>> that actually deletes the entry? If set to run only at scan time, would
>>>>>> AgeOffFilter simply not return the rows during the scan, but not delete
>>>>>> them? The wording in the iterator classes varies, some saying "remove"
>>>>>> and others "suppress", so it's not clear to me.
>>>>>>
>>>>>> If that's the case, then I think I know where to implement the logic.
>>>>>> The question is, how can I remove all the entries for the row once the
>>>>>> accept method has determined it meets the criteria?
>>>>>>
>>>>>> Or, as Mike Drob mentioned in a prior post, will basing my class on
>>>>>> the RowFilter class instead of just Filter make things easier? Or the
>>>>>> WholeRowIterator? Just trying to find the simplest solution.
>>>>>>
>>>>>> Sorry for what may be obvious questions, but I'm more of a DB
>>>>>> Architect who does some coding, not a Java programmer by trade. With
>>>>>> all of the amazing things Accumulo does, honestly I was surprised when I
>>>>>> couldn't find a way to delete rows in the shell by criteria other than
>>>>>> the rowkey! I'm more used to having a shell to 'delete from *table*
>>>>>> where *column* <= *value*'.
>>>>>>
>>>>>> But looking at it now, everyone's criteria for deletion will likely
>>>>>> be different given the flexibility of a key=>value store.
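The Filter-versus-RowFilter question comes down to what accept() can see. As William explains in his reply, returning false from an iterator set only at scan scope suppresses an entry. At minc/majc scope, the entry is left out of the compaction output. Either way, Filter.accept(Key, Value) decides one entry at a time.

The stdlib-only sketch below shows the consequence. `EntryFilterSketch` and `Cell` are hypothetical. Comparing timestamp strings assumes every expTs value is a fixed 17 digits in the same time zone as "now". Rejecting the expired expTs entry removes only that entry. The rest of the row survives and then looks as if it had never been given an expiration. That is the gap a row-level decision such as RowFilter.acceptRow is meant to close.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class EntryFilterSketch {
    // One entry: row, column family, column qualifier, value.
    static final class Cell {
        final String row, cf, cq, value;
        Cell(String row, String cf, String cq, String value) {
            this.row = row; this.cf = cf; this.cq = cq; this.value = value;
        }
    }

    // Filter-style decision that sees one entry only. Assumes expTs values are always
    // 17 digits in the same zone as nowTs, so string order equals time order.
    static boolean accept(Cell c, String nowTs) {
        boolean expiredExpTs = c.cf.equals("expTs") && c.value.compareTo(nowTs) <= 0;
        return !expiredExpTs;
    }

    // Drives accept() entry by entry, the way a Filter is applied.
    static List<Cell> apply(List<Cell> sorted, String nowTs) {
        List<Cell> kept = new ArrayList<Cell>();
        for (Cell c : sorted) {
            if (accept(c, nowTs)) {
                kept.add(c);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Cell> row = Arrays.asList(
            new Cell("abc", "creTs", "", "20130804171412445"),
            new Cell("abc", "data", "content", "<bytedata>"),
            new Cell("abc", "data", "title", "My fantastic data"),
            new Cell("abc", "expTs", "", "20131104171412445"));
        List<Cell> kept = apply(row, "20131106000000000");
        // Three entries survive: only the expTs entry itself was dropped.
        for (Cell c : kept) {
            System.out.println(c.row + ":" + c.cf + ":" + c.cq);
        }
    }
}
```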
>>>>>>
>>>>>> If our rowkey had the date/timestamp as a prefix, I know an easy
>>>>>> deletemany command in the shell would do the trick, but the nature of
>>>>>> the data is such that initially no expiration timestamp is set, and there
>>>>>> is no means to update the key from the client app when the expiration
>>>>>> timestamp finally gets set (too much rework on that common tool, I'm
>>>>>> afraid).
>>>>>>
>>>>>> Thanks in advance.
