Eyes of an eagle, Billie! "com" is correct, but after viewing "org.apache.accumulo" so many times, my brain was stuck on "org" and I goofed in my setiter syntax.
With THAT corrected, here is the new error:

root@meta> setiter -class com.esa.accumulo.iterators.ExpirationTimestampPurgeFilter -n expTsFilter -p 20 -scan -t itertest
2013-11-06 14:46:28,280 [shell.Shell] ERROR: org.apache.accumulo.core.util.shell.ShellCommandException: Command could not be initialized (Unable to load com.esa.accumulo.iterators.ExpirationTimestampPurgeFilter as type org.apache.accumulo.core.iterators.OptionDescriber; configure with 'config' instead)


On Wed, Nov 6, 2013 at 2:43 PM, Billie Rinaldi <[email protected]> wrote:

> Is there a typo in the package name? One place says "com" and the other
> "org".
>
>
> On Wed, Nov 6, 2013 at 12:37 PM, Terry P. <[email protected]> wrote:
>
>> Hi William, many thanks for the explanation of scan time versus
>> compaction time. I'll look through the classes again and note where the
>> remove versus suppress wordings are used and open a ticket.
>>
>> As mentioned, I only dabble in Java, but regardless of that fact at this
>> point I'm the one that has to get this done. I've cobbled together my first
>> attempt, but I get the following error when I try to add it as a scan
>> iterator for testing:
>>
>> root@meta> setiter -class org.esa.accumulo.iterators.ExpirationTimestampPurgeFilter -n expTsFilter -p 20 -scan -t itertest
>> 2013-11-06 14:06:34,914 [shell.Shell] ERROR: org.apache.accumulo.core.util.shell.ShellCommandException: Command could not be initialized (Servers are unable to load org.esa.accumulo.iterators.ExpirationTimestampPurgeFilter as type org.apache.accumulo.core.iterators.SortedKeyValueIterator)
>>
>> Here's my source. Note that the value stored in the expTs ColFam is in
>> the format "yyyyMMddHHmmssS", which I convert to a long for a direct
>> comparison to System.currentTimeMillis(). I only overrode the init and
>> acceptRow methods, hoping the others would work as-is from the base class.
>>
>> One clarification: turns out expTs is the ColumnFamily, and the ingest
>> app does not assign a ColumnQualifier for expTs. So to amend my prior table
>> layout (including the datetime format):
>>
>>
>> Format: Key:CF:CQ:Value
>> abc:data:title:"My fantastic data"
>> abc:data:content:<bytedata>
>> abc:creTs::20130804171412445
>> abc:*expTs*::20131104171412445
>> ... 6-8 more columns of data per row ...
>>
>> where *expTs* is the ColumnFamily to determine if the entire row should
>> be removed based on whether its value is <= NOW. If a row has not yet been
>> assigned an expiration date, expTs will not be set and the ColumnFamily
>> will not yet be present. Seems like an odd choice to use distinct Column
>> Families, without Column Qualifiers, but that's how the ingest app was done.
>>
>> I greatly appreciate any advice you can provide.
>>
>> package com.esa.accumulo.iterators;
>>
>> import java.io.IOException;
>> import java.text.ParseException;
>> import java.text.SimpleDateFormat;
>> import java.util.Date;
>> import java.util.Map;
>>
>> import org.apache.accumulo.core.data.Key;
>> import org.apache.accumulo.core.data.Value;
>> import org.apache.accumulo.core.iterators.IteratorEnvironment;
>> import org.apache.accumulo.core.iterators.SortedKeyValueIterator;
>> import org.apache.accumulo.core.iterators.user.RowFilter;
>>
>> /**
>>  * A filter that removes rows based on the column designated as the
>>  * "expiration timestamp" column family.
>>  *
>>  * It removes the row if the value in the expirationTimestamp column is
>>  * less than currentTime.
>>  *
>>  * TODO: The designation of the expirationTimestamp ColumnFamily and its
>>  * DateFormat is
>>  * set in the iterator options when the iterator is applied to the table.
>>  * (For now it is hardcoded to match the format used in the Solr-Accumulo
>>  * plugin)
>>  */
>> public class ExpirationTimestampPurgeFilter extends RowFilter {
>>   private long currentTime;
>>   // TODO: make accumuloDateFormat settable via Iterator Options
>>   // Date Format for Expiration Timestamp ColumnFamily stored in Accumulo
>>   private String expTsDateFormat = "yyyyMMddHHmmssS";
>>   SimpleDateFormat df = new SimpleDateFormat(expTsDateFormat);
>>
>>   // TODO: make expTs settable via Iterator Options
>>   // ColumnFamily containing Expiration Timestamp value (note ingest app
>>   // did NOT assign a ColumnQualifier, only a ColumnFamily)
>>   private String expTsColFam = "expTs";
>>
>>   @Override
>>   public boolean acceptRow(SortedKeyValueIterator<Key, Value> rowIterator)
>>       throws IOException {
>>
>>     if (rowIterator.getTopKey().getColumnFamily().toString().equals(expTsColFam)) {
>>       Date expTsDate = null;
>>       try {
>>         expTsDate = df.parse(rowIterator.getTopValue().toString());
>>         if (expTsDate.getTime() < currentTime)
>>           return false;
>>       } catch (ParseException e) {
>>         // TODO Auto-generated catch block
>>         e.printStackTrace();
>>       }
>>     }
>>     return true;
>>   }
>>
>>   @Override
>>   public void init(SortedKeyValueIterator<Key, Value> source,
>>       Map<String, String> options, IteratorEnvironment env) throws IOException {
>>     super.init(source, options, env);
>>     currentTime = System.currentTimeMillis();
>>   }
>>
>> }
>>
>>
>>
>> On Tue, Nov 5, 2013 at 8:48 PM, William Slacum <[email protected]> wrote:
>>
>>> If an iterator is only set at scan time, then its logic will only be
>>> applied when a client scans the table. The data will persist through major
>>> and minor compaction and be visible if you scanned the RFile(s) backing the
>>> table. "Suppress" is the better word in this case. Would you please open a
>>> ticket pointing us where to update the documentation?
>>>
>>> It looks like you'd want to implement a RowFilter for your use case.
>>> It has the necessary hooks to avoid reading a whole row into memory and
>>> handling the logic of determining whether or not to write keys that occur
>>> before the column you're filtering on (at the cost of reading those keys
>>> twice).
>>>
>>>
>>> On Tue, Nov 5, 2013 at 6:20 PM, Terry P. <[email protected]> wrote:
>>>
>>>> Greetings everyone,
>>>> I'm looking at the AgeOffFilter as a base from which to write a
>>>> server-side filter / iterator to purge rows when they have aged off based
>>>> on the value of a specific column in the row (expiry datetime <= now). So
>>>> this differs from the AgeOffFilter in that the criterion for removal is
>>>> from the same column in every row (not the Accumulo timestamp for an
>>>> individual entry), and we need to remove the entire row not just individual
>>>> entries. For example:
>>>>
>>>> Format: Key:CF:CQ:Value
>>>> abc:data:title:"My fantastic data"
>>>> abc:data:content:<bytedata>
>>>> abc:data:creTs:2013-08-04T17:14:12Z
>>>> abc:data:*expTs*:2013-11-04T17:14:12Z
>>>> ... 6-8 more columns of data per row ...
>>>>
>>>> where *expTs* is the column to determine if the entire row should be
>>>> removed based on whether its value is <= NOW.
>>>>
>>>> This task seemed easy enough as a client program (and it is really),
>>>> but a server-side iterator would be far more efficient than sending
>>>> millions of rowkeys across the network just to delete them (we'll be
>>>> deleting more than a million every hour). But I'm struggling to get there.
>>>>
>>>> In looking at AgeOffFilter.java, is the "magic" in the AgeOffFilter
>>>> class that removes (deletes) an entry from a table the fact that the accept
>>>> method returns false, combined with the fact that the iterator would be set
>>>> to run at -majc or -minc time and it is the compaction code that actually
>>>> deletes the entry? If set to run only at scan time, would AgeOffFilter
>>>> simply not return the rows during the scan, but not delete them?
>>>> The wording in the iterator classes varies, some saying "remove" while
>>>> others say "suppress", so it's not clear to me.
>>>>
>>>> If that's the case, then I think I know where to implement the logic.
>>>> The question is, how can I remove all the entries for the row once the
>>>> accept method has determined it meets the criteria?
>>>>
>>>> Or as Mike Drob mentioned in a prior post, will basing my class on the
>>>> RowFilter class instead of just Filter make things easier? Or the
>>>> WholeRowIterator? Just trying to find the simplest solution.
>>>>
>>>> Sorry for what may be obvious questions but I'm more of a DB Architect
>>>> that does some coding, and not a Java programmer by trade. With all of the
>>>> amazing things Accumulo does, honestly I was surprised when I couldn't find
>>>> a way to delete rows in the shell by criteria other than the rowkey! I'm
>>>> more used to having a shell to 'delete from *table* where *column* <=
>>>> *value*'.
>>>>
>>>> But looking at it now, everyone's criteria for deletion will likely be
>>>> different given the flexibility of a key=>value store. If our rowkey had
>>>> the date/timestamp as a prefix, I know an easy deletemany command in the
>>>> shell would do the trick -- but the nature of the data is such that
>>>> initially no expiration timestamp is set, and there is no means to update
>>>> the key from the client app when expiration timestamp finally gets set (too
>>>> much rework on that common tool I'm afraid).
>>>>
>>>> Thanks in advance.
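
A note on the acceptRow logic in the source quoted above: as written, it inspects only the key the row iterator is currently positioned on. Within a row the column families sort as creTs, data, expTs, so that first key would typically not be expTs and the row would always be accepted. The sketch below shows, in plain Java with no Accumulo classes, the decision the filter seems to be aiming for: walk every entry of the row, find the expTs family, and reject the row if its value is older than the current time. The class name ExpirationCheckSketch, the Entry type, the UTC time zone, and the choice to keep rows with an unparseable expTs are all assumptions made for illustration. In a real RowFilter the loop would presumably advance the SortedKeyValueIterator handed to acceptRow (hasTop(), getTopKey(), next()) rather than walk a list.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.List;
import java.util.TimeZone;

/** Plain-Java stand-in for the row-level expiry decision (hypothetical names). */
final class ExpirationCheckSketch {

  /** One entry of a row: column family plus value (qualifier omitted). */
  static final class Entry {
    final String columnFamily;
    final String value;

    Entry(String columnFamily, String value) {
      this.columnFamily = columnFamily;
      this.value = value;
    }
  }

  // Same hardcoded settings as the filter in the thread.
  static final String EXP_TS_COL_FAM = "expTs";
  static final String EXP_TS_DATE_FORMAT = "yyyyMMddHHmmssS";

  /** Parses an expTs value to epoch millis, or returns null if it cannot be parsed. */
  static Long parseExpTsOrNull(String value) {
    // A fresh formatter per call, because SimpleDateFormat is not thread-safe.
    SimpleDateFormat df = new SimpleDateFormat(EXP_TS_DATE_FORMAT);
    // Assumption: the stored timestamps are UTC. Without this line the
    // JVM's default time zone would be used.
    df.setTimeZone(TimeZone.getTimeZone("UTC"));
    try {
      return df.parse(value).getTime();
    } catch (ParseException e) {
      return null;
    }
  }

  /** Returns true if the row should be kept, false if it has expired. */
  static boolean acceptRow(List<Entry> row, long currentTime) {
    // Walk the whole row instead of looking only at its first entry.
    for (Entry entry : row) {
      if (!EXP_TS_COL_FAM.equals(entry.columnFamily)) {
        continue;
      }
      Long expTs = parseExpTsOrNull(entry.value);
      // Policy choice (an assumption): keep rows whose expTs is unparseable.
      return expTs == null || expTs >= currentTime;
    }
    // No expTs entry yet: the row has no expiration date, so keep it.
    return true;
  }

  public static void main(String[] args) {
    List<Entry> row = new ArrayList<>();
    row.add(new Entry("creTs", "20130804171412445"));
    row.add(new Entry("data", "My fantastic data"));
    row.add(new Entry("expTs", "20131104171412445"));
    System.out.println("row kept: " + acceptRow(row, System.currentTimeMillis()));
  }
}
```

The comparison keeps the row when expTs >= currentTime, matching the "<" test in the quoted code; if the intent is to purge at "<= NOW" as the prose says, the comparison would be strict instead.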
