Thanks David, good to know. After adding "implements OptionDescriber", the setiter command worked, and it shows up right at the top.
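
For reference, a minimal sketch of what a non-trivial validateOptions could check once the column family and date format become iterator options, as suggested further down the thread. It is plain Java with no Accumulo imports so it can be run on its own; the method only mirrors the shape of OptionDescriber.validateOptions(Map<String,String>). The option names "expTsColFam" and "expTsDateFormat" and the class name are hypothetical, not something defined in the thread or by Accumulo.

```java
import java.text.SimpleDateFormat;
import java.util.HashMap;
import java.util.Map;

final class ExpTsOptionCheck {
  // Hypothetical option names, chosen only for this illustration.
  static final String COL_FAM_OPT = "expTsColFam";
  static final String DATE_FORMAT_OPT = "expTsDateFormat";

  // Same signature shape as OptionDescriber.validateOptions(Map<String,String>).
  static boolean validateOptions(Map<String, String> options) {
    String colFam = options.get(COL_FAM_OPT);
    if (colFam != null && colFam.isEmpty()) {
      return false; // an explicitly empty column family cannot match anything useful
    }
    String pattern = options.get(DATE_FORMAT_OPT);
    if (pattern != null) {
      try {
        new SimpleDateFormat(pattern); // throws on an illegal pattern letter
      } catch (IllegalArgumentException e) {
        return false;
      }
    }
    return true; // assumption: absent options fall back to the hardcoded defaults
  }

  public static void main(String[] args) {
    Map<String, String> options = new HashMap<String, String>();
    options.put(DATE_FORMAT_OPT, "yyyyMMddHHmmssS");
    System.out.println("options valid: " + validateOptions(options));
  }
}
```

In the real iterator the same checks would sit inside the overridden validateOptions, and describeOptions would list the two options instead of passing null.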

On Wed, Nov 6, 2013 at 8:06 PM, David Medinets <[email protected]> wrote:

> Just in case you didn't know, there is a 'classpath' command in the
> Accumulo shell which should list your custom jar. It's handy to verify
> that it was loaded. I think there might also be a log entry if you have
> access to the logs. I've also found it useful to use 'jar tf <filename>'
> on the Accumulo nodes to verify the jar file contents. Sometimes I've
> deployed the wrong version of a jar file.
>
>
> On Wed, Nov 6, 2013 at 7:56 PM, Billie Rinaldi
> <[email protected]> wrote:
>
>> Making your class "extends RowFilter implements OptionDescriber" should
>> be fine. One reason it might have been complaining about the @Override
>> annotations is if the Java compiler is set to 1.5 compatibility rather
>> than 1.6.
>>
>> Regarding getting the same error, did you replace all the jars
>> containing your iterator on all the nodes? If you did, perhaps it's not
>> reloading the jars properly. You could restart Accumulo to make sure
>> it's using the fresh jar, or you could try renaming your class and
>> dropping it in with a different jar name to ensure the new code is
>> being picked up.
>>
>>
>> On Wed, Nov 6, 2013 at 2:50 PM, Terry P. <[email protected]> wrote:
>>
>>> Hi Billie,
>>> Many thanks for your help. I added those two methods, but had to
>>> remove the @Override as the RowFilter class I'm extending from doesn't
>>> implement them. Even with these methods in place, I still get the same
>>> error trying to add the iterator in the shell.
>>>
>>> I notice that the RowFilter class extends WrappingIterator, which also
>>> doesn't appear to have the describeOptions and validateOptions methods
>>> ... should I try extending from just the Filter class? I didn't
>>> understand the benefits William listed of extending from the RowFilter
>>> class. I just know that once I identify that a RowKey should be purged
>>> based on its expTs ColFam Value, I want to remove all entries for that
>>> RowKey.
>>>
>>>
>>> On Wed, Nov 6, 2013 at 3:29 PM, Billie Rinaldi <[email protected]>
>>> wrote:
>>>
>>>> To use setiter in the shell, your iterator must implement
>>>> OptionDescriber. It has two methods, and something like the following
>>>> should work for your iterator. If you implement passing options to
>>>> the iterator, you'll want to change the null parameters to the
>>>> constructor of IteratorOptions below, and probably also to do some
>>>> validation in validateOptions.
>>>>
>>>> @Override
>>>> public IteratorOptions describeOptions() {
>>>>   return new IteratorOptions("expTs",
>>>>       "Removes rows based on the column designated as the "
>>>>           + "expiration timestamp column family", null, null);
>>>> }
>>>>
>>>> @Override
>>>> public boolean validateOptions(Map<String,String> options) {
>>>>   return true;
>>>> }
>>>>
>>>>
>>>>
>>>> On Wed, Nov 6, 2013 at 12:49 PM, Terry P. <[email protected]> wrote:
>>>>
>>>>> Eyes of an eagle Billie! com is correct, but after viewing
>>>>> "org.apache.accumulo" so many times, my brain was stuck on org and I
>>>>> goofed in my setiter syntax.
>>>>>
>>>>> With THAT corrected, here is the new error:
>>>>>
>>>>> root@meta> setiter -class com.esa.accumulo.iterators.ExpirationTimestampPurgeFilter -n expTsFilter -p 20 -scan -t itertest
>>>>> 2013-11-06 14:46:28,280 [shell.Shell] ERROR: org.apache.accumulo.core.util.shell.ShellCommandException: Command could not be initialized (Unable to load com.esa.accumulo.iterators.ExpirationTimestampPurgeFilter as type org.apache.accumulo.core.iterators.OptionDescriber; configure with 'config' instead)
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Nov 6, 2013 at 2:43 PM, Billie Rinaldi <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Is there a typo in the package name? One place says "com" and the
>>>>>> other "org".
>>>>>>
>>>>>>
>>>>>> On Wed, Nov 6, 2013 at 12:37 PM, Terry P.
>>>>>> <[email protected]> wrote:
>>>>>>
>>>>>>> Hi William, many thanks for the explanation of scan time versus
>>>>>>> compaction time. I'll look through the classes again and note
>>>>>>> where the remove versus suppress wordings are used and open a
>>>>>>> ticket.
>>>>>>>
>>>>>>> As mentioned, I only dabble in Java, but regardless of that fact,
>>>>>>> at this point I'm the one that has to get this done. I've cobbled
>>>>>>> together my first attempt, but I get the following error when I
>>>>>>> try to add it as a scan iterator for testing:
>>>>>>>
>>>>>>> root@meta> setiter -class org.esa.accumulo.iterators.ExpirationTimestampPurgeFilter -n expTsFilter -p 20 -scan -t itertest
>>>>>>> 2013-11-06 14:06:34,914 [shell.Shell] ERROR: org.apache.accumulo.core.util.shell.ShellCommandException: Command could not be initialized (Servers are unable to load org.esa.accumulo.iterators.ExpirationTimestampPurgeFilter as type org.apache.accumulo.core.iterators.SortedKeyValueIterator)
>>>>>>>
>>>>>>> Here's my source. Note that the value stored in the expTs ColFam
>>>>>>> is in the format "yyyyMMddHHmmssS", which I convert to a long for
>>>>>>> a direct comparison to System.currentTimeMillis(). I only
>>>>>>> overrode the init and acceptRow methods, hoping the others would
>>>>>>> work as-is from the base class.
>>>>>>>
>>>>>>> One clarification: turns out expTs is the ColumnFamily, and the
>>>>>>> ingest app does not assign a ColumnQualifier for expTs. So to
>>>>>>> amend my prior table layout (including the datetime format):
>>>>>>>
>>>>>>>
>>>>>>> Format: Key:CF:CQ:Value
>>>>>>> abc:data:title:"My fantastic data"
>>>>>>> abc:data:content:<bytedata>
>>>>>>> abc:creTs::20130804171412445
>>>>>>> abc:*expTs*::20131104171412445
>>>>>>> ... 6-8 more columns of data per row ...
>>>>>>>
>>>>>>> where *expTs* is the ColumnFamily to determine if the entire row
>>>>>>> should be removed based on whether its value is <= NOW.
>>>>>>> If a row has not yet been assigned an expiration date, expTs will
>>>>>>> not be set and the ColumnFamily will not yet be present. Seems
>>>>>>> like an odd choice to use distinct Column Families, without
>>>>>>> Column Qualifiers, but that's how the ingest app was done.
>>>>>>>
>>>>>>> I greatly appreciate any advice you can provide.
>>>>>>>
>>>>>>> package com.esa.accumulo.iterators;
>>>>>>>
>>>>>>> import java.io.IOException;
>>>>>>> import java.text.ParseException;
>>>>>>> import java.text.SimpleDateFormat;
>>>>>>> import java.util.Date;
>>>>>>> import java.util.Map;
>>>>>>>
>>>>>>> import org.apache.accumulo.core.data.Key;
>>>>>>> import org.apache.accumulo.core.data.Value;
>>>>>>> import org.apache.accumulo.core.iterators.IteratorEnvironment;
>>>>>>> import org.apache.accumulo.core.iterators.SortedKeyValueIterator;
>>>>>>> import org.apache.accumulo.core.iterators.user.RowFilter;
>>>>>>>
>>>>>>> /**
>>>>>>>  * A filter that removes rows based on the column designated as the
>>>>>>>  * "expiration timestamp" column family.
>>>>>>>  *
>>>>>>>  * It removes the row if the value in the expirationTimestamp column
>>>>>>>  * is less than currentTime.
>>>>>>>  *
>>>>>>>  * TODO: The designation of the expirationTimestamp ColumnFamily and
>>>>>>>  * its DateFormat is set in the iterator options when the iterator
>>>>>>>  * is applied to the table.
>>>>>>>  * (For now it is hardcoded to match the format used in the
>>>>>>>  * Solr-Accumulo plugin)
>>>>>>>  */
>>>>>>> public class ExpirationTimestampPurgeFilter extends RowFilter {
>>>>>>>   private long currentTime;
>>>>>>>   // TODO: make accumuloDateFormat settable via Iterator Options
>>>>>>>   // Date Format for Expiration Timestamp ColumnFamily stored in Accumulo
>>>>>>>   private String expTsDateFormat = "yyyyMMddHHmmssS";
>>>>>>>   SimpleDateFormat df = new SimpleDateFormat(expTsDateFormat);
>>>>>>>
>>>>>>>   // TODO: make expTs settable via Iterator Options
>>>>>>>   // ColumnFamily containing Expiration Timestamp value (note ingest app
>>>>>>>   // did NOT assign a ColumnQualifier, only a ColumnFamily)
>>>>>>>   private String expTsColFam = "expTs";
>>>>>>>
>>>>>>>   @Override
>>>>>>>   public boolean acceptRow(SortedKeyValueIterator<Key, Value> rowIterator)
>>>>>>>       throws IOException {
>>>>>>>
>>>>>>>     if (rowIterator.getTopKey().getColumnFamily().toString().equals(expTsColFam)) {
>>>>>>>       Date expTsDate = null;
>>>>>>>       try {
>>>>>>>         expTsDate = df.parse(rowIterator.getTopValue().toString());
>>>>>>>         if (expTsDate.getTime() < currentTime)
>>>>>>>           return false;
>>>>>>>       } catch (ParseException e) {
>>>>>>>         // TODO Auto-generated catch block
>>>>>>>         e.printStackTrace();
>>>>>>>       }
>>>>>>>     }
>>>>>>>     return true;
>>>>>>>   }
>>>>>>>
>>>>>>>   @Override
>>>>>>>   public void init(SortedKeyValueIterator<Key, Value> source,
>>>>>>>       Map<String, String> options, IteratorEnvironment env) throws IOException {
>>>>>>>     super.init(source, options, env);
>>>>>>>     currentTime = System.currentTimeMillis();
>>>>>>>   }
>>>>>>>
>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Nov 5, 2013 at 8:48 PM, William Slacum
>>>>>>> <[email protected]> wrote:
>>>>>>>
>>>>>>>> If an iterator is only set at scan time, then its logic will only
>>>>>>>> be applied when a client scans the table.
>>>>>>>> The data will persist through major and minor compaction and be
>>>>>>>> visible if you scanned the RFile(s) backing the table. "Suppress"
>>>>>>>> is the better word in this case. Would you please open a ticket
>>>>>>>> pointing us where to update the documentation?
>>>>>>>>
>>>>>>>> It looks like you'd want to implement a RowFilter for your use
>>>>>>>> case. It has the necessary hooks to avoid reading a whole row
>>>>>>>> into memory and handling the logic of determining whether or not
>>>>>>>> to write keys that occur before the column you're filtering on
>>>>>>>> (at the cost of reading those keys twice).
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Nov 5, 2013 at 6:20 PM, Terry P. <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Greetings everyone,
>>>>>>>>> I'm looking at the AgeOffFilter as a base from which to write a
>>>>>>>>> server-side filter / iterator to purge rows when they have aged
>>>>>>>>> off based on the value of a specific column in the row (expiry
>>>>>>>>> datetime <= now). So this differs from the AgeOffFilter in that
>>>>>>>>> the criterion for removal is from the same column in every row
>>>>>>>>> (not the Accumulo timestamp for an individual entry), and we
>>>>>>>>> need to remove the entire row not just individual entries. For
>>>>>>>>> example:
>>>>>>>>>
>>>>>>>>> Format: Key:CF:CQ:Value
>>>>>>>>> abc:data:title:"My fantastic data"
>>>>>>>>> abc:data:content:<bytedata>
>>>>>>>>> abc:data:creTs:2013-08-04T17:14:12Z
>>>>>>>>> abc:data:*expTs*:2013-11-04T17:14:12Z
>>>>>>>>> ... 6-8 more columns of data per row ...
>>>>>>>>>
>>>>>>>>> where *expTs* is the column to determine if the entire row
>>>>>>>>> should be removed based on whether its value is <= NOW.
>>>>>>>>>
>>>>>>>>> This task seemed easy enough as a client program (and it is
>>>>>>>>> really), but a server-side iterator would be far more efficient
>>>>>>>>> than sending millions of rowkeys across the network just to
>>>>>>>>> delete them (we'll be deleting more than a million every hour).
>>>>>>>>> But I'm struggling to get there.
>>>>>>>>>
>>>>>>>>> In looking at AgeOffFilter.java, is the "magic" in the
>>>>>>>>> AgeOffFilter class that removes (deletes) an entry from a table
>>>>>>>>> the fact that the accept method returns false, combined with the
>>>>>>>>> fact that the iterator would be set to run at -majc or -minc
>>>>>>>>> time and it is the compaction code that actually deletes the
>>>>>>>>> entry? If set to run only at scan time, would AgeOffFilter
>>>>>>>>> simply not return the rows during the scan, but not delete them?
>>>>>>>>> The wording in the iterator classes varies, some saying "remove"
>>>>>>>>> and others "suppress", so it's not clear to me.
>>>>>>>>>
>>>>>>>>> If that's the case, then I think I know where to implement the
>>>>>>>>> logic. The question is, how can I remove all the entries for the
>>>>>>>>> row once the accept method has determined it meets the criteria?
>>>>>>>>>
>>>>>>>>> Or as Mike Drob mentioned in a prior post, will basing my class
>>>>>>>>> on the RowFilter class instead of just Filter make things
>>>>>>>>> easier? Or the WholeRowIterator? Just trying to find the
>>>>>>>>> simplest solution.
>>>>>>>>>
>>>>>>>>> Sorry for what may be obvious questions but I'm more of a DB
>>>>>>>>> Architect that does some coding, and not a Java programmer by
>>>>>>>>> trade. With all of the amazing things Accumulo does, honestly I
>>>>>>>>> was surprised when I couldn't find a way to delete rows in the
>>>>>>>>> shell by criteria other than the rowkey! I'm more used to having
>>>>>>>>> a shell to 'delete from *table* where *column* <= *value*'.
>>>>>>>>>
>>>>>>>>> But looking at it now, everyone's criteria for deletion will
>>>>>>>>> likely be different given the flexibility of a key=>value store.
>>>>>>>>> If our rowkey had the date/timestamp as a prefix, I know an easy
>>>>>>>>> deletemany command in the shell would do the trick -- but the
>>>>>>>>> nature of the data is such that initially no expiration
>>>>>>>>> timestamp is set, and there is no means to update the key from
>>>>>>>>> the client app when expiration timestamp finally gets set (too
>>>>>>>>> much rework on that common tool I'm afraid).
>>>>>>>>>
>>>>>>>>> Thanks in advance.
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
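
A note on the acceptRow attempt quoted above: it inspects only the first key of the row (getTopKey is called once and next() never is), and within a row the "creTs" and "data" column families sort ahead of "expTs", so the check would probably never see the expiration value. The sketch below walks every entry of a row instead. It is plain Java so it can run standalone: a List of (columnFamily, value) pairs stands in for the SortedKeyValueIterator<Key, Value> that RowFilter passes in, and the class name, the entry helper, and the choice to keep rows whose value cannot be parsed are all assumptions made for illustration.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class ExpTsRowCheck {
  static final String EXP_TS_COL_FAM = "expTs";
  static final String EXP_TS_FORMAT = "yyyyMMddHHmmssS";

  // Returns true to keep the row, false when its expTs value is before nowMillis.
  static boolean acceptRow(List<Map.Entry<String, String>> row, long nowMillis) {
    // SimpleDateFormat is not thread-safe, so this sketch creates one per call.
    SimpleDateFormat df = new SimpleDateFormat(EXP_TS_FORMAT);
    for (Map.Entry<String, String> entry : row) { // visit every entry, not only the first
      if (!EXP_TS_COL_FAM.equals(entry.getKey())) {
        continue;
      }
      try {
        // Same comparison as the posted code: expired when strictly before "now".
        return df.parse(entry.getValue()).getTime() >= nowMillis;
      } catch (ParseException e) {
        return true; // assumption: keep rows whose expTs value cannot be parsed
      }
    }
    return true; // no expTs column family yet, so the row has no expiration date
  }

  // Hypothetical helper that builds one (columnFamily, value) pair.
  static Map.Entry<String, String> entry(String columnFamily, String value) {
    return new AbstractMap.SimpleEntry<String, String>(columnFamily, value);
  }

  public static void main(String[] args) {
    List<Map.Entry<String, String>> row = new ArrayList<Map.Entry<String, String>>();
    row.add(entry("creTs", "20130804171412445"));
    row.add(entry("data", "My fantastic data"));
    row.add(entry("expTs", "20131104171412445"));
    System.out.println("keep row: " + acceptRow(row, System.currentTimeMillis()));
  }
}
```

Inside the actual iterator, the loop would presumably use rowIterator.hasTop(), getTopKey(), getTopValue() and next() in place of the for-each over the list.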
