Hi Keith,

No, expTs won't actually be the first column -- that'll teach me to try things with overly simplistic data!
There will be 10-12 column families for each row. I take it my simple check for column family name isn't enough?

On Thursday, November 7, 2013, Keith Turner wrote:

> Your accept row function assumes that expTs will be the first column in
> the row, is this always the case?
>
>
> On Wed, Nov 6, 2013 at 3:37 PM, Terry P. <[email protected]> wrote:
>
> Hi William, many thanks for the explanation of scan time versus compaction
> time. I'll look through the classes again and note where the remove versus
> suppress wordings are used and open a ticket.
>
> As mentioned, I only dabble in java, but regardless of that fact at this
> point I'm the one that has to get this done. I've hobbled together my first
> attempt, but I get the following error where I try to add it as a scan
> iterator for testing:
>
> root@meta> setiter -class org.esa.accumulo.iterators.ExpirationTimestampPurgeFilter -n expTsFilter -p 20 -scan -t itertest
> 2013-11-06 14:06:34,914 [shell.Shell] ERROR: org.apache.accumulo.core.util.shell.ShellCommandException: Command could not be initialized (Servers are unable to load org.esa.accumulo.iterators.ExpirationTimestampPurgeFilter as type org.apache.accumulo.core.iterators.SortedKeyValueIterator)
>
> Here's my source. Note that the value stored in the expTs ColFam is in
> the format "yyyyMMddHHmmssS", which I convert to a long for a direct
> comparison to System.currentTimeMillis(). I only overrode the init and
> acceptRow methods, hoping the others would work as-is from the base class.
>
> One clarification: turns out expTs is the ColumnFamily, and the ingest app
> does not assign a ColumnQualifier for expTs. So to amend my prior table
> layout (including the datetime format):
>
>
> Format: Key:CF:CQ:Value
> abc:data:title:"My fantastic data"
> abc:data:content:<bytedata>
> abc:creTs::20130804171412445
> abc:*expTs*::20131104171412445
> ... 6-8 more columns of data per row ...
>
> where *expTs* is the ColumnFamily to determine if the entire row should
> be removed based on whether its value is <= NOW. If a row has not yet been
> assigned an expiration date, expTs will not be set and the ColumnFamily
> will not yet be present. Seems like an odd choice to use distinct Column
> Families, without Column Qualifiers, but that's how the ingest app was done.
>
> I greatly appreciate any advice you can provide.
>
> package com.esa.accumulo.iterators;
>
> import java.io.IOException;
> import java.text.ParseException;
> import java.text.SimpleDateFormat;
> import java.util.Date;
> import java.util.Map;
>
> import org.apache.accumulo.core.data.Key;
> import org.apache.accumulo.core.data.Value;
> import org.apache.accumulo.core.iterators.IteratorEnvironment;
> import org.apache.accumulo.core.iterators.SortedKeyValueIterator;
> import org.apache.accumulo.core.iterators.user.RowFilter;
>
> /**
>  * A filter that removes rows based on the column designated as the "expiration timestamp" column family.
>  *
>  * It removes the row if the value in the expirationTimestamp column is less than currentTime.
>  *
>  * TODO: The designation of the expirationTimestamp ColumnFamily and its DateFormat is
>  * set in the iterator options when the iterator is applied to the table. (For
>  * now it is hardcoded to match the format used in the Solr-Accumulo plugin)
>  */
> public class ExpirationTimestampPurgeFilter extends RowFilter {
>     private long currentTime;
>     // TODO: make accumuloDateFormat settable via Iterator Options
>     // Date Format for Expiration Timestamp ColumnFamily stored in Accumulo
>     private String expTsDateFormat = "yyyyMMddHHmmssS";
>     SimpleDateFormat df = new SimpleDateFormat(expTsDateFormat);
>
>     // TODO: make expTs settable via Iterator Options
>     // ColumnFamily containing Expiration Timestamp value (note ingest app
>     // did NOT assign a ColumnQualifier, only a ColumnFamily)
>     private String expTsColFam = "expTs";
>
>     @Override
>     public boolean acceptRow(SortedKeyValueIterator<Key, Value> rowIterator)
>             throws IOException {
>
>         if (rowIterator.getTopKey().getColumnFamily().toString().equals(expTsColFam)) {
>             Date expTsDate = null;
>             try {
>                 expTsDate = df.parse(rowIterator.getTopValue().toString());
>                 if (expTsDate.getTime() < currentTime)
>                     return false;
>             } catch (ParseException e) {
>                 // TODO Auto-generated catch block
>                 e.printStackTrace();
>             }
>         }
>         return true;
>     }
>
>     @Override
>     public void init(SortedKeyValueIterator<Key, Value> source,
>             Map<String, Str
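
On the column family check: since expTs is not guaranteed to be the first column in the row, one possible approach is to walk every column of the row before deciding, rather than looking only at the top key. The sketch below is a rough illustration under stated assumptions, not a tested Accumulo iterator. It models a row as a plain list of (column family, value) pairs so it runs without Accumulo on the classpath; the class name ExpTsRowCheck, the helper col, and the list-based row are all hypothetical stand-ins. In the real filter the loop would presumably be driven by rowIterator.hasTop() and rowIterator.next() instead of a list. It keeps the quoted code's comparison (reject when the parsed expTs is less than currentTime), and its choice to keep rows with an unparsable expTs is an assumption.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the row-level decision; not an Accumulo class.
final class ExpTsRowCheck {
    // Same hardcoded values as in the quoted filter.
    static final String EXP_TS_COL_FAM = "expTs";
    static final String EXP_TS_DATE_FORMAT = "yyyyMMddHHmmssS";

    // Returns false only if an expTs column is found and its value is
    // earlier than currentTime. Rows without expTs are kept.
    static boolean acceptRow(List<Map.Entry<String, String>> columns, long currentTime) {
        // SimpleDateFormat is not thread-safe, so create one per call here.
        SimpleDateFormat df = new SimpleDateFormat(EXP_TS_DATE_FORMAT);
        for (Map.Entry<String, String> column : columns) {
            if (!EXP_TS_COL_FAM.equals(column.getKey())) {
                continue; // not the expiration column, keep scanning the row
            }
            try {
                return df.parse(column.getValue()).getTime() >= currentTime;
            } catch (ParseException e) {
                // Assumption: a malformed expTs should not cause data loss.
                return true;
            }
        }
        return true; // no expTs column present in this row
    }

    // Hypothetical helper to build a (column family, value) pair.
    static Map.Entry<String, String> col(String columnFamily, String value) {
        return new SimpleEntry<>(columnFamily, value);
    }

    public static void main(String[] args) {
        // Columns listed in sorted column family order, so expTs is not first.
        List<Map.Entry<String, String>> row = Arrays.asList(
                col("creTs", "20130804171412445"),
                col("data", "My fantastic data"),
                col("expTs", "20131104171412445"));
        System.out.println(acceptRow(row, System.currentTimeMillis()));
    }
}
```

The point of the loop is that the decision no longer depends on where expTs sorts relative to the other column families in the row.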
