On Thu, Nov 7, 2013 at 3:49 PM, Terry P. <[email protected]> wrote:

> Hi Keith,
> No, expTs won't be the first actually -- that'll teach me to try things
> with overly simplistic data!
>
> There will be 10-12 column families for each row. I take it my simple
> check for column family name isn't enough?
>

You can iterate until you see the column or seek to it. If you expect there
will always be a small amount of data before the column occurs, then iterate.

>
>
> On Thursday, November 7, 2013, Keith Turner wrote:
>
>> Your accept row function assumes that expTs will be the first column in
>> the row, is this always the case?
>>
>>
>> On Wed, Nov 6, 2013 at 3:37 PM, Terry P. <[email protected]> wrote:
>>
>> Hi William, many thanks for the explanation of scan time versus
>> compaction time. I'll look through the classes again and note where the
>> remove versus suppress wordings are used and open a ticket.
>>
>> As mentioned, I only dabble in Java, but regardless of that fact at this
>> point I'm the one that has to get this done. I've cobbled together my first
>> attempt, but I get the following error where I try to add it as a scan
>> iterator for testing:
>>
>> root@meta> setiter -class
>> org.esa.accumulo.iterators.ExpirationTimestampPurgeFilter -n expTsFilter -p
>> 20 -scan -t itertest
>> 2013-11-06 14:06:34,914 [shell.Shell] ERROR:
>> org.apache.accumulo.core.util.shell.ShellCommandException: Command could
>> not be initialized (Servers are unable to load
>> org.esa.accumulo.iterators.ExpirationTimestampPurgeFilter as type
>> org.apache.accumulo.core.iterators.SortedKeyValueIterator)
>>
>> Here's my source. Note that the value stored in the expTs ColFam is in
>> the format "yyyyMMddHHmmssS", which I convert to a long for a direct
>> comparison to System.currentTimeMillis(). I only overrode the init and
>> acceptRow methods, hoping the others would work as-is from the base class.
>>
>> One clarification: turns out expTs is the ColumnFamily, and the ingest
>> app does not assign a ColumnQualifier for expTs. So to amend my prior table
>> layout (including the datetime format):
>>
>>
>> Format: Key:CF:CQ:Value
>> abc:data:title:"My fantastic data"
>> abc:data:content:<bytedata>
>> abc:creTs::20130804171412445
>> abc:*expTs*::20131104171412445
>> ... 6-8 more columns of data per row ...
>>
>> where *expTs* is the ColumnFamily to determine if the entire row should
>> be removed based on whether its value is <= NOW. If a row has not yet been
>> assigned an expiration date, expTs will not be set and the ColumnFamily
>> will not yet be present. Seems like an odd choice to use distinct Column
>> Families, without Column Qualifiers, but that's how the ingest app was done.
>>
>> I greatly appreciate any advice you can provide.
>>
>> package com.esa.accumulo.iterators;
>>
>> import java.io.IOException;
>> import java.text.ParseException;
>> import java.text.SimpleDateFormat;
>> import java.util.Date;
>> import java.util.Map;
>>
>> import org.apache.accumulo.core.data.Key;
>> import org.apache.accumulo.core.data.Value;
>> import org.apache.accumulo.core.iterators.IteratorEnvironment;
>> import org.apache.accumulo.core.iterators.SortedKeyValueIterator;
>> import org.apache.accumulo.core.iterators.user.RowFilter;
>>
>> /**
>>  * A filter that removes rows based on the column designated as the
>>  * "expiration timestamp" column family.
>>  *
>>  * It removes the row if the value in the expirationTimestamp column is
>>  * less than currentTime.
>>  *
>>  * TODO: The designation of the expirationTimestamp ColumnFamily and its
>>  * DateFormat is set in the iterator options when the iterator is applied
>>  * to the table. (For now it is hardcoded to match the format used in the
>>  * Solr-Accumulo plugin)
>>  */
>> public class ExpirationTimestampPurgeFilter extends RowFilter {
>>   private long currentTime;
>>   // TODO: make accumuloDateFormat settable via Iterator Options
>>   // Date Format for Expiration Timestamp ColumnFamily stored in Accumulo
>>   private String expTsDateFormat = "yyyyMMddHHmmssS";
>>   SimpleDateFormat df = new SimpleDateFormat(expTsDateFormat);
>>
>>   // TODO: make expTs settable via Iterator Options
>>   // ColumnFamily containing Expiration Timestamp value (note ingest app
>>   // did NOT assign a ColumnQualifier, only a ColumnFamily)
>>   private String expTsColFam = "expTs";
>>
>>   @Override
>>   public boolean acceptRow(SortedKeyValueIterator<Key, Value> rowIterator)
>>       throws IOException {
>>
>>     if (rowIterator.getTopKey().getColumnFamily().toString()
>>         .equals(expTsColFam)) {
>>       Date expTsDate = null;
>>       try {
>>         expTsDate = df.parse(rowIterator.getTopValue().toString());
>>         if (expTsDate.getTime() < currentTime)
>>           return false;
>>       } catch (ParseException e) {
>>         // TODO Auto-generated catch block
>>         e.printStackTrace();
>>       }
>>     }
>>     return true;
>>   }
>>
>>   @Override
>>   public void init(SortedKeyValueIterator<Key, Value> source,
>>       Map<String, Str
>>
>>
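
The suggestion above to iterate until the expTs column is seen can be
sketched roughly as follows. This is a plain-Java illustration only, not
Accumulo code: the ExpTsScanSketch and Entry names and the List standing in
for the row are hypothetical. In the real RowFilter, the loop would instead
walk rowIterator using hasTop(), getTopKey(), getTopValue() and next().
Within a row, column families sort lexicographically, so creTs and data come
before expTs, which is why checking only the first key misses it. Keeping
rows whose value fails to parse mirrors the posted code and is an assumption
about the desired behavior.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the row scan inside RowFilter.acceptRow().
final class ExpTsScanSketch {

    // Hypothetical model of one entry in a row: column family plus value.
    static final class Entry {
        final String columnFamily;
        final String value;

        Entry(String columnFamily, String value) {
            this.columnFamily = columnFamily;
            this.value = value;
        }
    }

    static final String EXP_TS_COL_FAM = "expTs";
    static final String EXP_TS_DATE_FORMAT = "yyyyMMddHHmmssS";

    // Returns false (drop the row) only when an expTs entry is found and
    // its timestamp is earlier than 'now'. Rows without expTs are kept.
    static boolean acceptRow(List<Entry> row, long now) {
        // SimpleDateFormat is not thread-safe, so create one per call here.
        SimpleDateFormat df = new SimpleDateFormat(EXP_TS_DATE_FORMAT);
        for (Entry e : row) {
            if (!EXP_TS_COL_FAM.equals(e.columnFamily)) {
                continue; // keep iterating until the expTs column is seen
            }
            try {
                return df.parse(e.value).getTime() >= now;
            } catch (ParseException ex) {
                return true; // assumption: keep rows with unparseable values
            }
        }
        return true; // expTs not present: no expiration assigned yet
    }

    public static void main(String[] args) {
        List<Entry> row = new ArrayList<>();
        row.add(new Entry("creTs", "20130804171412445"));
        row.add(new Entry("data", "My fantastic data"));
        row.add(new Entry("expTs", "20131104171412445"));

        // The expTs value is in 2013, so against the current clock the
        // row is treated as expired.
        System.out.println(acceptRow(row, System.currentTimeMillis()));

        // With 'now' at the epoch, the same row has not expired yet.
        System.out.println(acceptRow(row, 0L));
    }
}
```

If the expTs column could sit behind a large amount of data in the row,
seeking to it, as also suggested above, would avoid reading every entry that
precedes it.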
