Billie, I think you've got it. Now I need to write it.

On Thu, Jul 12, 2012 at 11:47 AM, Billie J Rinaldi
<[email protected]> wrote:
> On Thursday, July 12, 2012 8:47:41 AM, "David Medinets" 
> <[email protected]> wrote:
>> I'd like to track field level changes for a given record (say,
>> author). So I create a table without a VersioningIterator. And I
>> insert a few records:
>>
>> insert "JOHN" "ATTRIBUTE" "AGE" "34"
>> insert "JOHN" "ATTRIBUTE" "HEIGHT" "67"
>> insert "JOHN" "BOOKS" "TITLE" "THE RISE OF ACCUMULO"
>>
>> The next action is that some ingest process happens and does this:
>>
>> insert "JOHN" "ATTRIBUTE" "AGE" "34"
>>
>> Since there is no VersioningIterator, there are two AGES both with
>> "34" as the value.
>>
>> I would like a DropUnchangedValueIterator which removes the last
>> inserted record. Removing the last record lets me use the n-1
>> timestamp as a LastUpdated value for the key-value pair. But as soon
>> as a record is deleted, the previous records are not available
>> anymore? What if the timestamp is set to MAX-timestamp so the records
>> are sorted backwards? Does that avoid the blocking tombstones? I'd
>> look at the source code before asking but I don't have that luxury for
>> the next week or two and the question is rattling around my head.
>
> This mixes two ideas: a deletion entry, which removes all earlier
> entries, and an iterator, which can arbitrarily filter out entries.
> I don't think reversing the timestamp will help you much in this case. What
> you want is an iterator that does pairwise comparisons of entries: if the
> values are the same, keep the entry with the earlier timestamp (then keep
> comparing entries for that record); if the values are different, keep the
> entry with the later timestamp (then skip to the next record). I think
> you'll have to write a custom iterator for that.
>
> Billie
>
>
>> Naturally, I could query the database before the ingest insert. But,
>> referring to slide 19 in Adam's presentation at
>> http://people.apache.org/~afuchs/slides/accumulo_table_design.pdf, the
>> read-modify-write design is not optimal.
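
For reference, the pairwise comparison Billie describes can be sketched
outside of Accumulo. The following is only a plain-Python simulation of the
logic, not an Accumulo iterator and not code from this thread: the Entry
type, the drop_unchanged function, and the integer timestamps are all
hypothetical names and values chosen for illustration. It assumes entries for
the same row/family/qualifier are visited newest first.

```python
from itertools import groupby
from typing import Iterable, Iterator, NamedTuple


class Entry(NamedTuple):
    """Hypothetical stand-in for a key-value pair (not an Accumulo class)."""
    row: str
    family: str
    qualifier: str
    timestamp: int
    value: str


def drop_unchanged(entries: Iterable[Entry]) -> Iterator[Entry]:
    """Yield one entry per (row, family, qualifier).

    The yielded entry carries the current value and the earliest timestamp
    of the most recent run of identical values, which can serve as a
    "last updated" time for that field.
    """
    # Order by key, then newest timestamp first within each key.
    ordered = sorted(
        entries,
        key=lambda e: (e.row, e.family, e.qualifier, -e.timestamp),
    )
    for _, versions in groupby(
        ordered, key=lambda e: (e.row, e.family, e.qualifier)
    ):
        kept = next(versions)  # newest version of this field
        for older in versions:
            if older.value != kept.value:
                # Values differ: keep the later entry, skip the rest.
                break
            # Values match: keep the earlier entry and keep comparing.
            kept = older
        yield kept


if __name__ == "__main__":
    # Timestamps below are made up for the example.
    sample = [
        Entry("JOHN", "ATTRIBUTE", "AGE", 1, "34"),
        Entry("JOHN", "ATTRIBUTE", "HEIGHT", 1, "67"),
        Entry("JOHN", "BOOKS", "TITLE", 1, "THE RISE OF ACCUMULO"),
        Entry("JOHN", "ATTRIBUTE", "AGE", 2, "34"),  # repeated ingest
    ]
    for entry in drop_unchanged(sample):
        print(entry)
```

A real implementation would have to do the same walk inside a custom
iterator on the tablet server, without sorting or buffering the whole input
as this sketch does.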
