Thanks, William. I was just hitting you up for an example :) I adapted your pseudocode (http://pastebin.com/ufPJq0g3), but noticed that "this.source" in your example wasn't accessible. Did I work around it correctly?
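On the "this.source" question: as far as I recall, Accumulo's WrappingIterator keeps its source in a private field and exposes it to subclasses through a protected getSource() method, so calling getSource() is the usual workaround. The plain-Java sketch below only mimics that shape with hypothetical WrappingBase and CounterSketch classes, with java.util.Iterator standing in for SortedKeyValueIterator; it does not use the Accumulo API.

```java
import java.util.Iterator;

// Hypothetical stand-in for a wrapping iterator base class: the wrapped source
// is private, so subclasses reach it through a protected accessor.
abstract class WrappingBase<T> {
    private Iterator<T> source; // not visible to subclasses

    void init(Iterator<T> source) {
        this.source = source;
    }

    protected Iterator<T> getSource() {
        if (source == null) {
            throw new IllegalStateException("init() has not been called");
        }
        return source;
    }
}

// Hypothetical counter that drains its source via getSource().
class CounterSketch<T> extends WrappingBase<T> {
    long countAll() {
        long n = 0;
        Iterator<T> src = getSource(); // writing this.source here would not compile
        while (src.hasNext()) {
            src.next();
            n++;
        }
        return n;
    }
}
```

Calling init(...) with an iterator over three elements and then countAll() yields 3; calling countAll() before init(...) fails fast instead of throwing a NullPointerException.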
When I add my iterator to my table and run a scan from the shell, it returns nothing. What should I expect here? In general I've found the iterator interface pretty confusing and haven't spent the time wrapping my head around it yet. Any documentation or examples (beyond what I could find on the site or in the code) would be appreciated!

root@dev> table pojo
root@dev pojo> listiter -scan -t pojo
-
- Iterator counter, scan scope options:
- iteratorPriority = 10
- iteratorClassName = iterators.Counter
-
root@dev pojo> scan
root@dev pojo>

Best,
-Mike

On Mon, Jul 14, 2014 at 4:07 PM, William Slacum <wilhelm.von.cl...@accumulo.net> wrote:
> For a bit of pseudocode, I'd probably make a class that did something akin to: http://pastebin.com/pKqAeeCR
>
> I wrote that up real quick in a text editor; it won't compile or anything, but should point you in the right direction.
>
>
> On Mon, Jul 14, 2014 at 3:44 PM, William Slacum <wilhelm.von.cl...@accumulo.net> wrote:
>
>> Hi Mike!
>>
>> The Combiner interface is only for aggregating keys within a single row. You can probably get away with implementing your combining logic in a WrappingIterator that reads across all the rows in a given tablet.
>>
>> To do some combine/fold/reduce operation, Accumulo needs the input type to be the same as the output type. The combiner doesn't have a notion of a "present" type (as you'd see in something like Algebird's Groups), but you can use another iterator to perform your transformation.
>>
>> If you wanted to extract the "count" field from your Avro object, you could write a new Iterator that took your Avro object, extracted the desired field, and returned it as its top value. You can then set this iterator as the source of the aggregator, either programmatically or by wrapping the source object passed to the aggregator in its SortedKeyValueIterator#init call.
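The layering William describes (a transforming iterator that extracts one field, used as the source of an aggregator whose input and output types match) can be sketched in plain Java, with java.util.Iterator standing in for SortedKeyValueIterator. MyHugeAvroObject here is a hypothetical stub carrying only the count field, and ExtractCountIterator and SumAggregator are illustrative names, not Accumulo classes.

```java
import java.util.Iterator;

// Hypothetical stub for the Avro record in this thread; only "count" matters.
final class MyHugeAvroObject {
    final Integer count;

    MyHugeAvroObject(Integer count) {
        this.count = count;
    }
}

// Transforming layer: reads the complex object and exposes only the extracted field.
final class ExtractCountIterator implements Iterator<Long> {
    private final Iterator<MyHugeAvroObject> source;

    ExtractCountIterator(Iterator<MyHugeAvroObject> source) {
        this.source = source;
    }

    @Override
    public boolean hasNext() {
        return source.hasNext();
    }

    @Override
    public Long next() {
        Integer c = source.next().count;
        return c == null ? 0L : c.longValue(); // treat a missing count as 0
    }
}

// Aggregating layer: input and output are both Long, so it never needs
// to know about MyHugeAvroObject.
final class SumAggregator {
    static long sum(Iterator<Long> source) {
        long total = 0;
        while (source.hasNext()) {
            total += source.next();
        }
        return total;
    }
}
```

Feeding objects with counts 2 and 3 through ExtractCountIterator into SumAggregator.sum gives 5. In this sketch the extractor hands the aggregator a Long directly; in a real Accumulo iterator stack, values travel between iterators as serialized Value bytes, which is where the extra serialize/deserialize cost comes from.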
>>
>> This is a bit inefficient, as you'd have to serialize to a Value and then immediately deserialize it in the iterator above it. You could mitigate this by exposing a method that would get the extracted value before serializing it.
>>
>> This kind of counting also requires client-side logic to do a final combine operation, since the aggregations from all the tservers are partial results.
>>
>> I believe that CountingIterator is not meant for user consumption, but I do not know if that's related to your issue in trying to use it from the shell. Iterators set through the shell, in previous versions of Accumulo, have a requirement to implement OptionDescriber. Many default iterators do not implement this, and thus can't be set in the shell.
>>
>>
>>
>> On Mon, Jul 14, 2014 at 2:44 PM, Michael Moss <michael.m...@gmail.com> wrote:
>>
>>> Hi, All.
>>>
>>> I'm curious what the best practices are around persisting complex types/data in Accumulo (and aggregating on fields within them).
>>>
>>> Let's say I have (row, column family, column qualifier, value):
>>> "A" "foo" "" MyHugeAvroObject(count=2)
>>> "A" "foo" "" MyHugeAvroObject(count=3)
>>>
>>> Let's say MyHugeAvroObject has a field "Integer count" with the values above.
>>>
>>> What is the best way to aggregate on row, column family, column qualifier by count? In my above example:
>>> "A" "foo" "" 5
>>>
>>> The TypedValueCombiner.typedReduce method can deserialize any "V", in my case MyHugeAvroObject, but it needs to return a value of type "V". What are the best practices for deeply nested/complex objects? It's not always straightforward to map a complex Avro type into Row -> Column Family -> Column Qualifier.
>>>
>>> Rather than using a TypedCombiner, I looked into using an Aggregator (which appears deprecated as of 1.4), which appears to let me return arbitrary values, but despite running setiter, my aggregator doesn't seem to do anything.
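Mike's target output ("A" "foo" "" 5) amounts to folding together adjacent entries that share a row, column family, and column qualifier. A plain-Java sketch of that fold follows, assuming the count has already been pulled out of the Avro value and that input arrives sorted by key, as it does for Accumulo iterators. Cell and RunSummer are hypothetical names, not Accumulo classes, and the sketch ignores visibility and timestamp, which a real Key also carries.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical (row, cf, cq, count) cell: a stand-in for an Accumulo Key plus
// a value from which the count has already been extracted.
final class Cell {
    final String row, cf, cq;
    final long count;

    Cell(String row, String cf, String cq, long count) {
        this.row = row;
        this.cf = cf;
        this.cq = cq;
        this.count = count;
    }

    boolean sameColumn(Cell o) {
        return row.equals(o.row) && cf.equals(o.cf) && cq.equals(o.cq);
    }

    @Override
    public String toString() {
        return "\"" + row + "\" \"" + cf + "\" \"" + cq + "\" " + count;
    }
}

final class RunSummer {
    // Input must be sorted by (row, cf, cq); adjacent cells with the same
    // coordinates are folded into a single cell holding their summed count.
    static List<Cell> sumRuns(List<Cell> sorted) {
        List<Cell> out = new ArrayList<>();
        Cell current = null;
        for (Cell c : sorted) {
            if (current != null && current.sameColumn(c)) {
                current = new Cell(current.row, current.cf, current.cq, current.count + c.count);
            } else {
                if (current != null) {
                    out.add(current);
                }
                current = c;
            }
        }
        if (current != null) {
            out.add(current);
        }
        return out;
    }
}
```

For the two cells from the example ("A" "foo" "" 2 and "A" "foo" "" 3), sumRuns returns one cell that prints as "A" "foo" "" 5. Relying on sorted input lets the fold run in a single pass with constant extra state, which is the same property a streaming iterator would depend on.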
>>>
>>> I also tried looking at implementing a WrappingIterator, which also appears to allow me to return arbitrary values (such as Accumulo's CountingIterator), but I get cryptic errors when trying to use setiter. I'm on Accumulo 1.6:
>>>
>>> root@dev kyt> setiter -t kyt -scan -p 10 -n countingIter -class org.apache.accumulo.core.iterators.system.CountingIterator
>>> 2014-07-14 11:12:55,623 [shell.Shell] ERROR: java.lang.IllegalArgumentException: org.apache.accumulo.core.iterators.system.CountingIterator
>>>
>>> This is odd because other included implementations of WrappingIterator seem to work (perhaps the implementation of CountingIterator is dated):
>>> root@dev kyt> setiter -t kyt -scan -p 10 -n deletingIterator -class org.apache.accumulo.core.iterators.system.DeletingIterator
>>> The iterator class does not implement OptionDescriber. Consider this for better iterator configuration using this setiter command.
>>> Name for iterator (enter to skip):
>>>
>>> All in all, how can I aggregate simple values, like counters, from rows with complex Avro objects as Values without having to add aggregation fields to these Value objects?
>>>
>>> Thanks!
>>>
>>> -Mike
>>>
>>
>>
>
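William's point about a final client-side combine can be sketched in plain Java: if an iterator that reads across all rows of a tablet emits partial sums, each tablet server only sees part of the data, so the client adds the partials together with the same operation. The per-tablet maps, their keying by column family, and the ClientSideCombine class are hypothetical; in practice the partials would be read back through a Scanner or BatchScanner.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ClientSideCombine {
    // Each map is one tablet's partial result (hypothetically keyed by column
    // family and summed across that tablet's rows). No single tablet sees
    // every row, so the client folds the partials together using addition,
    // the same operation the server-side iterator applied.
    static Map<String, Long> mergePartials(List<Map<String, Long>> partials) {
        Map<String, Long> total = new TreeMap<>();
        for (Map<String, Long> partial : partials) {
            partial.forEach((k, v) -> total.merge(k, v, Long::sum));
        }
        return total;
    }
}
```

With partials {foo=5, bar=1} from one tablet and {foo=4} from another, mergePartials returns {bar=1, foo=9}. This final step only works because addition is associative and commutative, so the order in which tablets report back does not change the result.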