For a bit of pseudocode, I'd probably make a class that did something akin to: http://pastebin.com/pKqAeeCR
I wrote that up real quick in a text editor, so it won't compile or anything, but it should point you in the right direction.

On Mon, Jul 14, 2014 at 3:44 PM, William Slacum < [email protected]> wrote:

> Hi Mike!
>
> The Combiner interface is only for aggregating keys within a single row.
> You can probably get away with implementing your combining logic in a
> WrappingIterator that reads across all the rows in a given tablet.
>
> To do some combine/fold/reduce operation, Accumulo needs the input type to
> be the same as the output type. The combiner doesn't have a notion of a
> "present" type (as you'd see in something like Algebird's Groups), but you
> can use another iterator to perform your transformation.
>
> If you wanted to extract the "count" field from your Avro object, you
> could write a new Iterator that took your Avro object, extracted the
> desired field, and returned it as its top value. You can then set this
> iterator as the source of the aggregator, either programmatically or by
> wrapping the source object passed to the aggregator in its
> SortedKeyValueIterator#init call.
>
> This is a bit inefficient as you'd have to serialize to a Value and then
> immediately deserialize it in the iterator above it. You could mitigate
> this by exposing a method that would get the extracted value before
> serializing it.
>
> This kind of counting also requires client-side logic to do a final
> combine operation, since the aggregations from all the tservers are partial
> results.
>
> I believe that CountingIterator is not meant for user consumption, but I
> do not know if it's related to your issue in trying to use it from the
> shell. Iterators set through the shell, in previous versions of Accumulo,
> have a requirement to implement OptionDescriber. Many default iterators do
> not implement this, and thus can't be set in the shell.
>
>
>
> On Mon, Jul 14, 2014 at 2:44 PM, Michael Moss <[email protected]>
> wrote:
>
>> Hi, All.
>>
>> I'm curious what the best practices are around persisting complex
>> types/data in Accumulo (and aggregating on fields within them).
>>
>> Let's say I have (row, column family, column qualifier, value):
>> "A" "foo" "" MyHugeAvroObject(count=2)
>> "A" "foo" "" MyHugeAvroObject(count=3)
>>
>> Let's say MyHugeAvroObject has a field "Integer count" with the values
>> above.
>>
>> What is the best way to aggregate on row, column family, column qualifier
>> by count? In my above example:
>> "A" "foo" "" 5
>>
>> The TypedValueCombiner.typedReduce method can deserialize any "V", in my
>> case MyHugeAvroObject, but it needs to return a value of type "V". What are
>> the best practices for deeply nested/complex objects? It's not always
>> straightforward to map a complex Avro type into Row -> Column Family ->
>> Column Qualifier.
>>
>> Rather than using a TypedCombiner, I looked into using an Aggregator
>> (which appears deprecated as of 1.4), which appears to let me return
>> arbitrary values, but despite running setiter, my aggregator doesn't seem
>> to do anything.
>>
>> I also tried looking at implementing a WrappingIterator, which also
>> appears to allow me to return arbitrary values (such as Accumulo's
>> CountingIterator), but I get cryptic errors when trying to setiter (I'm on
>> Accumulo 1.6):
>>
>> root@dev kyt> setiter -t kyt -scan -p 10 -n countingIter -class
>> org.apache.accumulo.core.iterators.system.CountingIterator
>> 2014-07-14 11:12:55,623 [shell.Shell] ERROR:
>> java.lang.IllegalArgumentException:
>> org.apache.accumulo.core.iterators.system.CountingIterator
>>
>> This is odd because other included implementations of WrappingIterator
>> seem to work (perhaps the implementation of CountingIterator is dated):
>> root@dev kyt> setiter -t kyt -scan -p 10 -n deletingIterator -class
>> org.apache.accumulo.core.iterators.system.DeletingIterator
>> The iterator class does not implement OptionDescriber.
>> Consider this for better iterator configuration using this setiter command.
>> Name for iterator (enter to skip):
>>
>> All in all, how can I aggregate simple values, like counters, from rows
>> with complex Avro objects as Values without having to add aggregation
>> fields to these Value objects?
>>
>> Thanks!
>>
>> -Mike
>>
>
>
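
As a rough illustration of the flow described in the thread (pull the "count" field out of each value, sum per tablet, then do a final combine on the client), here is a minimal plain-Java sketch. It deliberately does not use the Accumulo iterator API, so treat it as a model of the data flow rather than something you can drop into a tserver. MyHugeAvroObject below is a hypothetical stand-in with only a count field, and PartialCountSketch, partialSum, and finalCombine are made-up names for illustration. The two counts sit in separate lists only to mimic partial results coming back from different tservers.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.ToLongFunction;

// Hypothetical stand-in for the Avro-generated class; only the field we aggregate on.
final class MyHugeAvroObject {
    private final int count;

    MyHugeAvroObject(int count) {
        this.count = count;
    }

    int getCount() {
        return count;
    }
}

// Plain-Java model of the extract / partial-sum / final-combine flow.
// None of this is Accumulo API; all names are assumptions for illustration.
final class PartialCountSketch {

    // Models the server-side step for one tablet: extract the desired
    // field from every value and sum it, returning a partial result.
    static <V> long partialSum(List<V> valuesInTablet, ToLongFunction<V> extractor) {
        long sum = 0L;
        for (V value : valuesInTablet) {
            sum += extractor.applyAsLong(value);
        }
        return sum;
    }

    // Models the client-side step: fold the partial sums returned by
    // each tserver into one final total.
    static long finalCombine(List<Long> partials) {
        long total = 0L;
        for (long partial : partials) {
            total += partial;
        }
        return total;
    }

    public static void main(String[] args) {
        // Counts 2 and 3 are taken from the example in the original question.
        List<MyHugeAvroObject> tabletOne = Arrays.asList(new MyHugeAvroObject(2));
        List<MyHugeAvroObject> tabletTwo = Arrays.asList(new MyHugeAvroObject(3));

        long partialOne = partialSum(tabletOne, MyHugeAvroObject::getCount);
        long partialTwo = partialSum(tabletTwo, MyHugeAvroObject::getCount);

        System.out.println(finalCombine(Arrays.asList(partialOne, partialTwo))); // prints 5
    }
}
```

In a real deployment the partialSum step would live in a custom iterator and the finalCombine step in the scanning client; the extractor argument corresponds to the "extract the desired field" iterator suggested above.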
