Anything in your tserver log? I think you should just rethrow the IOException from your source's next() call, since those usually aren't recoverable (i.e., just make Counter#next throw IOException).
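To make that concrete, here is a minimal, self-contained sketch of letting the exception propagate. It does not use the Accumulo classes: `Source`, `FailingSource`, and this cut-down `Counter` are hypothetical stand-ins I made up for illustration, and the only thing carried over from the real API is that next() is declared with `throws IOException`.

```java
import java.io.IOException;

public class RethrowSketch {
    /** Hypothetical stand-in for the source iterator; not the Accumulo interface. */
    interface Source {
        boolean hasTop();
        void next() throws IOException;
    }

    /** Cut-down stand-in for the Counter iterator discussed in the thread. */
    static class Counter {
        private final Source source;
        long count = 0;

        Counter(Source source) {
            this.source = source;
        }

        // Declaring the exception lets a failed read surface to the caller
        // instead of being caught and silently dropped inside the loop.
        void next() throws IOException {
            while (source.hasTop()) {
                count++;
                source.next();
            }
        }
    }

    /** A source whose second call to next() fails, to show the propagation. */
    static class FailingSource implements Source {
        private int calls = 0;

        public boolean hasTop() {
            return true;
        }

        public void next() throws IOException {
            calls++;
            if (calls == 2) {
                throw new IOException("simulated read failure");
            }
        }
    }

    public static void main(String[] args) {
        Counter counter = new Counter(new FailingSource());
        try {
            counter.next();
        } catch (IOException e) {
            // The failure is visible here rather than hidden in the iterator.
            System.out.println("caught: " + e.getMessage()
                    + " after " + counter.count + " entries");
        }
    }
}
```

With a catch-and-ignore inside next(), the scan would just look empty or short, which is hard to tell apart from "no data".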
On Mon, Jul 14, 2014 at 5:48 PM, Josh Elser wrote:

> A quick sanity check is to make sure you have data in the table and that
> you can read the data without your iterator (I've thought I had a bug
> because I didn't have proper visibilities more times than I'd like to
> admit).
>
> Alternatively, you can also enable remote debugging via Eclipse into the
> TabletServer, which might help you understand more of what's going on.
>
> Lots of articles on how to set this up [1]. In short, add -Xdebug
> -Xrunjdwp:transport=dt_socket,server=y,address=8000 to
> ACCUMULO_TSERVER_OPTS in accumulo-env.sh, restart the tserver, connect
> Eclipse to port 8000 via the Debug configuration menu, set breakpoints
> in your init, seek and next methods, and `scan` in the shell.
>
> [1] http://javarevisited.blogspot.com/2011/02/how-to-setup-remote-debugging-in.html
>
> On 7/14/14, 5:33 PM, Michael Moss wrote:
>
>> Hmm... Still doesn't return anything from the shell.
>>
>> http://pastebin.com/ndRhspf8
>>
>> Any thoughts? What's the best way to debug these?
>>
>> On Mon, Jul 14, 2014 at 5:14 PM, William Slacum wrote:
>>
>> Ah, an artifact of me just willy nilly writing an iterator :) Any
>> reference to `this.source` should be replaced with
>> `this.getSource()`. In `next()`, your workaround ends up calling
>> `this.hasTop()` as the while loop condition. It will always return
>> false because two lines up we set `top_key` to null. We need to make
>> sure that the source iterator has a top, because we want to read
>> data from it. We'll have to change the loop condition to
>> `while(this.getSource().hasTop())`. On line 38 of your code we'll
>> need to call `this.getSource().next()` instead of `this.next()`.
>>
>> The iterator interface is documented, but there hasn't been a
>> definitive go-to for making one. I've been drafting a blog post, but
>> since it doesn't exist yet, hopefully the following will suffice.
>>
>> The lifetime of an iterator is (usually) as follows:
>>
>> (1) A new instance is created via Class.newInstance (so a no-args
>> constructor is needed).
>> (2) init() is called. This allows users to configure the iterator, set
>> its source, and possibly check the environment. We can also call
>> `deepCopy` on the source if we want to have multiple sources (we'd
>> do this if we wanted to do a merge read out of multiple column
>> families within a row).
>> (3) seek() is called. This gets our readers to the correct positions
>> in the data that are within the scan range the user requested, as
>> well as turning column families on or off. The name should be
>> reminiscent of seeking to some key on disk.
>> (4) hasTop() is called. If true, that means we have data, and the
>> iterator has a key/value pair that can be retrieved by calling
>> getTopKey() and getTopValue(). If false, we're done because there's
>> no data to return.
>> (5) next() is called. This will attempt to find a new top key and
>> value. We go back to (4) to see if next was successful in finding a
>> new top key/value and will repeat until the client is satisfied or
>> hasTop() returns false.
>>
>> You can kind of make a state machine out of those steps where we
>> loop between (4) and (5) until there's no data. There are more
>> advanced workflows where next() can be reading from multiple
>> sources, as well as seeking them to different positions in the tablet.
>>
>> On Mon, Jul 14, 2014 at 4:51 PM, Michael Moss wrote:
>>
>> Thanks, William. I was just hitting you up for an example :)
>>
>> I adapted your pseudocode (http://pastebin.com/ufPJq0g3), but
>> noticed that "this.source" in your example didn't have
>> visibility. Did I work around it correctly?
>>
>> When I add my iterator to my table and run scan from the shell,
>> it returns nothing - what should I expect here?
>> In general I've found the iterator interface pretty confusing and
>> haven't spent the time wrapping my head around it yet. Any
>> documentation or examples (beyond what I could find on the site or
>> in the code) appreciated!
>>
>> root@dev> table pojo
>> root@dev pojo> listiter -scan -t pojo
>> -
>> - Iterator counter, scan scope options:
>> - iteratorPriority = 10
>> - iteratorClassName = iterators.Counter
>> -
>> root@dev pojo> scan
>> root@dev pojo>
>>
>> Best,
>>
>> -Mike
>>
>> On Mon, Jul 14, 2014 at 4:07 PM, William Slacum wrote:
>>
>> For a bit of pseudocode, I'd probably make a class that did
>> something akin to: http://pastebin.com/pKqAeeCR
>>
>> I wrote that up real quick in a text editor -- it won't
>> compile or anything, but should point you in the right
>> direction.
>>
>> On Mon, Jul 14, 2014 at 3:44 PM, William Slacum wrote:
>>
>> Hi Mike!
>>
>> The Combiner interface is only for aggregating keys
>> within a single row. You can probably get away with
>> implementing your combining logic in a WrappingIterator
>> that reads across all the rows in a given tablet.
>>
>> To do some combine/fold/reduce operation, Accumulo needs
>> the input type to be the same as the output type. The
>> combiner doesn't have a notion of a "present" type (as
>> you'd see in something like Algebird's Groups), but you
>> can use another iterator to perform your transformation.
>>
>> If you wanted to extract the "count" field from your
>> Avro object, you could write a new Iterator that took
>> your Avro object, extracted the desired field, and
>> returned it as its top value. You can then set this
>> iterator as the source of the aggregator, either
>> programmatically or by wrapping the source object
>> passed to the aggregator in its
>> SortedKeyValueIterator#init call.
>>
>> This is a bit inefficient as you'd have to serialize to
>> a Value and then immediately deserialize it in the
>> iterator above it. You could mitigate this by exposing a
>> method that would get the extracted value before
>> serializing it.
>>
>> This kind of counting also requires client-side logic to
>> do a final combine operation, since the aggregations
>> from all the tservers are partial results.
>>
>> I believe that CountingIterator is not meant for user
>> consumption, but I do not know if it's related to your
>> issue in trying to use it from the shell. Iterators set
>> through the shell, in previous versions of Accumulo,
>> have a requirement to implement OptionDescriber. Many
>> default iterators do not implement this, and thus can't
>> be set in the shell.
>>
>> On Mon, Jul 14, 2014 at 2:44 PM, Michael Moss wrote:
>>
>> Hi, All.
>>
>> I'm curious what the best practices are around
>> persisting complex types/data in Accumulo (and
>> aggregating on fields within them).
>>
>> Let's say I have (row, column family, column
>> qualifier, value):
>> "A" "foo" "" MyHugeAvroObject(count=2)
>> "A" "foo" "" MyHugeAvroObject(count=3)
>>
>> Let's say MyHugeAvroObject has a field "Integer
>> count" with the values above.
>>
>> What is the best way to aggregate on row, column
>> family, column qualifier by count? In my above
>> example:
>> "A" "foo" "" 5
>>
>> The TypedValueCombiner.typedReduce method can
>> deserialize any "V", in my case MyHugeAvroObject,
>> but it needs to return a value of type "V". What are
>> the best practices for deeply nested/complex
>> objects? It's not always straightforward to map a
>> complex Avro type into Row -> Column Family ->
>> Column Qualifier.
>>
>> Rather than using a TypedCombiner, I looked into
>> using an Aggregator (which appears deprecated as of
>> 1.4), which appears to let me return arbitrary
>> values, but despite running setiter, my aggregator
>> doesn't seem to do anything.
>>
>> I also tried looking at implementing a
>> WrappingIterator, which also appears to allow me to
>> return arbitrary values (such as Accumulo's
>> CountingIterator), but I get cryptic errors when
>> trying to setiter. I'm on Accumulo 1.6:
>>
>> root@dev kyt> setiter -t kyt -scan -p 10 -n countingIter -class org.apache.accumulo.core.iterators.system.CountingIterator
>> 2014-07-14 11:12:55,623 [shell.Shell] ERROR: java.lang.IllegalArgumentException: org.apache.accumulo.core.iterators.system.CountingIterator
>>
>> This is odd because other included implementations
>> of WrappingIterator seem to work (perhaps the
>> implementation of CountingIterator is dated):
>>
>> root@dev kyt> setiter -t kyt -scan -p 10 -n deletingIterator -class org.apache.accumulo.core.iterators.system.DeletingIterator
>> The iterator class does not implement OptionDescriber. Consider this for better iterator configuration using this setiter command.
>> Name for iterator (enter to skip):
>>
>> All in all, how can I aggregate simple values, like
>> counters from rows with complex Avro objects as
>> Values, without having to add aggregation fields to
>> these Value objects?
>>
>> Thanks!
>>
>> -Mike
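
For reference, the lifecycle William describes in the quoted thread (construct, init, seek, then loop over hasTop/next) can be sketched in plain Java. This is only an illustration under simplifying assumptions: `MiniIterator`, `ListSource`, `SummingIterator`, and `scan` are hypothetical names, keys are plain strings, values are longs instead of serialized Avro objects, and the constructor plays the role of init(). It is not the Accumulo API.

```java
import java.util.ArrayList;
import java.util.List;

public class IteratorLifecycleSketch {
    /** Hypothetical, simplified stand-in for Accumulo's SortedKeyValueIterator. */
    interface MiniIterator {
        void seek(String startKey);
        boolean hasTop();
        String getTopKey();
        long getTopValue();
        void next();
    }

    /** In-memory source over already-sorted keys and their values. */
    static class ListSource implements MiniIterator {
        private final String[] keys;
        private final long[] values;
        private int pos;

        ListSource(String[] keys, long[] values) {
            this.keys = keys;
            this.values = values;
            this.pos = keys.length; // no top until seek() is called
        }

        public void seek(String startKey) {
            pos = 0;
            while (pos < keys.length && keys[pos].compareTo(startKey) < 0) {
                pos++;
            }
        }

        public boolean hasTop() { return pos < keys.length; }
        public String getTopKey() { return keys[pos]; }
        public long getTopValue() { return values[pos]; }
        public void next() { pos++; }
    }

    /** Sums consecutive source values that share a key; one entry per key. */
    static class SummingIterator implements MiniIterator {
        private final MiniIterator source;
        private String topKey;
        private long topValue;

        SummingIterator(MiniIterator source) { this.source = source; }

        public void seek(String startKey) {
            source.seek(startKey);
            next(); // find the first top entry
        }

        public boolean hasTop() { return topKey != null; }
        public String getTopKey() { return topKey; }
        public long getTopValue() { return topValue; }

        public void next() {
            topKey = null;
            if (!source.hasTop()) {
                return;
            }
            String key = source.getTopKey();
            long sum = 0;
            // Loop on the source's hasTop(), not our own: ours is false here
            // because topKey was just cleared.
            while (source.hasTop() && source.getTopKey().equals(key)) {
                sum += source.getTopValue();
                source.next();
            }
            topKey = key;
            topValue = sum;
        }
    }

    /** Drives an iterator the way steps (3) to (5) describe. */
    static List<String> scan(MiniIterator it, String startKey) {
        List<String> out = new ArrayList<>();
        it.seek(startKey);                 // step (3)
        while (it.hasTop()) {              // step (4)
            out.add(it.getTopKey() + "=" + it.getTopValue());
            it.next();                     // step (5)
        }
        return out;
    }

    public static void main(String[] args) {
        MiniIterator source = new ListSource(
                new String[] {"A", "A", "B"}, new long[] {2, 3, 7});
        System.out.println(scan(new SummingIterator(source), "A")); // prints [A=5, B=7]
    }
}
```

The two "A" entries with counts 2 and 3 collapse to 5, matching the result Mike asked for in his first message. In a real iterator the grouping would compare Key fields rather than strings, and the per-tserver results would still need a final client-side combine.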
