Thanks for catching that! I did indeed write that down incorrectly. I apologize. I'll fix that tonight.
Iterators are stacked based on their priority (when you set them via the scanner, for example) or the input format's IteratorSetting. The init method comment is a general suggestion, for example if you are using the iterator within a scan session. The OrIterator (as in the wikisearch example) is created by the BooleanLogicIterator, and its sources are added through the addTerm method. This is, apparently, its expected use. You will also note that the BooleanLogicIterator (or any iterator that uses the OrIterator) does implement the init method.

On Thu, Aug 23, 2012 at 10:59 AM, Cardon, Tejay E <[email protected]> wrote:
> Marc,
>
> Thanks for the writeup. It is by far the most comprehensive info I’ve seen
> on iterators, and was very helpful to me. A couple notes/questions:
>
> You mention that SortedKeyValueIterator implements FileSKVIterator. I’ve
> only looked at the 1.4.1 source, but it appears that the opposite is true.
>
> You also mention that iterators get their source from the init method, but
> some (like OrIterator) seem to throw exceptions on that method. Where do
> they get their source data, and what are the API implications of having
> iterators that reject init (or deep copy for that matter)?
>
> Final thought. If I want to stack several iterators, what’s the best way to
> go about that? In other words, I’d like an iterator that I write to be the
> source to another iterator that I’ve written, which in turn may feed yet
> another that I’ve written. Preferably, I’d like each to be independently
> reusable, so I don’t want to build that stacking into the source of any of
> the iterators themselves. Is that possible, or would I need some sort of
> iterator factory that builds the stacks and then acts as an interface to the
> fully formed stack?
>
> Thanks,
>
> Tejay
>
> From: Marc Parisi [mailto:[email protected]]
> Sent: Wednesday, August 22, 2012 5:33 PM
> To: [email protected]
> Subject: EXTERNAL: Re: Custom Iterators
>
> Here's a quick write up
>
> http://www.accumulo.net/node/1
>
> On Wed, Aug 22, 2012 at 8:03 PM, Josh Elser <[email protected]> wrote:
>
> Err, double (triple) reply:
>
> No, you are incorrect. The wikisearch example can handle any arbitrary
> boolean expression containing NOT, AND, and OR. As always, I'll preface it
> the same as Bill did: it *should* be able to handle them :).
>
> I know that cleaning-up/reworking the Wikisearch code is in the works. I'm
> just not positive about the timeframe.
>
> As far as examples, I'd push you to the write-up Eric did after benchmarking
> the wikisearch example: http://accumulo.apache.org/example/wikisearch.html
>
> He has some example queries that give the basic idea behind what's supported
> (minus the NOTs)
>
> On 08/22/2012 05:27 PM, Cardon, Tejay E wrote:
>
> Josh,
>
> Thanks for getting back to me so quickly. I explained in my lengthy reply to
> William that the comment on OrIterator.TermSource.compareTo indicates that
> implementations with more than one row per tablet need to compare row key
> first (and that is not being done in this code). It may be that it’s not an
> issue and I’m simply misunderstanding something. As for the wikisearch
> example, as I understood it, it could only handle searches for “anded”
> terms. If that’s not the case, then an example of an or search would be
> helpful. In any case, I’d love a deeper dive on the wikisearch somewhere. I
> get the source code and a high level explanation of what’s happening, but
> I’d love a tutorial or something that walks through the classes and explains
> how each one contributes to the functionality. Don’t consider that a request
> (that would be a lot more to ask than I’m willing to ask), but I would
> certainly find it useful if it does exist.
>
> Thanks,
>
> Tejay
>
> From: Josh Elser [mailto:[email protected]]
> Sent: Wednesday, August 22, 2012 2:53 PM
> To: [email protected]
> Subject: EXTERNAL: Re: Custom Iterators
>
> What makes you say that the OrIterator cannot handle more than one row per
> tablet? Can you provide details?
>
> AFAIK, the OrIterator should work correctly in all cases (e.g. regardless of
> row distribution in a tablet). Any issues in the code that prevent it from
> doing so would be a bug that should be fixed.
>
> Also, the wikisearch example supports indexing over multiple attributes (and
> I believe indexes document metadata in addition to the tokenized document).
> Is there something unclear that could be better documented?
>
> On 8/22/12 4:41 PM, Cardon, Tejay E wrote:
>
> All,
>
> I’m interested in writing a custom iterator, and I’ve been looking
> for documentation on how to do so. Thus far, I’ve not been able to
> find anything beyond the java docs in SortedKeyValueIterator and a
> few other sub-classes. A few of the examples use Iterators, but
> provide no real info on how to properly implement one. Is there
> anywhere to find general guidance on the iterator stack?
>
> (If you’re interested)
>
> Specifically, for those that are curious, I’m trying to implement
> something similar to the wikisearch example, but with some key
> differences. In my case, I’ve got a file with various attributes
> that are being indexed. So for each file there are 5 attributes, and
> each attribute has a fixed number of possible values.
> For example (totally made up):
>
> personID, gender, hair color, country, race, personRecord
>
> Row:binID; ColFam:Attribute_AttributeValue; ColQ:PersonID; Val:blank
>
> AND
>
> Row:binID; ColFam:“D”; ColQ:personID; value:personRecord
>
> A typical query would be:
>
> Give me the personRecord for all people with:
>
> Gender: male &
> Hair color: blond or brown &
> Country: USA or England or China or Korea &
> Race: white or oriental
>
> The existing Iterators used in the wikisearch example are unable
> to handle the “or” clauses in each attribute.
>
> The OrIterator doesn’t appear to handle the possibility of more than
> one row per tablet.
>
> Thanks,
>
> Tejay Cardon
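
For reference, the query in the original message is an AND across attributes with an OR inside each attribute. The sketch below only illustrates that logic over the described index layout for a single bin; it is plain Java, not Accumulo iterator code and not how the wikisearch iterators are implemented. The class name, the method names, and the "attribute_value" key strings are all made up for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of one bin's index: ColFam "attribute_value" -> personIDs (the ColQs).
class AttributeQuerySketch {

    /** OR: union of the personIDs listed under any of the given column families. */
    static Set<String> or(Map<String, Set<String>> index, List<String> colFams) {
        Set<String> ids = new TreeSet<>();
        for (String colFam : colFams) {
            ids.addAll(index.getOrDefault(colFam, Collections.<String>emptySet()));
        }
        return ids;
    }

    /** AND of ORs: a personID must appear in every group's union. */
    static Set<String> andOfOrs(Map<String, Set<String>> index, List<List<String>> groups) {
        Set<String> result = null;
        for (List<String> group : groups) {
            Set<String> ids = or(index, group);
            if (result == null) {
                result = ids;          // first group seeds the result
            } else {
                result.retainAll(ids); // later groups narrow it
            }
        }
        return result == null ? new TreeSet<String>() : result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> index = new HashMap<>();
        index.put("gender_male", new TreeSet<>(Arrays.asList("p1", "p2", "p3")));
        index.put("hair_blond", new TreeSet<>(Arrays.asList("p1")));
        index.put("hair_brown", new TreeSet<>(Arrays.asList("p3")));
        index.put("hair_black", new TreeSet<>(Arrays.asList("p2")));
        index.put("country_USA", new TreeSet<>(Arrays.asList("p1", "p2")));
        index.put("country_Korea", new TreeSet<>(Arrays.asList("p3")));

        // male AND (blond OR brown) AND (USA OR England)
        List<List<String>> query = Arrays.asList(
                Arrays.asList("gender_male"),
                Arrays.asList("hair_blond", "hair_brown"),
                Arrays.asList("country_USA", "country_England"));

        System.out.println(andOfOrs(index, query)); // prints [p1]
    }
}
```

The matching personIDs would then be looked up under the “D” column family to fetch each personRecord.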

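On the stacking question raised earlier in the thread: the sketch below is a simplified model of priority-based stacking, written in plain Java so it runs without Accumulo. It is not the Accumulo API. Setting, buildStack, filter, and map are hypothetical names, and the ordering rule (lowest priority number sits closest to the data) is an assumption taken from how Accumulo describes iterator priorities. The point it tries to show is that each layer only knows about its source, so the layers stay independently reusable and the ordering lives in configuration rather than in the iterators themselves.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;
import java.util.function.Predicate;

class IteratorStackSketch {

    /** Hypothetical stand-in for an iterator setting: a priority, a name, and a wrapper. */
    static final class Setting {
        final int priority;
        final String name;
        final Function<Iterator<String>, Iterator<String>> wrap;

        Setting(int priority, String name, Function<Iterator<String>, Iterator<String>> wrap) {
            this.priority = priority;
            this.name = name;
            this.wrap = wrap;
        }
    }

    /** Wraps the data source with each setting, lowest priority number first. */
    static Iterator<String> buildStack(Iterator<String> data, List<Setting> settings) {
        List<Setting> ordered = new ArrayList<>(settings);
        ordered.sort(Comparator.comparingInt((Setting s) -> s.priority));
        Iterator<String> top = data;
        for (Setting s : ordered) {
            top = s.wrap.apply(top); // each layer only sees the layer below it
        }
        return top;
    }

    /** Reusable layer: passes through only values accepted by keep (null values end the stream). */
    static Iterator<String> filter(final Iterator<String> source, final Predicate<String> keep) {
        return new Iterator<String>() {
            private String pending = advance();

            private String advance() {
                while (source.hasNext()) {
                    String candidate = source.next();
                    if (keep.test(candidate)) {
                        return candidate;
                    }
                }
                return null; // source exhausted
            }

            @Override
            public boolean hasNext() {
                return pending != null;
            }

            @Override
            public String next() {
                if (pending == null) {
                    throw new NoSuchElementException();
                }
                String result = pending;
                pending = advance();
                return result;
            }
        };
    }

    /** Reusable layer: applies f to every value from the source. */
    static Iterator<String> map(final Iterator<String> source, final Function<String, String> f) {
        return new Iterator<String>() {
            @Override
            public boolean hasNext() {
                return source.hasNext();
            }

            @Override
            public String next() {
                return f.apply(source.next());
            }
        };
    }

    /** Collects everything the top of the stack produces. */
    static List<String> drain(Iterator<String> it) {
        List<String> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        // Listed out of order on purpose; priority decides the stacking.
        List<Setting> settings = Arrays.asList(
                new Setting(20, "upper", src -> map(src, String::toUpperCase)),
                new Setting(10, "aOnly", src -> filter(src, v -> v.startsWith("a"))));
        Iterator<String> data = Arrays.asList("apple", "avocado", "banana", "apricot").iterator();
        System.out.println(drain(buildStack(data, settings))); // prints [APPLE, AVOCADO, APRICOT]
    }
}
```

If the two priorities were swapped, the uppercase layer would run first and the lowercase "a" filter would then drop every value, which is why the order has to be set deliberately.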