Josh,
Thanks for getting back to me so quickly.  I explained in my lengthy reply to 
William that the comment on OrIterator.TermSource.compareTo indicates that 
implementations with more than one row per tablet need to compare row key first 
(and that is not being done in this code).  It may be that it's not an issue 
and I'm simply misunderstanding something.  As for the wikisearch example, as I 
understood it, it could only handle searches for "anded" terms.  If that's not 
the case, then an example of an or search would be helpful.  In any case, I'd 
love a deeper dive on the wikisearch somewhere.  I get the source code and a 
high level explanation of what's happening, but I'd love a tutorial or 
something that walks through the classes and explains how each one contributes 
to the functionality.  Don't consider that a request (that would be a lot more 
to ask than I'm willing to ask), but I would certainly find it useful if it 
does exist.
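
To make the row-key concern concrete, here is a minimal sketch in plain Java. It does not use the Accumulo API: TermEntry and RowFirstDemo are hypothetical names invented for illustration, and this is not the actual OrIterator.TermSource code. It only shows what "compare the row key first" means when entries from several rows are ordered together.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a term source entry; NOT the Accumulo class.
final class TermEntry implements Comparable<TermEntry> {
    final String row;       // e.g. the binID
    final String personId;  // e.g. the column qualifier

    TermEntry(String row, String personId) {
        this.row = row;
        this.personId = personId;
    }

    @Override
    public int compareTo(TermEntry other) {
        // Compare the row first so entries from different rows never interleave.
        int byRow = row.compareTo(other.row);
        if (byRow != 0) {
            return byRow;
        }
        // Only within the same row does the personId decide the order.
        return personId.compareTo(other.personId);
    }

    @Override
    public String toString() {
        return row + "/" + personId;
    }
}

final class RowFirstDemo {
    // Sorts three made-up entries spread over two rows and returns their keys.
    static List<String> sortedKeys() {
        List<TermEntry> entries = new ArrayList<>();
        entries.add(new TermEntry("bin2", "p001"));
        entries.add(new TermEntry("bin1", "p007"));
        entries.add(new TermEntry("bin1", "p003"));
        Collections.sort(entries);

        List<String> keys = new ArrayList<>();
        for (TermEntry entry : entries) {
            keys.add(entry.toString());
        }
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(sortedKeys());
    }
}
```

If compareTo looked only at personId, bin2/p001 would sort ahead of both bin1 entries, which is the kind of cross-row interleaving I am worried about.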

Thanks,
Tejay

From: Josh Elser [mailto:[email protected]]
Sent: Wednesday, August 22, 2012 2:53 PM
To: [email protected]
Subject: EXTERNAL: Re: Custom Iterators

What makes you say that the OrIterator cannot handle more than one row per 
tablet? Can you provide details?

AFAIK, the OrIterator should work correctly in all cases (e.g. regardless of 
row distribution in a tablet). Any issues in the code that prevent it from 
doing so would be a bug that should be fixed.

Also, the wikisearch example supports indexing over multiple attributes (and I 
believe indexes document metadata in addition to the tokenized document). Is 
there something unclear that could be better documented?

On 8/22/12 4:41 PM, Cardon, Tejay E wrote:
All,
I'm interested in writing a custom iterator, and I've been looking for 
documentation on how to do so.  Thus far, I've not been able to find anything 
beyond the java docs in SortedKeyValueIterator and a few other sub-classes.  A 
few of the examples use Iterators, but provide no real info on how to properly 
implement one.  Is there anywhere to find general guidance on the iterator 
stack?

(If you're interested)
Specifically, for those that are curious, I'm trying to implement something 
similar to the wikisearch example, but with some key differences.  In my case, 
I've got a file with various attributes that are being indexed.  So for each file 
there are 5 attributes, and each attribute has a fixed number of possible 
values.  For example (totally made up):
personID, gender, hair color, country, race, personRecord

Row:binID; ColFam:Attribute_AttributeValue; ColQ:PersonID; Val:blank
AND
Row:binID; ColFam:"D"; ColQ:personID; value:personRecord

A typical query would be:
Give me the personRecord for all people with:
Gender: male &
Hair color: blond or brown &
Country: USA or England or china or korea &
Race: white or oriental
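
The logic of such a query (an AND across attributes, with an OR among the values of each attribute) can be sketched in plain Java as below. This is an in-memory illustration only, not an Accumulo iterator and not how wikisearch does it. The class AndOfOrsSketch, its methods, and the sample personIDs are all made up, the index is assumed to cover a single binID, and the query is trimmed to three of the four clauses to keep it short.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical in-memory model of one bin's index: ColFam -> personIDs.
final class AndOfOrsSketch {
    static SortedSet<String> ids(String... values) {
        return new TreeSet<>(Arrays.asList(values));
    }

    // OR: union of the personIDs indexed under any of the column families.
    static SortedSet<String> or(Map<String, SortedSet<String>> index, List<String> colFams) {
        SortedSet<String> union = new TreeSet<>();
        for (String colFam : colFams) {
            SortedSet<String> matches = index.get(colFam);
            if (matches != null) {
                union.addAll(matches);
            }
        }
        return union;
    }

    // AND: intersection of the per-attribute unions.
    static SortedSet<String> and(Map<String, SortedSet<String>> index, List<List<String>> clauses) {
        SortedSet<String> result = null;
        for (List<String> clause : clauses) {
            SortedSet<String> matches = or(index, clause);
            if (result == null) {
                result = matches;
            } else {
                result.retainAll(matches);
            }
        }
        return result == null ? new TreeSet<String>() : result;
    }

    // Made-up index entries, keyed as Attribute_AttributeValue.
    static Map<String, SortedSet<String>> sampleIndex() {
        Map<String, SortedSet<String>> index = new HashMap<>();
        index.put("gender_male", ids("p1", "p2", "p3", "p5"));
        index.put("hair_blond", ids("p1"));
        index.put("hair_brown", ids("p2", "p4", "p5"));
        index.put("hair_black", ids("p3"));
        index.put("country_USA", ids("p1", "p3"));
        index.put("country_England", ids("p4", "p5"));
        index.put("country_china", ids("p2"));
        return index;
    }

    // male AND (blond OR brown) AND (USA OR England)
    static SortedSet<String> sampleQuery() {
        List<List<String>> clauses = Arrays.asList(
                Arrays.asList("gender_male"),
                Arrays.asList("hair_blond", "hair_brown"),
                Arrays.asList("country_USA", "country_England"));
        return and(sampleIndex(), clauses);
    }

    public static void main(String[] args) {
        System.out.println(sampleQuery());
    }
}
```

The matching personIDs would then be used to fetch the "D" column family entries holding each personRecord. Doing the same thing server-side is where the iterator stack comes in, and that is the part I have not found guidance for.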

The existing Iterators used in the wikisearch example are unable to handle the 
"or" clauses in each attribute.
The OrIterator doesn't appear to handle the possibility of more than one row per 
tablet.

Thanks,
Tejay Cardon
