Err, double (triple) reply:
No, you are incorrect. The wikisearch example can handle any arbitrary
boolean expression containing NOT, AND, and OR. As always, I'll preface
it the same as Bill did: it *should* be able to handle them :).
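To make the NOT/AND/OR claim concrete, here is a minimal sketch of the set semantics such a query has to reproduce inside one bin (row). Each term maps to the sorted docIDs that contain it. AND is intersection, OR is union, and NOT is the complement relative to all docIDs in the bin. The class and method names (BoolExprSketch, term, and, or, not) are hypothetical. This is not the wikisearch implementation and does not reflect how its iterators are actually structured.

```java
import java.util.*;

// HYPOTHETICAL sketch: evaluate an AND/OR/NOT expression over the docIDs
// stored in ONE bin (row). Not the wikisearch code; it only shows the set
// semantics an iterator tree for such a query has to reproduce.
class BoolExprSketch {
    interface Node {
        // Returns a fresh sorted set of docIDs in this bin that satisfy the node.
        SortedSet<String> eval(Map<String, SortedSet<String>> postings, SortedSet<String> allDocs);
    }

    static Node term(final String t) {
        return (postings, allDocs) -> {
            SortedSet<String> hits = postings.get(t);
            // Copy so callers can mutate the result without touching the postings.
            return hits == null ? new TreeSet<String>() : new TreeSet<String>(hits);
        };
    }

    static Node and(final Node a, final Node b) {
        return (p, all) -> {
            SortedSet<String> r = a.eval(p, all);
            r.retainAll(b.eval(p, all)); // intersection
            return r;
        };
    }

    static Node or(final Node a, final Node b) {
        return (p, all) -> {
            SortedSet<String> r = a.eval(p, all);
            r.addAll(b.eval(p, all)); // union
            return r;
        };
    }

    static Node not(final Node a) {
        return (p, all) -> {
            SortedSet<String> r = new TreeSet<String>(all);
            r.removeAll(a.eval(p, all)); // complement within this bin
            return r;
        };
    }

    // Usage: gender_male AND (hair_blond OR hair_brown) AND NOT hair_brown.
    static SortedSet<String> demo() {
        Map<String, SortedSet<String>> postings = new HashMap<>();
        postings.put("gender_male", new TreeSet<>(Arrays.asList("p1", "p3", "p4")));
        postings.put("hair_blond", new TreeSet<>(Arrays.asList("p1", "p2")));
        postings.put("hair_brown", new TreeSet<>(Arrays.asList("p3")));
        SortedSet<String> all = new TreeSet<>(Arrays.asList("p1", "p2", "p3", "p4"));
        Node q = and(and(term("gender_male"), or(term("hair_blond"), term("hair_brown"))),
                     not(term("hair_brown")));
        return q.eval(postings, all);
    }
}
```

Computing NOT as a complement needs the full set of docIDs in the bin, which is one reason NOT is usually applied as "AND NOT" against a positive clause rather than on its own.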
I know that cleaning-up/reworking the Wikisearch code is in the works.
I'm just not positive about the timeframe.
As far as examples go, I'd point you to the write-up Eric did after
benchmarking the wikisearch example:
http://accumulo.apache.org/example/wikisearch.html
He has some example queries that give the basic idea behind what's
supported (minus the NOTs).
On 08/22/2012 05:27 PM, Cardon, Tejay E wrote:
Josh,
Thanks for getting back to me so quickly. I explained in my lengthy
reply to William that the comment on OrIterator.TermSource.compareTo
indicates that implementations with more than one row per tablet need
to compare row key first (and that is not being done in this code). It
may be that it’s not an issue and I’m simply misunderstanding
something. As for the wikisearch example, as I understood it, it could
only handle searches for “anded” terms. If that’s not the case, then
an example of an or search would be helpful. In any case, I’d love a
deeper dive on the wikisearch somewhere. I get the source code and a
high level explanation of what’s happening, but I’d love a tutorial or
something that walks through the classes and explains how each one
contributes to the functionality. Don’t consider that a request (that
would be a lot more to ask than I’m willing to ask), but I would
certainly find it useful if it does exist.
Thanks,
Tejay
*From:* Josh Elser
*Sent:* Wednesday, August 22, 2012 2:53 PM
*Subject:* EXTERNAL: Re: Custom Iterators
What makes you say that the OrIterator cannot handle more than one row
per tablet? Can you provide details?
AFAIK, the OrIterator should work correctly in all cases (e.g.
regardless of row distribution in a tablet). Any issues in the code
that prevent it from doing so would be a bug that should be fixed.
Also, the wikisearch example supports indexing over multiple
attributes (and I believe indexes document metadata in addition to the
tokenized document). Is there something unclear that could be better
documented?
On 8/22/12 4:41 PM, Cardon, Tejay E wrote:
All,
I’m interested in writing a custom iterator, and I’ve been looking
for documentation on how to do so. Thus far, I’ve not been able to
find anything beyond the Javadocs in SortedKeyValueIterator and a
few implementing classes. A few of the examples use iterators, but
provide no real info on how to properly implement one. Is there
anywhere to find general guidance on the iterator stack?
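As a rough picture of what "the iterator stack" means, here is a toy model in plain Java. SimpleKVIterator is a hypothetical, heavily simplified stand-in loosely patterned on the seek/hasTop/next/getTopKey contract of Accumulo's SortedKeyValueIterator. It is not the real interface, which among other things also has init, getTopValue, deepCopy, and a Range-based seek. Each layer wraps the one below it and only sees the keys its source exposes.

```java
import java.util.*;
import java.util.function.Predicate;

// Toy model of an iterator stack. SimpleKVIterator is a HYPOTHETICAL,
// simplified stand-in loosely patterned on Accumulo's SortedKeyValueIterator;
// it is not the real interface.
class IteratorStackSketch {
    interface SimpleKVIterator {
        void seek(String startKey); // position at the first key >= startKey
        boolean hasTop();           // is there a current key?
        String getTopKey();         // current key (only valid if hasTop())
        void next();                // advance (only valid if hasTop())
    }

    // Bottom of the stack: iterates over an in-memory sorted set of keys.
    static class SortedSource implements SimpleKVIterator {
        private final TreeSet<String> data;
        private String top;
        SortedSource(Collection<String> keys) { this.data = new TreeSet<>(keys); }
        public void seek(String startKey) { top = data.ceiling(startKey); }
        public boolean hasTop() { return top != null; }
        public String getTopKey() { return top; }
        public void next() { top = data.higher(top); }
    }

    // A wrapping layer: delegates to its source and skips keys that fail the
    // predicate, so the layer above only ever sees accepted keys.
    static class FilteringIterator implements SimpleKVIterator {
        private final SimpleKVIterator source;
        private final Predicate<String> accept;
        FilteringIterator(SimpleKVIterator source, Predicate<String> accept) {
            this.source = source;
            this.accept = accept;
        }
        private void skipRejected() {
            while (source.hasTop() && !accept.test(source.getTopKey())) {
                source.next();
            }
        }
        public void seek(String startKey) { source.seek(startKey); skipRejected(); }
        public boolean hasTop() { return source.hasTop(); }
        public String getTopKey() { return source.getTopKey(); }
        public void next() { source.next(); skipRejected(); }
    }

    // Drain an iterator from startKey, roughly the way a scan pulls results.
    static List<String> scan(SimpleKVIterator it, String startKey) {
        List<String> out = new ArrayList<>();
        for (it.seek(startKey); it.hasTop(); it.next()) {
            out.add(it.getTopKey());
        }
        return out;
    }
}
```

The key design point is that a wrapping layer must leave itself positioned on an acceptable key after both seek and next, so that hasTop/getTopKey are always consistent for whatever sits above it.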
(If you’re interested)
Specifically, for those that are curious, I’m trying to implement
something similar to the wikisearch example, but with some key
differences. In my case, I’ve got a file with various attributes
that are being indexed. So for each file there are 5 attributes, and
each attribute has a fixed number of possible values. For example
(totally made up):
personID, gender, hair color, country, race, personRecord
Row:binID; ColFam:Attribute_AttributeValue; ColQ:personID; Val:blank
AND
Row:binID; ColFam:"D"; ColQ:personID; Val:personRecord
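A minimal sketch of how one person record could map onto that layout. The bin assignment (hash of personID modulo a hypothetical numBins) and the Entry class are assumptions for illustration. The point is that a person's index entries and "D" entry share one row, so a single tablet can answer the whole query for that bin.

```java
import java.util.*;

// Sketch of the key layout above for one person record. The bin assignment
// (hash of personID modulo a HYPOTHETICAL numBins) and the Entry class are
// assumptions for illustration only.
class PersonKeyLayoutSketch {
    static final class Entry {
        final String row, colFam, colQual, value;
        Entry(String row, String colFam, String colQual, String value) {
            this.row = row;
            this.colFam = colFam;
            this.colQual = colQual;
            this.value = value;
        }
        @Override public String toString() {
            return row + " " + colFam + ":" + colQual + " -> '" + value + "'";
        }
    }

    // The same personID always lands in the same bin, so its index entries
    // and its "D" entry share a row (and therefore a tablet).
    static String binFor(String personId, int numBins) {
        return "bin" + Math.floorMod(personId.hashCode(), numBins);
    }

    static List<Entry> entriesFor(String personId, Map<String, String> attributes,
                                  String personRecord, int numBins) {
        String row = binFor(personId, numBins);
        List<Entry> out = new ArrayList<>();
        // One index entry per attribute: ColFam = Attribute_AttributeValue,
        // ColQ = personID, empty value. TreeMap gives a stable order.
        for (Map.Entry<String, String> a : new TreeMap<>(attributes).entrySet()) {
            out.add(new Entry(row, a.getKey() + "_" + a.getValue(), personId, ""));
        }
        // The record itself under the "D" column family.
        out.add(new Entry(row, "D", personId, personRecord));
        return out;
    }
}
```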
A typical query would be:
Give me the personRecord for all people with:
Gender: male &
Hair color: blond or brown &
Country: USA or England or China or Korea &
Race: white or oriental
The existing Iterators used in the wikisearch example are unable
to handle the “or” clauses in each attribute.
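For a query like the one above, the per-attribute ORs can be treated as a union of sorted personID lists and the ANDs as an intersection of those unions, all within one bin. The sketch below does both in a streaming style: a k-way merge for OR and a leapfrog advance for AND. This is roughly the shape an iterator tree would need. AndOfOrsSketch and its methods are hypothetical names, and plain lists stand in for the column qualifiers under each Attribute_AttributeValue family.

```java
import java.util.*;

// HYPOTHETICAL sketch of "OR within each attribute, AND across attributes"
// over sorted personID lists from ONE bin. Each inner list stands in for the
// column qualifiers under one Attribute_AttributeValue column family.
class AndOfOrsSketch {
    // OR: k-way merge of sorted lists, dropping duplicates.
    static List<String> union(List<List<String>> lists) {
        // Heap entries are {listIndex, position}, ordered by the personID they point at.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            (x, y) -> lists.get(x[0]).get(x[1]).compareTo(lists.get(y[0]).get(y[1])));
        for (int i = 0; i < lists.size(); i++) {
            if (!lists.get(i).isEmpty()) heap.add(new int[] {i, 0});
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] c = heap.poll();
            String id = lists.get(c[0]).get(c[1]);
            // IDs come out in sorted order, so duplicates are adjacent.
            if (out.isEmpty() || !out.get(out.size() - 1).equals(id)) out.add(id);
            if (c[1] + 1 < lists.get(c[0]).size()) heap.add(new int[] {c[0], c[1] + 1});
        }
        return out;
    }

    // AND: advance every list to the largest current head until all heads agree.
    static List<String> intersect(List<List<String>> lists) {
        List<String> out = new ArrayList<>();
        if (lists.isEmpty()) return out;
        int[] pos = new int[lists.size()];
        while (true) {
            String max = null;
            for (int i = 0; i < lists.size(); i++) {
                if (pos[i] >= lists.get(i).size()) return out; // a list is exhausted
                String head = lists.get(i).get(pos[i]);
                if (max == null || head.compareTo(max) > 0) max = head;
            }
            boolean allEqual = true;
            for (int i = 0; i < lists.size(); i++) {
                List<String> l = lists.get(i);
                // "Seek" forward to the first ID >= max.
                while (pos[i] < l.size() && l.get(pos[i]).compareTo(max) < 0) pos[i]++;
                if (pos[i] >= l.size()) return out;
                if (!l.get(pos[i]).equals(max)) allEqual = false;
            }
            if (allEqual) {
                out.add(max);
                for (int i = 0; i < pos.length; i++) pos[i]++;
            }
        }
    }

    // AND over attributes of (OR over that attribute's accepted values).
    static List<String> query(List<List<List<String>>> clauses) {
        List<List<String>> perAttribute = new ArrayList<>();
        for (List<List<String>> clause : clauses) perAttribute.add(union(clause));
        return intersect(perAttribute);
    }
}
```

The leapfrog step is what a seek buys you in a real iterator: a source that is behind the current candidate can jump forward instead of stepping one key at a time.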
The OrIterator doesn’t appear to handle the possibility of more than
one row per tablet.
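Whether OrIterator is actually affected is the open question in this thread. This sketch only shows what goes wrong if an OR-style merge orders term sources by docID alone once a tablet holds several rows (bins). Position, ROW_THEN_DOC, and DOC_ONLY are hypothetical names, not OrIterator code.

```java
import java.util.*;

// HYPOTHETICAL sketch: why an OR-style merge must compare the row before the
// docID once a tablet holds several rows (bins). Position stands in for a
// term source's current (row, docID) key; this is not the OrIterator code.
class RowFirstMergeSketch {
    static final class Position {
        final String row, docId;
        Position(String row, String docId) { this.row = row; this.docId = docId; }
        @Override public String toString() { return row + "/" + docId; }
    }

    // Row first, then docID: agrees with the tablet's sorted key order.
    static final Comparator<Position> ROW_THEN_DOC =
        Comparator.comparing((Position p) -> p.row).thenComparing(p -> p.docId);

    // DocID only: safe only if every source is known to be in the same row.
    static final Comparator<Position> DOC_ONLY =
        Comparator.comparing((Position p) -> p.docId);

    // Merge several sorted sources with a heap, as an OR over term sources would.
    static List<String> merge(List<List<Position>> sources, Comparator<Position> cmp) {
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            (x, y) -> cmp.compare(sources.get(x[0]).get(x[1]), sources.get(y[0]).get(y[1])));
        for (int i = 0; i < sources.size(); i++) {
            if (!sources.get(i).isEmpty()) heap.add(new int[] {i, 0});
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] c = heap.poll();
            out.add(sources.get(c[0]).get(c[1]).toString());
            if (c[1] + 1 < sources.get(c[0]).size()) heap.add(new int[] {c[0], c[1] + 1});
        }
        return out;
    }
}
```

With one source at bin1/p7 then bin2/p2 and another at bin1/p9 then bin2/p1, ROW_THEN_DOC yields bin1/p7, bin1/p9, bin2/p1, bin2/p2, while DOC_ONLY yields bin1/p7, bin2/p2, bin1/p9, bin2/p1, jumping back to bin1 after bin2 has already been emitted.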
Thanks,
Tejay Cardon