Here's a quick write-up:
http://www.accumulo.net/node/1
On Wed, Aug 22, 2012 at 8:03 PM, Josh Elser <[email protected]> wrote:
> Err, double (triple) reply:
>
> No, you are incorrect. The wikisearch example can handle any arbitrary
> boolean expression containing NOT, AND, and OR. As always, I'll preface it
> the same as Bill did: it *should* be able to handle them :).
>
> I know that cleaning-up/reworking the Wikisearch code is in the works. I'm
> just not positive about the timeframe.
>
> As far as examples, I'd push you to the write-up Eric did after
> benchmarking the wikisearch example:
> http://accumulo.apache.org/example/wikisearch.html
>
> He has some example queries that give the basic idea behind what's
> supported (minus the NOTs)
>
> On 08/22/2012 05:27 PM, Cardon, Tejay E wrote:
>
>>
>> Josh,
>>
>> Thanks for getting back to me so quickly. I explained in my lengthy reply
>> to William that the comment on OrIterator.TermSource.compareTo
>> indicates that implementations with more than one row per tablet need to
>> compare row key first (and that is not being done in this code). It may be
>> that it’s not an issue and I’m simply misunderstanding something. As for
>> the wikisearch example, as I understood it, it could only handle searches
>> for “anded” terms. If that’s not the case, then an example of an or search
>> would be helpful. In any case, I’d love a deeper dive on the wikisearch
>> somewhere. I get the source code and a high level explanation of what’s
>> happening, but I’d love a tutorial or something that walks through the
>> classes and explains how each one contributes to the functionality. Don’t
>> consider that a request (that would be a lot more to ask than I’m willing
>> to ask), but I would certainly find it useful if it does exist.
>>
>> Thanks,
>>
>> Tejay
>>
>> *From:*Josh Elser [mailto:[email protected]]
>> *Sent:* Wednesday, August 22, 2012 2:53 PM
>> *To:* [email protected]
>> *Subject:* EXTERNAL: Re: Custom Iterators
>>
>>
>> What makes you say that the OrIterator cannot handle more than one row
>> per tablet? Can you provide details?
>>
>> AFAIK, the OrIterator should work correctly in all cases (e.g. regardless
>> of row distribution in a tablet). Any issues in the code that prevent it
>> from doing so would be a bug that should be fixed.
>>
>> Also, the wikisearch example supports indexing over multiple attributes
>> (and I believe indexes document metadata in addition to the tokenized
>> document). Is there something unclear that could be better documented?
>>
>> On 8/22/12 4:41 PM, Cardon, Tejay E wrote:
>>
>> All,
>>
>> I’m interested in writing a custom iterator, and I’ve been looking
>> for documentation on how to do so. Thus far, I’ve not been able to
>> find anything beyond the java docs in SortedKeyValueIterator and a
>> few other sub-classes. A few of the examples use Iterators, but
>> provide no real info on how to properly implement one. Is there
>> anywhere to find general guidance on the iterator stack?
>>
>> (If you’re interested)
>>
>> Specifically, for those that are curious, I’m trying to implement
>> something similar to the wikisearch example, but with some key
>> differences. In my case, I’ve got a file with various attributes
>> that are being indexed. So for each file there are 5 attributes, and
>> each attribute has a fixed number of possible values. For example
>> (totally made up):
>>
>> personID, gender, hair color, country, race, personRecord
>>
>> Row:binID; ColFam:Attribute_AttributeValue; ColQ:PersonID;
>> Val:blank
>>
>> AND
>> Row:binID; ColFam:”D”; ColQ:personID; value:personRecord
>>
>> A typical query would be:
>>
>> Give me the personRecord for all people with:
>>
>> Gender: male &
>>
>> Hair color: blond or brown &
>>
>> Country: USA or England or china or korea &
>>
>> Race: white or oriental
>>
>> The existing Iterators used in the wikisearch example are unable
>> to handle the “or” clauses in each attribute.
>>
>> The OrIterator doesn’t appear to handle the possibility of more than
>> one row per tablet.
>>
>> Thanks,
>>
>> Tejay Cardon
>>
>>
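
For readers following the thread, the query shape in the original message (an AND across attributes, with an OR inside each attribute) can be sketched in plain Java without any Accumulo classes. This is only an illustration under stated assumptions: the class name, the in-memory map standing in for the table, and the "attribute_value" term strings are all hypothetical, and it is not how the wikisearch iterators or the OrIterator are implemented. It takes the union of person IDs for each OR group and then intersects the groups, one binID (row) at a time.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch: an in-memory stand-in for the index layout
//   Row: binID; ColFam: Attribute_AttributeValue; ColQ: personID
class AndOfOrsSketch {

    // Records that personId (stored in binId) has each of the given attribute_value terms.
    static void index(SortedMap<String, Map<String, SortedSet<String>>> idx,
                      String binId, String personId, String... terms) {
        Map<String, SortedSet<String>> bin = idx.computeIfAbsent(binId, k -> new HashMap<>());
        for (String term : terms) {
            bin.computeIfAbsent(term, k -> new TreeSet<>()).add(personId);
        }
    }

    // Each inner list is an OR group; the groups themselves are ANDed together.
    // Returns "binID/personID" strings in sorted (row, then person ID) order.
    static List<String> query(SortedMap<String, Map<String, SortedSet<String>>> idx,
                              List<List<String>> andOfOrs) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Map<String, SortedSet<String>>> bin : idx.entrySet()) {
            SortedSet<String> matching = null;
            for (List<String> orGroup : andOfOrs) {
                // OR: union of the person IDs for every term in this group.
                SortedSet<String> union = new TreeSet<>();
                for (String term : orGroup) {
                    union.addAll(bin.getValue().getOrDefault(term, Collections.emptySortedSet()));
                }
                // AND: keep only the IDs that every group so far has matched.
                if (matching == null) {
                    matching = union;
                } else {
                    matching.retainAll(union);
                }
            }
            if (matching != null) {
                for (String personId : matching) {
                    hits.add(bin.getKey() + "/" + personId);
                }
            }
        }
        return hits;
    }
}
```

Because each bin is evaluated separately, a person ID in one row can never be matched against index entries from another row. That is the same concern behind comparing the row key before the document ID when a tablet holds more than one row.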