On May 17, 2006, at 2:18 PM, Chris Hostetter wrote:
Off the top of my head, i can't think of any cool way Solr can help you with this. My best bet on how to solve the problem in general would be an analyzer that doesn't do any tokenizing, but does create multiple tokens
at the same position after "rotating" all of the words, i.e. for the
input text...

        "Dante Gabriel Rossetti"

create the following tokens, all at the same position...

        "Dante Gabriel Rossetti"
        "Gabriel Rossetti, Dante"
        "Rossetti, Dante Gabriel"

...and then keep using TermEnum.
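
The rotation step above can be sketched in plain Java. This is a hypothetical helper (not part of Solr or Lucene); a real analyzer would emit each rotated string as a token with a position increment of 0 so that they all share one position:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: generates the comma rotations of a name, e.g.
// "Dante Gabriel Rossetti" -> "Gabriel Rossetti, Dante"
//                          -> "Rossetti, Dante Gabriel"
public class NameRotations {
    public static List<String> rotations(String name) {
        String[] words = name.trim().split("\\s+");
        List<String> result = new ArrayList<String>();
        result.add(String.join(" ", words));
        // Rotation i: words i..n-1 come first, words 0..i-1 follow the comma.
        for (int i = 1; i < words.length; i++) {
            StringBuilder sb = new StringBuilder();
            for (int j = i; j < words.length; j++) {
                if (j > i) sb.append(' ');
                sb.append(words[j]);
            }
            sb.append(", ");
            for (int j = 0; j < i; j++) {
                if (j > 0) sb.append(' ');
                sb.append(words[j]);
            }
            result.add(sb.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        for (String r : rotations("Dante Gabriel Rossetti")) {
            System.out.println(r);
        }
    }
}
```

Running this prints the three variants shown above, one per line.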

That's a pretty clever solution, actually. However, its one drawback is case sensitivity. I could lowercase everything, but then the terms the user sees would be all lowercase, and that simply won't do for my scholarly audience :)

I suppose a rough but workable way to do this would be to tokenize the agent field (StandardAnalyzer would be sufficient), then while TermEnum'ing look up all the documents for each term, grab the "agent" field, and display that - except for the fiddly bit about agent being multivalued.
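
That lookup could be simulated outside Lucene with an inverted map. This is a hypothetical sketch (the class and method names are made up, and a real implementation would walk a TermEnum over the index instead): each document's multivalued "agent" field is tokenized and lowercased for matching, while the full original-case strings are kept for display.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: map each lowercased token of the multivalued
// "agent" field back to the full, original-case agent strings so a
// term-enumeration UI can display properly cased names.
public class AgentLookup {
    // token -> set of original agent strings containing it
    private final Map<String, Set<String>> tokenToAgents = new HashMap<>();

    // Index one document's multivalued "agent" field.
    public void addDocument(String[] agentValues) {
        for (String agent : agentValues) {
            for (String token : agent.toLowerCase().split("\\W+")) {
                if (token.isEmpty()) continue;
                tokenToAgents.computeIfAbsent(token, k -> new TreeSet<>())
                             .add(agent);
            }
        }
    }

    // Resolve a term (case-insensitively) to its display-cased agents.
    public Set<String> agentsFor(String term) {
        return tokenToAgents.getOrDefault(term.toLowerCase(),
                                          Collections.emptySet());
    }

    public static void main(String[] args) {
        AgentLookup lookup = new AgentLookup();
        lookup.addDocument(new String[] {
            "Dante Gabriel Rossetti", "Christina Rossetti" });
        System.out.println(lookup.agentsFor("rossetti"));
    }
}
```

The multivalued case falls out naturally here: one token can map to several agent strings, and one document can contribute several agents.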

It seems like what I really need is simply a separate index (or rather a partition of the main Solr one) where a Document represents an "agent", and do a PrefixQuery or TermEnum and get all unique agents.

But partitioning the Solr index into various document types at this point seems overkill - though it is an area I'd like to explore.

Maybe I need to build some sort of term -> agent cache during warming that makes this a no-brainer?

Thanks for all the excellent feedback.

And I was successful in producing an auto-suggest lookup using the non-tokenized, case-sensitive terms, and performance was quite fine - again, this is a Ruby on Rails front-end hitting Solr, with the only caching currently occurring on the Solr side of things.

        Erik
