On May 17, 2006, at 2:18 PM, Chris Hostetter wrote:
Off the top of my head, I can't think of any cool way Solr can help
you with this. My best bet on how to solve the problem in general
would be an analyzer that doesn't do any tokenizing, but does create
multiple tokens at the same position after "rotating" all of the
words, i.e. for the input text...
"Dante Gabriel Rossetti"
create the following tokens, all at the same position...
"Dante Gabriel Rossetti"
"Gabriel Rossetti, Dante"
"Rossetti, Dante Gabriel"
...and then keep using TermEnum.
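The rotation step itself is just string work. Here's a minimal sketch of it (the `Rotations` class and method name are mine, not anything from Solr; a real analyzer would additionally emit each rotated string as a token with a position increment of 0 so they all stack at the same position):

```java
import java.util.ArrayList;
import java.util.List;

public class Rotations {
    // Produce every rotation of a multi-word name, with a comma before
    // the words that wrapped around, e.g. "Rossetti, Dante Gabriel".
    public static List<String> rotations(String name) {
        String[] words = name.trim().split("\\s+");
        List<String> out = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            StringBuilder sb = new StringBuilder();
            // the tail of the name, starting at word i
            for (int j = i; j < words.length; j++) {
                if (sb.length() > 0) sb.append(' ');
                sb.append(words[j]);
            }
            // the wrapped-around head, after a comma (skipped when i == 0)
            if (i > 0) {
                sb.append(',');
                for (int j = 0; j < i; j++) {
                    sb.append(' ').append(words[j]);
                }
            }
            out.add(sb.toString());
        }
        return out;
    }
}
```

So `rotations("Dante Gabriel Rossetti")` yields exactly the three forms above.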
That's a pretty clever solution, actually. However, the one negative
it has is case sensitivity. I could lowercase everything, but then
the terms the user sees will be in all lowercase and that simply
won't do for my scholarly audience :)
I suppose the general way to approximate this would be to tokenize
the agent field (StandardAnalyzer would be sufficient), then while
TermEnum'ing look up all the Documents for each term, grab the
"agent" field, and display that - except for the fiddly bit about
agent being multivalued.
It seems like what I really need is simply a separate index (or
rather a partition of the main Solr one) where a Document represents
an "agent", and do a PrefixQuery or TermEnum and get all unique agents.
But partitioning the Solr index into various document types at this
point seems overkill - though it is an area I'd like to explore.
Maybe I need to build some sort of term -> agent cache during warming
that makes this a no-brainer?
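A sketch of what such a cache might look like (the `AgentCache` class, `add`, and `suggest` names are hypothetical, and this is not wired into Solr's warming hooks): keys are lowercased for case-insensitive prefix matching, while the stored values keep the original case for display, which sidesteps the lowercasing objection above.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class AgentCache {
    // lowercased lookup key -> original-case display forms; TreeMap keeps
    // the keys sorted so prefix lookup is a tailMap scan.
    private final TreeMap<String, Set<String>> byKey = new TreeMap<>();

    // Called once per agent value at warm time.
    public void add(String agent) {
        byKey.computeIfAbsent(agent.toLowerCase(Locale.ROOT),
                              k -> new LinkedHashSet<>()).add(agent);
    }

    // Case-insensitive prefix lookup for auto-suggest: walk the sorted
    // keys from the prefix and stop at the first non-match.
    public List<String> suggest(String prefix) {
        String p = prefix.toLowerCase(Locale.ROOT);
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : byKey.tailMap(p).entrySet()) {
            if (!e.getKey().startsWith(p)) break;
            out.addAll(e.getValue());
        }
        return out;
    }
}
```

Populated with the rotated agent strings, `suggest("ros")` would return "Rossetti, Dante Gabriel" in its original case.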
Thanks for all the excellent feedback.
Also, I was successful in producing an auto-suggest lookup using the
non-tokenized, case-sensitive terms, and performance was quite fine -
again, this is a Ruby on Rails front-end hitting Solr, with the only
caching currently occurring on the Solr side of things.
Erik