On May 17, 2006, at 2:18 PM, Chris Hostetter wrote:
Off the top of my head, I can't think of any cool way Solr can help
you with this. My best bet on how to solve the problem in general
would be an analyzer that doesn't do any tokenizing, but does create
multiple tokens at the same position after "rotating" all of the
words, i.e. for the input text...
"Dante Gabriel Rossetti"
create the following tokens, all at the same position...
"Dante Gabriel Rossetti"
"Gabriel Rossetti, Dante"
"Rossetti, Dante Gabriel"
...and then keep using TermEnum.
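The rotation step itself is just string work. Here's a minimal sketch of it (the `Rotations` class and method name are mine, not anything from Solr; a real analyzer would additionally emit each rotated string as a token with a position increment of 0 so they all stack at the same position):

```java
import java.util.ArrayList;
import java.util.List;

public class Rotations {
    // Produce every rotation of a multi-word name, with a comma before
    // the words that wrapped around, e.g. "Rossetti, Dante Gabriel".
    public static List<String> rotations(String name) {
        String[] words = name.trim().split("\\s+");
        List<String> out = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            StringBuilder sb = new StringBuilder();
            // the tail of the name, starting at word i
            for (int j = i; j < words.length; j++) {
                if (sb.length() > 0) sb.append(' ');
                sb.append(words[j]);
            }
            // the wrapped-around head, after a comma (skipped when i == 0)
            if (i > 0) {
                sb.append(',');
                for (int j = 0; j < i; j++) {
                    sb.append(' ').append(words[j]);
                }
            }
            out.add(sb.toString());
        }
        return out;
    }
}
```

So `rotations("Dante Gabriel Rossetti")` yields exactly the three forms above.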
That's a pretty clever solution, actually. However, the one negative
it has is case sensitivity. I could lowercase everything, but then
the terms the user sees will be in all lowercase and that simply
won't do for my scholarly audience :)
I suppose the general way to approximate this would be to tokenize
the agent field (StandardAnalyzer would be sufficient), then while
TermEnum'ing look up all the Documents for each term, grab the
"agent" field, and display that - except for the fiddly bit about
agent being multivalued.
It seems like what I really need is simply a separate index (or
rather a partition of the main Solr one) where a Document represents
an "agent", and do a PrefixQuery or TermEnum and get all unique agents.
But partitioning the Solr index into various document types at this
point seems overkill - though it is an area I'd like to explore.
Maybe I need to build some sort of term -> agent cache during warming
that makes this a no-brainer?
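A sketch of what such a cache might look like (the `AgentCache` class, `add`, and `suggest` names are hypothetical, and this is not wired into Solr's warming hooks): keys are lowercased for case-insensitive prefix matching, while the stored values keep the original case for display, which sidesteps the lowercasing objection above.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class AgentCache {
    // lowercased lookup key -> original-case display forms; TreeMap keeps
    // the keys sorted so prefix lookup is a tailMap scan.
    private final TreeMap<String, Set<String>> byKey = new TreeMap<>();

    // Called once per agent value at warm time.
    public void add(String agent) {
        byKey.computeIfAbsent(agent.toLowerCase(Locale.ROOT),
                              k -> new LinkedHashSet<>()).add(agent);
    }

    // Case-insensitive prefix lookup for auto-suggest: walk the sorted
    // keys from the prefix and stop at the first non-match.
    public List<String> suggest(String prefix) {
        String p = prefix.toLowerCase(Locale.ROOT);
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : byKey.tailMap(p).entrySet()) {
            if (!e.getKey().startsWith(p)) break;
            out.addAll(e.getValue());
        }
        return out;
    }
}
```

Populated with the rotated agent strings, `suggest("ros")` would return "Rossetti, Dante Gabriel" in its original case.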
Thanks for all the excellent feedback.
Also, I was successful in producing an auto-suggest lookup using the
non-tokenized, case-sensitive terms, and performance was quite fine -
again, this is a Ruby on Rails front-end hitting Solr, with the only
caching currently occurring on the Solr side of things.
Erik