On 2010-03-30 15:42, Robert Muir wrote:
On Mon, Mar 29, 2010 at 11:34 PM, Andy<angelf...@yahoo.com>  wrote:

Reading through this thread and SOLR-1316, there seems to be a lot of
different ways to implement auto-complete in Solr. I've seen the mentions
of:

EdgeNGrams
TermsComponent
Faceting
TST
Patricia Tries
RadixTree
DAWG



Another idea is you can use the Automaton support in the lucene flexible
indexing branch: to query the index directly with a DFA that represents
whatever terms you want back.
The idea is that there really isn't much gain in building a separate Pat,
Radix Tree, or DFA to do this when you can efficiently intersect a DFA with
the existing terms dictionary.

I don't really understand what autosuggest needs to do, but if you are doing
things like looking for mispellings you can easily build a DFA that
recognizes terms within some short edit distance with the support thats
there (the LevenshteinAutomata class), to quickly get back candidates.

You can intersect/concatenate/union these DFAs with prefix or suffix DFAs if
you want too, don't really understand what the algorithm should do, but I'm
happy to try to help.


The problem is a bit more complicated. There are two issues:

* simple term-level completion often produces wrong results for multi-term queries (which are usually rewritten as "weak" phrase queries),

* the weights of suggestions should not correspond directly to IDF in the index - much better results can be obtained when they correspond to the frequency of terms/phrases in the query logs ...

TermsComponent and EdgeNGrams, while simple to use, suffer from both issues.

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

Reply via email to