RE: Autocomplete: match words anywhere in the token

Jonathan Rochkind Fri, 24 Sep 2010 08:37:01 -0700

Chantal Ackermann wrote: "I definitely need to have a look at how to use 
facetting in combination with multivalued fields for autocomplete."


My one kind of crazy idea is to (ab)use the Highlighting Component.  If you 
make a query for auto-suggest terms based on facets, using Chantal's technique, 
but in that query you ask the Highlighting Component to highlight matches only 
from the (non-tokenized) field you are faceting on, then this may possibly give 
you the terms from the multi-valued field that actually matched, and therefore 
which ones to actually offer as auto-suggestions. 
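
A rough sketch of what such a request might look like, just to make the idea 
concrete. The field names (title_ngram for the transformed search field, 
title_exact for the non-tokenized facet field) and the helper name are made up 
for illustration; the parameters themselves (facet, facet.field, 
facet.mincount, hl, hl.fl, fq) are standard Solr request parameters. Whether 
the highlighter actually returns anything useful for a field other than the one 
searched is exactly the open question.

```python
from urllib.parse import urlencode


def build_suggest_query(prefix, search_field="title_ngram",
                        facet_field="title_exact", filters=()):
    """Build a /select query string that facets AND highlights on facet_field.

    search_field and facet_field are hypothetical names: the first is the
    n-grammed/analyzed field being searched, the second is the non-tokenized
    copy used for faceting.
    """
    params = [
        ("q", f"{search_field}:{prefix}"),   # search the transformed field
        ("rows", "10"),
        ("facet", "true"),
        ("facet.field", facet_field),        # suggestions come from here
        ("facet.mincount", "1"),
        ("hl", "true"),
        ("hl.fl", facet_field),              # ask for highlights on the facet field only
        ("wt", "json"),
    ]
    # Any fq's restrict the portion of the index the suggestions are drawn from.
    params += [("fq", f) for f in filters]
    return "/select?" + urlencode(params)
```

For example, build_suggest_query("jav", filters=["type:book"]) yields a query 
string that searches title_ngram, facets and highlights on title_exact, and is 
limited to documents matching type:book. The prefix is not escaped here, so 
real user input would need Solr query escaping first.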

The idea would be that this would give you matching terms even though you've 
done various transformations (n-gramming, perhaps tokenization, stemming, or 
other normalization) in the field that you are actually searching upon; the 
highlighting component would magically figure out which full non-transformed 
terms from the other (faceted) field should be considered matches.  This is why 
I'd want to try to do it with the highlighting component instead of purely in 
client application code. But I don't really understand how the highlighting 
component works, so I'm not sure if it's up to it or not -- it seems like 
_sometimes_ it can manage to highlight matches despite stemming and other 
transformations in the index, but I don't really understand how it does this, 
how robust it is, or how much it's taking account of the particular analyzers 
on the actual field(s) in question. 
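
If it did work, the client side would only need to read the values back out of 
the "highlighting" section of the response. A minimal sketch, assuming the JSON 
response format and the default <em>...</em> highlight markers; the function 
name and the field name in the example are hypothetical, and none of this says 
whether Solr will actually populate that section for a non-tokenized field.

```python
import re

# Default Solr highlight markers; different hl.simple.pre/post values would
# need a different pattern.
EM_TAG = re.compile(r"</?em>")


def highlighted_values(response, field):
    """Collect the distinct highlighted values of `field` across all docs.

    `response` is the parsed JSON response (a dict). The "highlighting"
    section maps document id -> field name -> list of snippets.
    """
    seen = []
    for per_doc in response.get("highlighting", {}).values():
        for snippet in per_doc.get(field, []):
            value = EM_TAG.sub("", snippet)   # strip markers, keep original text
            if value not in seen:             # de-duplicate, keep first-seen order
                seen.append(value)
    return seen
```

Given a response whose highlighting section contains 
{"doc1": {"title_exact": ["<em>Java</em> Concurrency"]}}, this would return 
["Java Concurrency"]. Documents with no highlight for the field simply 
contribute nothing.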

What do you think Chantal?  If you play around with it, do let us know. 

Chantal, you should totally write up a good blog post on your 
auto-complete-using-facets-via-extra-indexed-field technique in general, with 
example solrconfig and application code etc.  The technique you outline is the 
best technique I've seen to get robust auto-complete without having to build an 
entirely separate index or use the terms component.  Avoiding both of these and 
using the same base solr index means, among other things, that you can 
auto-complete within an index portion limited by fq's etc, which the 'terms' 
based technique does not allow.  Also, since it's a facet based technique, it 
would let you present auto-complete terms in order of popularity in the 
database -- or in an fq limited portion of the database -- if you wanted.  You 
get lots of advantages sticking to the main index and not using terms or a 
separate index. 
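
To illustrate the popularity ordering: in the JSON response, facet counts for a 
field come back by default as a flat list alternating term and count. A small 
sketch of turning that into an ordered suggestion list follows; the function 
name and the field name in the example are hypothetical. Because the counts are 
computed over whatever the query and fq's matched, the ordering automatically 
reflects the fq limited portion of the index.

```python
def suggestions_from_facets(response, facet_field, limit=10):
    """Return facet terms for `facet_field`, most frequent first.

    Assumes the default flat JSON layout for facet fields:
    [term1, count1, term2, count2, ...].
    """
    flat = response["facet_counts"]["facet_fields"][facet_field]
    pairs = list(zip(flat[0::2], flat[1::2]))      # -> [(term, count), ...]
    # Solr normally returns these sorted by count already; sorting again is
    # harmless and keeps the function correct if facet.sort was changed.
    pairs.sort(key=lambda pair: -pair[1])
    return [term for term, count in pairs if count > 0][:limit]
```

For a response containing 
{"facet_counts": {"facet_fields": {"title_exact": ["Effective Java", 3, "Java Concurrency", 7]}}}, 
the result is ["Java Concurrency", "Effective Java"].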

Jonathan
