Right. Stemming is less useful for author fields; you don't need to match "bill 
gate" or "steve job".

Also, if you want to do fuzzy matching, you should only do that on the exact 
fields, not the stemmed fields.
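
A rough sketch of what I mean, in Python, just building the request parameters. The field names "title_exact" and "title_stem" and the boost values are made up for illustration; defType=edismax and qf are standard Solr parameters. The "~1" edit-distance form of the fuzzy operator is what newer Solr versions accept; older releases take a similarity value instead, so treat that part as an assumption about your version.

```python
# Hypothetical field names: one unstemmed copy and one stemmed copy of the text.
EXACT_FIELD = "title_exact"
STEM_FIELD = "title_stem"


def build_params(user_query, exact_boost=4.0, stem_boost=1.0):
    """Query both fields via edismax, weighting the exact field higher."""
    return {
        "defType": "edismax",
        "q": user_query,
        "qf": f"{EXACT_FIELD}^{exact_boost} {STEM_FIELD}^{stem_boost}",
    }


def build_fuzzy_query(terms, max_edits=1):
    """Apply the fuzzy operator (~) to the exact field only, never the stemmed one."""
    return " ".join(f"{EXACT_FIELD}:{term}~{max_edits}" for term in terms)


if __name__ == "__main__":
    print(build_params("bill gates"))
    print(build_fuzzy_query(["bill", "gates"]))
```

The boost ratio is something you would tune against your own queries; the only point is that an exact match should outrank a stemmed-only match.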

wunder

On Apr 23, 2012, at 3:45 PM, Michael Sokolov wrote:

> Yes, and you might choose to use different options for different fields.  For 
> dictionary searches, where users are searching for specific words, and a high 
> degree of precision is called for, stemming is less helpful, but for full 
> text searches, more so.
> 
> -Mike
> 
> On 4/23/2012 3:35 PM, Walter Underwood wrote:
>> There is a third approach. Create two fields and always query both of them, 
>> with the exact field given a higher weight. This works great and performs 
>> well.
>> 
>> It is what we did at Netflix and what I'm doing at Chegg.
>> 
>> wunder
>> 
>> On Apr 23, 2012, at 12:21 PM, Andrew Wagner wrote:
>> 
>>> So I just realized the other day that stemming basically happens at index
>>> time. If I'm understanding correctly, there's no way to allow a user to
>>> specify, at run time, whether to stem particular words or not based on a
>>> single index. I think there are two options, but I'd love to hear that I'm
>>> wrong:
>>> 
>>> 1.) Incrementally build up a white list of words that don't stem very well.
>>> To pick a random example out of the blue, "light" isn't super closely
>>> related to "lighter", so I might choose not to stem that. If I wanted to
>>> do this, I think (if I understand correctly), stemmerOverrideFilter would
>>> help me out with this. I'm not a big fan of this approach.
>>> 
>>> 2.) Index all the text in two fields, once with stemming and once without.
>>> Then build some kind of option into the UI for specifying whether to stem
>>> the words or not, and search the appropriate field. Unfortunately, this
>>> would roughly double the size of my index, and probably affect query times
>>> too. Plus, the UI would probably suck.
>>> 
>>> Am I missing an option? Has anyone tried one of these approaches?
>>> 
>>> Thanks!
>>> Andrew

--
Walter Underwood
wun...@wunderwood.org


