Ah, this is a really good point. It still seems to have the downsides of #2, though: a much bigger index and possibly slower queries.

On Mon, Apr 23, 2012 at 3:35 PM, Walter Underwood <wun...@wunderwood.org> wrote:

> There is a third approach. Create two fields and always query both of
> them, with the exact field given a higher weight. This works great and
> performs well.
>
> It is what we did at Netflix and what I'm doing at Chegg.
>
> wunder
>
> On Apr 23, 2012, at 12:21 PM, Andrew Wagner wrote:
>
> > So I just realized the other day that stemming basically happens at index
> > time. If I'm understanding correctly, there's no way to allow a user to
> > specify, at run time, whether to stem particular words or not based on a
> > single index. I think there are two options, but I'd love to hear that I'm
> > wrong:
> >
> > 1.) Incrementally build up a white list of words that don't stem very well.
> > To pick a random example out of the blue, "light" isn't super closely
> > related to, "lighter", so I might choose not to stem that. If I wanted to
> > do this, I think (if I understand correctly), stemmerOverrideFilter would
> > help me out with this. I'm not a big fan of this approach.
> >
> > 2.) Index all the text in two fields, once with stemming and once without.
> > Then build some kind of option into the UI for specifying whether to stem
> > the words or not, and search the appropriate field. Unfortunately, this
> > would roughly double the size of my index, and probably affect query times
> > too. Plus, the UI would probably suck.
> >
> > Am I missing an option? Has anyone tried one of these approaches?
> >
> > Thanks!
> > Andrew
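
For reference, here is a rough sketch of what the two-field approach wunder describes might look like as request parameters. It assumes the edismax query parser and that the same text is indexed into both fields (for example via copyField). The field names text_exact and text_stemmed and the boost of 2.0 are made up for illustration and are not from this thread.

```python
from urllib.parse import urlencode


def build_params(user_query,
                 exact_field="text_exact",      # hypothetical unstemmed field
                 stemmed_field="text_stemmed",  # hypothetical stemmed field
                 exact_boost=2.0):              # illustrative weight, tune as needed
    """Build Solr request parameters that search both fields at once.

    The exact (unstemmed) field gets a higher weight in `qf`, so documents
    matching the literal word rank above documents matching only the stem.
    """
    qf = f"{exact_field}^{exact_boost} {stemmed_field}"
    return {
        "defType": "edismax",  # assumes the extended dismax parser is available
        "q": user_query,
        "qf": qf,
    }


params = build_params("lighter")
# URL-encode the parameters for a /select request.
query_string = urlencode(params)
print(query_string)
```

With this setup no UI option is needed: every query hits both fields, and the boost decides how much an exact match outranks a stemmed one.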