Re: Deciding whether to stem at query time

Andrew Wagner Tue, 24 Apr 2012 04:22:44 -0700

Ah, this is a really good point. It still seems to have the downsides of
#2, though: much bigger space requirements and possibly some time lost on
queries.
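
For reference, here is a rough sketch of what the two-field approach quoted
below might look like as a Solr request. It assumes the edismax query parser,
and the field names (text_exact, text_stem) and the boost of 2.0 are made up
for illustration; none of them come from this thread.

```python
from urllib.parse import urlencode


def two_field_params(user_query,
                     exact_field="text_exact",   # hypothetical unstemmed field
                     stemmed_field="text_stem",  # hypothetical stemmed field
                     exact_boost=2.0):           # assumed weight, tune as needed
    """Build edismax parameters that always search both fields,
    weighting matches on the exact (unstemmed) field higher."""
    return {
        "defType": "edismax",
        "q": user_query,
        # qf lists the fields to search; "^" attaches a boost to a field
        "qf": f"{exact_field}^{exact_boost} {stemmed_field}",
    }


params = two_field_params("lighter")
query_string = urlencode(params)  # ready to append to a /select URL
```

With this shape no UI switch is needed: a document matching the literal word
scores on both fields, while one matching only the stem scores on the stemmed
field alone, so it should tend to rank lower.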


On Mon, Apr 23, 2012 at 3:35 PM, Walter Underwood <wun...@wunderwood.org> wrote:

> There is a third approach. Create two fields and always query both of
> them, with the exact field given a higher weight. This works great and
> performs well.
>
> It is what we did at Netflix and what I'm doing at Chegg.
>
> wunder
>
> On Apr 23, 2012, at 12:21 PM, Andrew Wagner wrote:
>
> > So I just realized the other day that stemming basically happens at index
> > time. If I'm understanding correctly, there's no way to allow a user to
> > specify, at run time, whether to stem particular words or not based on a
> > single index. I think there are two options, but I'd love to hear that
> > I'm wrong:
> >
> > 1.) Incrementally build up a white list of words that don't stem very
> > well. To pick a random example out of the blue, "light" isn't super closely
> > related to "lighter", so I might choose not to stem that. If I wanted to
> > do this, I think (if I understand correctly), StemmerOverrideFilter would
> > help me out with this. I'm not a big fan of this approach.
> >
> > 2.) Index all the text in two fields, once with stemming and once
> > without. Then build some kind of option into the UI for specifying whether
> > to stem the words or not, and search the appropriate field. Unfortunately,
> > this would roughly double the size of my index, and probably affect query
> > times too. Plus, the UI would probably suck.
> >
> > Am I missing an option? Has anyone tried one of these approaches?
> >
> > Thanks!
> > Andrew
