I was afraid you would say that.

See http://fora.tv/2009/10/14/ACM_Data_Mining_SIG_Ted_Dunning#fullprogram
and click on the Recommendations section to skip to the good part.

The point is that cross recommendation can let you learn which rewrites
of this kind are needed.  The idea is to let your clever users teach you
which rewrites are necessary so that your less clever users benefit.
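
To make that concrete, here is a rough sketch of the simplest version of
the idea: count how often a query that got no click is immediately
followed, in the same session, by a different query that did get one.
This is plain reformulation counting, not the cross-recommendation
method from the talk, and the session format, the function name
mine_rewrites and the min_count threshold are all assumptions made up
for illustration.

```python
from collections import Counter, defaultdict


def mine_rewrites(sessions, min_count=2):
    """Suggest query rewrites from logged search sessions.

    sessions: list of sessions, each a time-ordered list of
              (query, clicked) tuples. This log shape is hypothetical.
    Returns {failed_query: [(successful_query, count), ...]}.
    """
    pair_counts = Counter()
    for session in sessions:
        # Look at each query together with the one issued right after it.
        for (q1, clicked1), (q2, clicked2) in zip(session, session[1:]):
            # A miss followed by a hit on a different query is a
            # candidate rewrite taught to us by the user.
            if not clicked1 and clicked2 and q1 != q2:
                pair_counts[(q1, q2)] += 1

    rewrites = defaultdict(list)
    # most_common() yields pairs in descending order of count.
    for (failed, fixed), n in pair_counts.most_common():
        if n >= min_count:
            rewrites[failed].append((fixed, n))
    return dict(rewrites)


# Toy log: two users found the document by spelling the number out.
sessions = [
    [("200 movies", False), ("two hundred movies", True)],
    [("200 movies", False), ("two hundred movies", True)],
    [("200 movies", False), ("top movies", True)],
    [("two hundred movies", True)],
]
rewrites = mine_rewrites(sessions)
```

A real system would want a proper statistical test rather than a raw
count threshold, but even this much shows where the rewrites come from.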

The engineering effort is higher going in and I wouldn't recommend it if
you have no development budget, but the total effort to get a really high
performing system would be less than trying to engineer all possible
rewrites by hand.
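
If you do go the hand-engineered route for the number case, you can at
least generate the entries instead of typing them.  A minimal sketch,
assuming a synonym file that takes one comma-separated group per line
(adjust to whatever your synonym dictionary actually expects); the names
number_to_words and synonym_lines and the 0 to 9999 range are arbitrary
choices for illustration.

```python
ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven",
    "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
    "fifteen", "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n):
    """Spell out an integer in the range 0..9999 in English words.

    Words are separated by spaces (no hyphens, no "and").
    """
    if not 0 <= n < 10000:
        raise ValueError("only 0..9999 is supported in this sketch")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + (" " + ONES[ones] if ones else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        tail = " " + number_to_words(rest) if rest else ""
        return ONES[hundreds] + " hundred" + tail
    thousands, rest = divmod(n, 1000)
    tail = " " + number_to_words(rest) if rest else ""
    return ONES[thousands] + " thousand" + tail


def synonym_lines(numbers):
    """Build one 'digits, words' line per number (assumed file format)."""
    return [f"{n}, {number_to_words(n)}" for n in numbers]


lines = synonym_lines([7, 21, 200, 1999])
```

This only covers one spelling per number, which is exactly the kind of
special case that stops scaling once users start typing "2 hundred" or
"200+".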

On Tue, Jan 10, 2012 at 10:21 PM, Tanner Postert
<tanner.post...@gmail.com> wrote:

> You mention "that is one way to do it". Is there another I'm not seeing?
>
> On Jan 10, 2012, at 4:34 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>
> > On Tue, Jan 10, 2012 at 5:32 PM, Tanner Postert <tanner.post...@gmail.com> wrote:
> >
> >> We've had some issues with people searching for a document with the
> >> search term '200 movies'. The document is actually titled 'two hundred
> >> movies'.
> >>
> >> Do we need to add every number to our synonyms dictionary to
> >> accomplish this?
> >
> >
> > That is one way to deal with this.
> >
> > But it depends on a lot of hand engineering of special cases.  That is
> > good to have for the low hanging fruit, but it only takes you so far.
> > You can also automate the discovery of such cases to a certain degree
> > by analyzing query logs.
> >
> >
> >> Is it best done at index or search time?
> >>
> >
> > I would say that opinion is divided on this and in the end, you probably
> > have to do versions of this at both times.  This is especially true if
> > you want to include secondary information like inferred query purpose
> > (obviously only available at query time) and inferred document
> > characteristics (best known at indexing time).  Partly the choice about
> > when to do this is driven by which trade-offs you are OK making.  For
> > instance, some people are driven by index size but not query response
> > time.  They would probably opt for pushing load to the query.  Others
> > may be bound by response time or query throughput.  They may wish to
> > minimize query complexity and size.
>
