On Tue, Jan 10, 2012 at 5:32 PM, Tanner Postert <tanner.post...@gmail.com> wrote:
> We've had some issues with people searching for a document with the
> search term '200 movies'. The document is actually title 'two hundred
> movies'.
>
> Do we need to add every number to our synonyms dictionary to
> accomplish this?

That is one way to deal with this. But it depends on a lot of hand engineering of special cases. That is good to have for the low hanging fruit, but it only takes you so far.

You can also automate the discovery of such cases to a certain degree by analyzing query logs.

> Is it best done at index or search time?
>

I would say that opinion is divided on this and in the end, you probably have to do versions of this at both times. This is especially true if you want to include secondary information like inferred query purpose (obviously only available at query time) and inferred document characteristics (best known at indexing time).

Partly the choice about when to do this is driven by which trade-offs you are OK making. For instance, some people are driven by index size but not query response time. They would probably opt for pushing load to the query. Others may be bound by response time or query throughput. They may wish to minimize query complexity and size.
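
For the number case specifically, the hand engineering can at least be scripted. What follows is only a rough sketch in Python, not anything tied to a particular search engine: the function names (`number_to_words`, `synonym_lines`, `build_normalizer`) are made up for illustration, the comma-separated synonym line format is an assumption about what your synonyms dictionary accepts, and only unhyphenated spellings of 0 to 999 are handled.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out 0 <= n < 1000 in English, without hyphens or 'and'."""
    if not 0 <= n < 1000:
        raise ValueError("only 0..999 is supported in this sketch")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"
    hundreds, rest = divmod(n, 100)
    head = f"{ONES[hundreds]} hundred"
    return head if rest == 0 else f"{head} {number_to_words(rest)}"


def synonym_lines(limit: int) -> list[str]:
    """One 'digits, words' line per number (assumed synonym file format)."""
    return [f"{n}, {number_to_words(n)}" for n in range(limit)]


def build_normalizer(limit: int = 1000):
    """Return a function that rewrites spelled-out numbers as digits."""
    phrase_to_digits = {number_to_words(n): str(n) for n in range(limit)}
    # Try longer phrases first so "two hundred" wins over plain "two".
    phrases = sorted(phrase_to_digits, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, phrases)) + r")\b")

    def normalize(text: str) -> str:
        # Lowercase first, then replace each matched phrase with its digits.
        return pattern.sub(lambda m: phrase_to_digits[m.group(1)], text.lower())

    return normalize


normalize = build_normalizer()
print(normalize("Two Hundred Movies"))  # 200 movies
print(normalize("200 movies"))          # 200 movies
```

The generated synonym lines correspond to the dictionary approach. The `normalize` function is the alternative of folding both forms into one canonical token, and it would have to run on both the indexed text and the query to be of any use. Note that it is deliberately naive: it will also turn "one of the best" into "1 of the best", which is exactly the kind of special case that query log analysis should help you find.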