It's certainly true that a wildcard suppresses the synonym filter, since that filter is not "multi-term aware."

Short of implementing your own version of the synonym filter that is multi-term aware and interprets wildcards, you may have to write your own query preprocessor.
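
For the preprocessor route, here is a minimal sketch in plain Java, not Solr API code. It assumes whitespace-separated terms and rewrites any term with a single trailing wildcard into an OR of the bare term and the wildcard term, so the bare term still goes through normal query analysis (including the synonym filter). The class and method names are hypothetical, and it deliberately ignores quoted phrases, escapes, and boolean operators.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Solr: rewrites "Bill*" to "(Bill OR Bill*)".
final class TrailingWildcardPreprocessor {
    private TrailingWildcardPreprocessor() {}

    // Rewrites every whitespace-separated term of the user query.
    static String rewrite(String userQuery) {
        List<String> out = new ArrayList<>();
        for (String token : userQuery.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // happens only for an empty or all-blank query
            }
            out.add(rewriteToken(token));
        }
        return String.join(" ", out);
    }

    // Only a term of the form <stem>* with a wildcard-free stem is rewritten.
    static String rewriteToken(String token) {
        if (token.length() < 2 || !token.endsWith("*")) {
            return token;
        }
        String stem = token.substring(0, token.length() - 1);
        if (stem.indexOf('*') >= 0 || stem.indexOf('?') >= 0) {
            return token; // leave more complex wildcard patterns untouched
        }
        return "(" + stem + " OR " + token + ")";
    }
}
```

With this, TrailingWildcardPreprocessor.rewrite("Bill*") yields "(Bill OR Bill*)". The bare "Bill" can then be expanded by the synonym filter, while "Bil*" becomes "(Bil OR Bil*)", where "Bil" has no synonym entry to expand.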

Or, you could do index-time synonyms, so that "bill", "billy", "will", "willy", and "william" are all indexed at the same position. Then the bil* wildcard would match "william", since "bill" is also indexed at the same position.
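
To illustrate why that works, here is a toy model in plain Java. It is not Lucene code, and every name in it is made up for illustration. It assumes a synonym map is applied at index time and that each synonym is stored at the same position as the original token, so a prefix lookup on the term dictionary finds the position through any of the stacked terms.

```java
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy stand-in for an inverted index of one field of one document.
final class SamePositionIndexDemo {
    // Sorted term dictionary: term -> positions at which it was indexed.
    private final TreeMap<String, Set<Integer>> postings = new TreeMap<>();
    private final Map<String, List<String>> synonyms;

    SamePositionIndexDemo(Map<String, List<String>> synonyms) {
        this.synonyms = synonyms;
    }

    // Indexes each token plus its synonyms at the token's own position.
    void index(List<String> tokens) {
        for (int pos = 0; pos < tokens.size(); pos++) {
            String term = tokens.get(pos).toLowerCase(Locale.ROOT);
            add(term, pos);
            for (String syn : synonyms.getOrDefault(term, Collections.<String>emptyList())) {
                add(syn, pos);
            }
        }
    }

    private void add(String term, int pos) {
        postings.computeIfAbsent(term, k -> new TreeSet<>()).add(pos);
    }

    // Positions matched by a trailing-wildcard query such as bil*.
    Set<Integer> prefixPositions(String prefix) {
        Set<Integer> hits = new TreeSet<>();
        for (Map.Entry<String, Set<Integer>> e : postings.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // sorted order: no later term can share the prefix
            }
            hits.addAll(e.getValue());
        }
        return hits;
    }
}
```

If "william" maps to "bill", "billy", "will", and "willy", then indexing the tokens "William Smith" and calling prefixPositions("bil") returns position 0, the position of "william", even though the text never contained "bill".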

-- Jack Krupansky

-----Original Message-----
From: Roberto Isaac Gonzalez
Sent: Tuesday, January 15, 2013 3:10 PM
To: solr-user@lucene.apache.org
Subject: Synonyms and trailing wildcard

Hi

I'm working on adding nicknames capability to our system. It's basically a
synonym mapping stored in a nicknames.txt file that uses the SynonymFilter
framework.

In one of our search boxes (used for lookups), we automatically append a
trailing wildcard.

There's one use case we're dealing with which is expanding synonyms even if
there's a trailing wildcard.

e.g. Q: Bill*
Expected Results: Bill, Billie, William

Q: Bil*
Expected Results: Bill, so no synonym expansion.

Basically, for synonym expansion, we want to treat the token as if it
didn't contain the trailing wildcard and we also *don't* want to expand the
wildcard before doing the synonym matches.

We tried using the multiterm analysis chain but by definition that expects
one token *in* and one token *out*
(org.apache.solr.schema.TextField.analyzeMultiTerm()), so it throws an
exception.

I'm looking for options about implementing this scenario and some of the
options I've explored are:

1. Use the multiterm analysis chain and allow Synonym expansion, so one
token in and multiple tokens out.
2. Iterate ourselves and see if the multiterm analysis chain returns more
than one token, if it does, then remove the SynonymFilter from the analysis
chain, something similar to ExtendedDismaxQParser.shouldRemoveStopFilter().
3. ExtendedDismaxQParser.preProcessUserQuery() to OR the non-wildcarded
term.

What do you guys think?


Best Regards,
Roberto Gonzalez
