By applying synonyms at index time, you cause "apfelsin" to be added to
documents that contain only "orang". So, of course, documents that
previously contained only "orang" will now match a query for "apfelsin" or
any query that matches the term "apfelsin", such as the wildcard apfel*. At
query time, Lucene cannot tell whether your original document contained
"apfelsin" or whether "apfelsin" was added by an index-time synonym when the
document was indexed.
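To make the mechanism concrete, here is a plain-Java simulation. It does not use Lucene's API. It assumes the synonym line "apfelsin, orang" is applied with expand=true (the default), so each term is indexed together with all of its synonyms. The pre-stemmed token lists and the third document "birn" are hypothetical stand-ins for the output of your stemmer. The point it illustrates is that a trailing wildcard is matched against the indexed terms only, and nothing in the index records which terms came from the synonym filter.

```java
import java.util.*;

public class IndexTimeSynonymDemo {
    // Hypothetical synonym map mirroring the line "apfelsin, orang" with expand=true:
    // each term is replaced by the full synonym group at index time.
    static final Map<String, List<String>> SYNONYMS = Map.of(
        "apfelsin", List.of("apfelsin", "orang"),
        "orang", List.of("apfelsin", "orang"));

    // Simulates the end of the index-time chain: tokens are assumed to be
    // already stemmed, then the synonym filter (last in the chain) expands them.
    static Set<String> indexTerms(List<String> stemmedTokens) {
        Set<String> terms = new TreeSet<>();
        for (String t : stemmedTokens) {
            terms.addAll(SYNONYMS.getOrDefault(t, List.of(t)));
        }
        return terms;
    }

    // A trailing-wildcard query is checked against the indexed terms only;
    // there is no way to tell an original term from a synonym-added one.
    static boolean prefixMatches(Set<String> indexed, String prefix) {
        for (String term : indexed) {
            if (term.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<String>> docs = new LinkedHashMap<>();
        docs.put("Apfelsaft", List.of("apfelsaft"));
        docs.put("Orange", List.of("orang"));
        docs.put("Birne", List.of("birn"));
        for (Map.Entry<String, List<String>> e : docs.entrySet()) {
            Set<String> terms = indexTerms(e.getValue());
            System.out.println(e.getKey() + " " + terms
                + " matches apfel*: " + prefixMatches(terms, "apfel"));
        }
    }
}
```

Running it shows the "Orange" document indexed as [apfelsin, orang] and therefore matching apfel*, even though its original text never contained "apfelsin".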
Solution: Either disable index-time synonyms, or add a parallel field (via
copyField) that does not have the index-time synonyms, and run your wildcard
queries against that field.
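A rough sketch of the parallel-field approach, again simulated in plain Java rather than through Solr. The field roles "text_syn" (with index-time synonyms) and "text_plain" (the copyField target without them) are hypothetical, and routing every query that ends in "*" to the plain field while exact terms keep using the synonym field is one assumed policy, not something Solr does for you.

```java
import java.util.*;

public class ParallelFieldDemo {
    // Hypothetical synonym group "apfelsin, orang", expanded at index time.
    static final Map<String, List<String>> SYNONYMS = Map.of(
        "apfelsin", List.of("apfelsin", "orang"),
        "orang", List.of("apfelsin", "orang"));

    // Hypothetical field "text_syn": stemmed tokens plus index-time synonyms.
    static Set<String> synField(List<String> tokens) {
        Set<String> terms = new TreeSet<>();
        for (String t : tokens) {
            terms.addAll(SYNONYMS.getOrDefault(t, List.of(t)));
        }
        return terms;
    }

    // Hypothetical copyField target "text_plain": same stemmed tokens, no synonym filter.
    static Set<String> plainField(List<String> tokens) {
        return new TreeSet<>(tokens);
    }

    // Assumed routing policy: trailing-wildcard queries go to the plain field,
    // exact-term queries go to the synonym field so synonyms still work there.
    static boolean matches(List<String> docTokens, String query) {
        if (query.endsWith("*")) {
            String prefix = query.substring(0, query.length() - 1);
            for (String term : plainField(docTokens)) {
                if (term.startsWith(prefix)) {
                    return true;
                }
            }
            return false;
        }
        return synField(docTokens).contains(query);
    }

    public static void main(String[] args) {
        List<String> orange = List.of("orang");
        List<String> apfelsine = List.of("apfelsin");
        System.out.println(matches(orange, "apfel*"));    // false: plain field has no "apfelsin"
        System.out.println(matches(apfelsine, "apfel*")); // true
        System.out.println(matches(orange, "apfelsin"));  // true: synonym still applies to exact terms
    }
}
```

The trade-off is index size: every document's text is stored in two indexed fields, in exchange for keeping synonym matching on ordinary term queries while wildcards see only the original terms.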
But... perhaps you should clarify what you really intend to happen with
these pseudo-synonyms.
-- Jack Krupansky
-----Original Message-----
From: Johannes Rodenwald
Sent: Wednesday, February 13, 2013 10:25 AM
To: solr-user@lucene.apache.org
Subject: Index-time synonyms and trailing wildcard issue
Hi,
I use Solr 3.6.0 with a synonym filter as the last filter at index time,
using a list of stemmed terms. When I do a wildcard search that matches part
of an entry on the synonym list, Solr also uses that entry's synonyms to
generate the search results. I am trying to disable that behaviour, but
without success.
Example:
Stemmed synonyms:
apfelsin, orang
Search term:
apfel*
Matches:
Apfelkuchen, Apfelsaft, Apfelsine... (good, I want these matches)
Orange (bad, I don't want this match)
My questions are:
- Why does the synonym filter react to a wildcard query when it is not a
multiterm-aware component? (See
http://lucene.apache.org/solr/api-3_6_1/org/apache/solr/analysis/MultiTermAwareComponent.html)
- How can I disable this behaviour, so that "Orange" is no longer returned
by the query for "apfel*"?
Regards,
Johannes