Hello, I'm using Solr 9.1 and trying to set up synonyms. Single-word synonyms work, but multi-word and hyphenated synonyms do not.
In the final state, I plan to have a very extensive synonym file (hundreds, if not thousands, of lines), because I always want to find results for all child terms and other synonyms of a given search term. This is why I thought it might make sense to put all synonyms into the index (index-time expansion). But getting query-time synonym expansion to work would already be great. For now, I am testing with equivalent synonyms, and I always put quotation marks around multi-word queries.

What I have tried:

1. I included sow=false in the query, as recommended here: https://lucidworks.com/post/multi-word-synonyms-solr-adds-query-time-support/
2. I used the SynonymGraphFilter at query time only, at index time only, and at both. In each case, querying either of two single-word synonyms (e.g. TIGIT or domvanalimab) returned the same number of results, as expected, but querying a multi-word synonym did not find documents containing its other synonyms.
3. I changed all text fields to text_field (which uses the KeywordTokenizer) instead of text_general (which uses the StandardTokenizer), to prevent multi-word queries from being split up. This still did not make multi-word synonyms work.
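A note on item 1, as I understand the Solr Reference Guide: `sow` (split on whitespace) is honored by the standard (lucene) and edismax query parsers, and it has defaulted to `false` since Solr 7.0, so passing it explicitly on 9.1 should not change behavior. A quoted phrase is also handed to the analyzer as a whole regardless of `sow`. If you want to pin the setting anyway, a minimal sketch of making it a request-handler default in solrconfig.xml could look like this; the `/select` handler name and the edismax parser are assumptions, not something stated above.

```xml
<!-- solrconfig.xml (sketch): handler name and defType are assumptions -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Extended DisMax parser; the standard lucene parser also honors sow -->
    <str name="defType">edismax</str>
    <!-- pass whitespace-separated terms to analysis in one piece so multi-word synonyms can match -->
    <str name="sow">false</str>
  </lst>
</requestHandler>
```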
My country-synonyms.txt file looks like this:

    TIGIT, domvanalimab, COM902, BMS-986207, Anti-TIGIT Antibody
    immuno-oncology, immunooncology
    Afghanistan, AF, AFG
    Albania, AL, ALB

And the relevant field types from my schema.xml look like this (text_general is the field type of the catch-all field):

    <fieldType name="text_field" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.KeywordTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.SynonymGraphFilterFactory" synonyms="country-synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.FlattenGraphFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.KeywordTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.SynonymGraphFilterFactory" synonyms="country-synonyms.txt" ignoreCase="true" expand="true"/>
      </analyzer>
    </fieldType>

    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.SnowballPorterFilterFactory" language="English" />
        <filter class="solr.SynonymGraphFilterFactory" synonyms="country-synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.FlattenGraphFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.SnowballPorterFilterFactory" language="English" />
        <filter class="solr.SynonymGraphFilterFactory" synonyms="country-synonyms.txt" ignoreCase="true" expand="true"/>
      </analyzer>
    </fieldType>

Any hints would be appreciated!
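One possible explanation, offered as a hedged sketch rather than a verified fix: by default, SynonymGraphFilterFactory parses the synonyms file with a whitespace tokenizer, not with the field's own analyzer. In text_general, the StandardTokenizer splits `Anti-TIGIT` into `anti` and `tigit`, and the SnowballPorterFilter turns `antibody` into `antibodi` before the synonym filter sees them, while the parsed rule still contains `anti-tigit` and `antibody`, so those entries cannot match. In text_field, the KeywordTokenizer emits the whole value as one token, while the rule `Anti-TIGIT Antibody` is parsed into two tokens. The factory's `tokenizerFactory` attribute chooses the tokenizer used to parse the file; the untested variant below sets it to match each field's tokenizer and moves the synonym filter ahead of the stemmer. It assumes the `solr.` shorthand resolves in that attribute the same way it does in `class`.

```xml
<!-- Sketch only: untested variant of the two field types above -->
<fieldType name="text_field" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- parse each synonym entry as one token, like the keyword-tokenized field value -->
    <filter class="solr.SynonymGraphFilterFactory" synonyms="country-synonyms.txt"
            ignoreCase="true" expand="true" tokenizerFactory="solr.KeywordTokenizerFactory"/>
    <filter class="solr.FlattenGraphFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="country-synonyms.txt"
            ignoreCase="true" expand="true" tokenizerFactory="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- split synonym entries the way StandardTokenizer splits text (Anti-TIGIT -> anti, tigit) -->
    <filter class="solr.SynonymGraphFilterFactory" synonyms="country-synonyms.txt"
            ignoreCase="true" expand="true" tokenizerFactory="solr.StandardTokenizerFactory"/>
    <!-- stem after synonym matching, so rules are compared against unstemmed tokens -->
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
    <filter class="solr.FlattenGraphFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="country-synonyms.txt"
            ignoreCase="true" expand="true" tokenizerFactory="solr.StandardTokenizerFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldType>
```

After reloading the core (and reindexing, since the index-time chain changes), the Analysis screen in the Admin UI can show whether `Anti-TIGIT Antibody` now expands to the other entries at each stage. With the keyword-tokenized type, a synonym can only match when the entire field value or quoted query equals an entry, so it suits short identifier-like fields rather than free text.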
