Hello,

I'm using Solr 9.1, and I'm trying to set up synonyms. Single-word
synonyms work, but multi-word and hyphenated synonyms do not.

Eventually I plan to have a very extensive synonym file (hundreds, if not
thousands, of lines), because I always want to find results for all child
terms and other synonyms of a given search term. This is why I thought it
might make sense to list all synonyms in the index (index-time expansion).
But getting it to work with query-time synonym expansion alone would
already be great.

For now, I am testing with equivalent synonyms. I am always querying using
quotation marks around the multi-word query.

What I have tried:
1. I included sow=false in the query, as recommended here:
https://lucidworks.com/post/multi-word-synonyms-solr-adds-query-time-support/

2. I used the SynonymGraphFilter only at query time, only at index time,
and at both. -> In each case I got the same number of results when
querying single-word synonyms, as expected (e.g. TIGIT, domvanalimab), but
querying multi-word synonyms did not find the other synonyms.

3. I changed all text fields to text_field (which uses the
KeywordTokenizer) instead of text_general (which uses the
StandardTokenizer), in order to prevent multi-word queries from being
split up. -> This still did not make multi-word synonyms work.
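
A minimal sketch of the kind of request described in item 1, with the
multi-word term in quotation marks and sow=false. The base URL and the
collection name `mycollection` are hypothetical placeholders, and the query
is assumed to go against the default search field:

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_select_url(base_url: str, collection: str, phrase: str) -> str:
    """Build a /select URL that sends `phrase` as a quoted query with sow=false."""
    params = {
        "q": f'"{phrase}"',  # quotation marks around the multi-word query
        "sow": "false",      # ask the parser not to split on whitespace
    }
    return f"{base_url}/{collection}/select?{urlencode(params)}"


# Hypothetical host and collection name, for illustration only.
url = build_select_url(
    "http://localhost:8983/solr", "mycollection", "Anti-TIGIT Antibody"
)
print(url)

# Decode the query string again to check which parameters would be sent.
sent_params = parse_qs(urlsplit(url).query)
print(sent_params)
```

The same two parameters can of course be entered directly in the Admin UI
query form or appended to a URL by hand; the code only makes the escaping
of the quotation marks and the space explicit.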


My country-synonyms.txt file looks like this:

TIGIT, domvanalimab, COM902, BMS-986207, Anti-TIGIT Antibody
immuno-oncology, immunooncology
Afghanistan, AF, AFG
Albania, AL, ALB


The relevant field types from my schema.xml look like this; text_general
is the field type of the catch-all field:

<fieldType name="text_field" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
       <tokenizer class="solr.KeywordTokenizerFactory" />
       <filter class="solr.LowerCaseFilterFactory" />
       <filter class="solr.SynonymGraphFilterFactory"
               synonyms="country-synonyms.txt" ignoreCase="true" expand="true"/>
       <filter class="solr.FlattenGraphFilterFactory"/>
    </analyzer>
    <analyzer type="query">
       <tokenizer class="solr.KeywordTokenizerFactory" />
       <filter class="solr.LowerCaseFilterFactory" />
       <filter class="solr.SynonymGraphFilterFactory"
               synonyms="country-synonyms.txt" ignoreCase="true" expand="true"/>
    </analyzer>
</fieldType>
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
       <tokenizer class="solr.StandardTokenizerFactory" />
       <filter class="solr.LowerCaseFilterFactory" />
       <filter class="solr.SnowballPorterFilterFactory" language="English" />
       <filter class="solr.SynonymGraphFilterFactory"
               synonyms="country-synonyms.txt" ignoreCase="true" expand="true"/>
       <filter class="solr.FlattenGraphFilterFactory"/>
    </analyzer>
    <analyzer type="query">
       <tokenizer class="solr.StandardTokenizerFactory" />
       <filter class="solr.LowerCaseFilterFactory" />
       <filter class="solr.SnowballPorterFilterFactory" language="English" />
       <filter class="solr.SynonymGraphFilterFactory"
               synonyms="country-synonyms.txt" ignoreCase="true" expand="true"/>
    </analyzer>
</fieldType>


Any hints would be appreciated!
