Hello Annika,

For multi-word synonyms, we have been using the
https://github.com/healthonnet/hon-lucene-synonyms jar, which we just
rebuilt against Solr 9.2.1 (a modification is needed; let me know if you
ever need details).

It overrides the edismax query parser and expands multi-word synonyms at
query time.

We didn't want to expand synonyms at index time because we had this problem:

in the index: mairie
synonym: hotel de ville

and then at query time, with the query 'hotel', mairie would match.

With hon-lucene-synonyms, when a user searches for "hotel de ville", we
match mairie, but "hotel" alone doesn't match mairie.

You might have performance issues with hon-lucene-synonyms if you have
hundreds of synonyms, but it's worth testing.
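
If you would rather try stock Solr first, a query-time-only setup could look
roughly like the sketch below. This is an assumption on my side, not our
production config and not part of hon-lucene-synonyms: the field type name
text_syn_query and the file name synonyms.txt are hypothetical, and it relies
on sow=false with edismax so that the multi-word query reaches the synonym
filter as a whole.

```xml
<!-- Sketch only: synonyms are expanded at query time, never written to the index. -->
<!-- The name "text_syn_query" and the file "synonyms.txt" are hypothetical. -->
<fieldType name="text_syn_query" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- No synonym filter here, so "mairie" is indexed as the single token "mairie". -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- With a line such as "hotel de ville, mairie" in synonyms.txt, only the full
         phrase is rewritten; a lone "hotel" has no synonym entry and is left as is. -->
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>
```

With something like this, a query for "hotel de ville" should also match
documents containing mairie, while "hotel" alone should not. The Analysis
page in the admin UI is a good place to check what the query analyzer
actually produces.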

Best regards,
Elisabeth

On Mon, 4 Mar 2024 at 17:16, Mikhail Khludnev <[email protected]> wrote:

> Hello Annika,
> You may use the Solr Admin Analysis page and the debugQuery and
> explainOther params to dig into a particular case. It's usually tough.
>  I've found one clue in the ref guide:
>  To get fully correct positional queries when your synonym replacements are
> multiple tokens, you should instead apply synonyms using this filter at
> query time.
> Probably you may start from something simple.
>
> On Mon, Mar 4, 2024 at 5:23 PM Annika Gable
> <[email protected]> wrote:
>
> > Hello,
> >
> > I'm using Solr 9.1, and I'm trying to set up synonyms. I managed to get
> > synonyms to work for single-word synonyms, but not for multiword and
> > hyphenated synonyms.
> >
> > In the final state, I am planning on having a very extensive synonym file
> > (hundreds, if not thousands of lines) because I want to always find results
> > for all child terms and other synonyms of a given search term. This is why
> > I thought it may make sense to list all synonyms in the index. But getting
> > it to work with query-time synonym expansion would also be great already.
> >
> > For now, I am testing with equivalent synonyms. I am always querying using
> > quotation marks around the multi-word query.
> >
> > What I have tried:
> > 1. I included sow=false in the query as recommended here:
> > https://lucidworks.com/post/multi-word-synonyms-solr-adds-query-time-support/
> >
> > 2. I used the SynonymGraphFilter either only at query time, or at index
> > time, or both -> I got the same number of results when querying single-word
> > synonyms, as expected (e.g. TIGIT, domvanalimab), but querying multi-word
> > synonyms did not find the other synonyms correctly.
> > 3. I made all text fields into a text_field (which uses the
> > KeywordTokenizer) instead of text_general (which uses the
> > StandardTokenizer), in order to prevent splitting up multi-word queries.
> > -> This still did not make multi-word synonyms work.
> >
> >
> > My country-synonyms.txt file looks like this:
> >
> > TIGIT, domvanalimab, COM902, BMS-986207, Anti-TIGIT Antibody
> > immuno-oncology, immunooncology
> > Afghanistan, AF, AFG
> > Albania, AL, ALB
> >
> >
> > And the relevant query fields from my schema.xml look like this, with
> > text_general being the fieldtype of the catchall field
> >
> > <fieldType name="text_field" class="solr.TextField"
> > positionIncrementGap="100">
> >     <analyzer type="index">
> >        <tokenizer class="solr.KeywordTokenizerFactory" />
> >        <filter class="solr.LowerCaseFilterFactory" />
> >        <filter class="solr.SynonymGraphFilterFactory"
> > synonyms="country-synonyms.txt" ignoreCase="true" expand="true"/>
> >        <filter class="solr.FlattenGraphFilterFactory"/>
> >     </analyzer>
> >     <analyzer type="query">
> >        <tokenizer class="solr.KeywordTokenizerFactory" />
> >        <filter class="solr.LowerCaseFilterFactory" />
> >        <filter class="solr.SynonymGraphFilterFactory"
> > synonyms="country-synonyms.txt" ignoreCase="true" expand="true"/>
> >     </analyzer>
> > </fieldType>
> > <fieldType name="text_general" class="solr.TextField"
> > positionIncrementGap="100">
> >     <analyzer type="index">
> >        <tokenizer class="solr.StandardTokenizerFactory" />
> >        <filter class="solr.LowerCaseFilterFactory" />
> >        <filter class="solr.SnowballPorterFilterFactory" language="English" />
> >        <filter class="solr.SynonymGraphFilterFactory"
> > synonyms="country-synonyms.txt" ignoreCase="true" expand="true"/>
> >        <filter class="solr.FlattenGraphFilterFactory"/>
> >     </analyzer>
> >     <analyzer type="query">
> >        <tokenizer class="solr.StandardTokenizerFactory" />
> >        <filter class="solr.LowerCaseFilterFactory" />
> >        <filter class="solr.SnowballPorterFilterFactory" language="English" />
> >        <filter class="solr.SynonymGraphFilterFactory"
> > synonyms="country-synonyms.txt" ignoreCase="true" expand="true"/>
> >     </analyzer>
> > </fieldType>
> >
> >
> > Any hints would be appreciated!
> >
>
> --
> Sincerely yours
> Mikhail Khludnev
>
