Re: Multiword synonyms and term wildcards/substring matching
Hi Alex

Thanks for the reply.

We are not using the 'copyField bucket' approach as it is inflexible. Our textual fields are all multivalued dynamic fields, which allows us to craft a list of `pf` (phrase fields) with associated weighting boosts to use in the search on a *per-collection* basis. All of the textual fields are indexed independently, so we can simply change the query when we want to include or exclude a field from the search, without reindexing the entire collection. e/dismax makes this more flexible approach possible.

I'll take a look at the ComplexQueryParser and see if it is a good fit. We use a lot of the e/dismax params though, such as `bf` (boost functions), `bq` (boost queries), and `pf` (phrase fields), to influence the relevance score.

FYI: We are using Solr 8.3.

Martin Graney
Lead Developer
http://sooqr.com
Re: Multiword synonyms and term wildcards/substring matching
I admit to not fully understanding the examples, but ComplexQueryParser looks like something worth at least reviewing:

https://lucene.apache.org/solr/guide/8_8/other-parsers.html#complex-phrase-query-parser

Also, I did not see any references to trying copyField to process the same content in different ways. If the copyField is not stored, the overhead is not as large.

Regards,
Alex
Multiword synonyms and term wildcards/substring matching
Hi All

I have been trying to implement multi-word synonyms using `sow=false` in a pre-existing system that pre-processed the phrase to apply wildcards around the terms, i.e. `bread stick` => `*bread* *stick*`.

I got the synonym expansion working perfectly, after discovering the `preserveOriginal` filter param, but then I needed to re-implement the existing wildcard behaviour.

I tried using the edge-ngram filter, but found that when searching for the phrase `bread stick` on a field containing the word `breadstick` with `q.op=AND` it returns no results, as the content `breadstick` does not _start with_ `stick`. The previous wildcard behaviour would return all documents that contain the substrings `bread` AND `stick`, which is the desired behaviour.

I tried using the ngram filter, but this does not support `preserveOriginal`, and so loses a lot of relevance for exact matches. It also results in matches that are far too broad, creating 21 tokens from `breadstick` for `minGramSize=3` and `maxGramSize=5`, which in practice matches essentially all of the documents. That means that boosts applied to other fields, such as 'in stock', push irrelevant documents to the top.

Finally, I tried to strip out ngrams entirely and use nested queries with LocalParams syntax, a Solr feature that is not very well documented. I created something like `q={!edismax sow=true v=$wildcards} OR {!edismax sow=false v=$plain}` to effectively create a union of results, one with multi-word synonym support and one with wildcard support.

But then I had to implement the other edismax params and immediately stumbled. Each query in production normally has a slew of `bf` and `bq` params, and I cannot see a way to pass these into the nested query using local variables. If I have 3 different `bf` params, how can I pass them into the local param subqueries?
Also, as the search in production is across multiple fields, I found that passing `qf` to both subqueries using dereferencing failed, as the parser saw it as a single field and threw a 'number format exception', i.e.

    q={!edismax sow=true v=$tw tf=$tqf} OR {!edismax sow=false v=$tp tf=$tqf}
    $tw=*bread* *stick*
    $tp=bread stick
    $tqf=title^2 description^0.5

As you can guess, I have spent quite some time going down this rabbit hole in my attempt to reproduce the existing desired functionality alongside multiterm synonyms.

Is there a way to get multiterm synonyms working with substring matching effectively? I am sure there is a much simpler way that I am missing than all of my attempts so far.

Solr: 8.3

Thanks
Martin Graney
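The token counts mentioned in the message above can be checked with a small Python sketch. It is only a simplified model of what an n-gram filter and an edge n-gram filter would emit for a single term, not Solr's filter code; the function names `ngrams` and `edge_ngrams` are made up for this illustration.

```python
def ngrams(term, min_size, max_size):
    """All substrings of term whose length is between min_size and max_size."""
    return [
        term[start:start + size]
        for size in range(min_size, max_size + 1)
        for start in range(len(term) - size + 1)
    ]


def edge_ngrams(term, min_size, max_size):
    """Only the prefixes of term, which is all an edge n-gram keeps."""
    longest = min(max_size, len(term))
    return [term[:size] for size in range(min_size, longest + 1)]


all_grams = ngrams("breadstick", 3, 5)
prefix_grams = edge_ngrams("breadstick", 3, 5)

print(len(all_grams))            # 21
print("stick" in all_grams)      # True
print(prefix_grams)              # ['bre', 'brea', 'bread']
print("stick" in prefix_grams)   # False
```

In this model the plain n-gram set contains `stick` but also 20 other short fragments, which is consistent with the overly broad matching described, while the edge n-gram set never contains `stick`, which is consistent with `q.op=AND` returning nothing.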
Re: SOLR 8.6 Synonyms search and out of context results
Hello,

Do you mean that you want searches for "gain" to match documents with "revenue" in them, but do *not* want searches for "revenue" to match documents with "gain" in them?

If that's what you mean, how have you defined your synonyms? If you're using the SynonymGraphFilterFactory

https://lucene.apache.org/core/8_6_0/analyzers-common/org/apache/lucene/analysis/synonym/SynonymGraphFilterFactory.html

then the default parser is

https://lucene.apache.org/core/8_6_0/analyzers-common/org/apache/lucene/analysis/synonym/SolrSynonymParser.html

which by default treats comma-separated entries as equivalent (bi-directional), while an explicit mapping (=>) only goes in one direction.

E.g. given "revenue,gain", revenue is a synonym of gain and gain is a synonym of revenue. However, given "gain => revenue", revenue will be a synonym of gain (if you search for gain it will match revenue, but revenue won't turn into gain).

When using the SynonymGraphFilter at query time I believe (though this may be wrong) that for directional mappings I needed to include the term from the left-hand side on the right-hand side as well in order for it to still match the original term. So if I've understood your question, I would define that as "gain => gain,revenue".

If that doesn't solve it, feel free to share your config and someone might be able to make a suggestion.
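The difference between the three rule styles in the reply above can be illustrated with a toy Python model. It assumes, as the reply tentatively does, that an explicit mapping replaces the left-hand term at query time; `parse_rules` and `rewrite` are hypothetical helpers written for this sketch and do not reproduce SolrSynonymParser.

```python
def parse_rules(lines):
    """Map each input term to the list of terms it is rewritten to."""
    rules = {}
    for line in lines:
        if "=>" in line:
            # Explicit mapping: only the left-hand terms are rewritten.
            left, right = line.split("=>", 1)
            inputs = [term.strip() for term in left.split(",")]
            outputs = [term.strip() for term in right.split(",")]
        else:
            # Comma-separated entries are equivalent in both directions.
            inputs = outputs = [term.strip() for term in line.split(",")]
        for term in inputs:
            rules[term] = outputs
    return rules


def rewrite(term, rules):
    """Terms that end up in the query when the user types term."""
    return rules.get(term, [term])


equivalent = parse_rules(["revenue,gain"])
one_way = parse_rules(["gain => revenue"])
one_way_keep = parse_rules(["gain => gain,revenue"])

print(rewrite("revenue", equivalent))    # ['revenue', 'gain']
print(rewrite("gain", one_way))          # ['revenue']
print(rewrite("revenue", one_way))       # ['revenue']
print(rewrite("gain", one_way_keep))     # ['gain', 'revenue']
```

Under that assumption, only the last form keeps a search for "gain" matching "gain" itself while leaving a search for "revenue" untouched.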
SOLR 8.6 Synonyms search and out of context results
Hi All,

Using Solr's default synonym search I am able to search synonyms, but in some cases it gives ambiguous results.

For example, one of the synonyms of "Revenue" is "Gain".
Input keyword for search: Revenue and Company
Irrelevant output: Our company doesn't want to gain success through shortcuts.

Solr version I am using: 8.6.3

Any help is very much appreciated here.

Regards,
Iram Tariq | Associate Architect
NorthBay
www.northbaysolutions.com
Re: Multi-word Synonyms not working properly with Edismax
Yes, we tried that and it worked. We removed it only from the query analyzer and it is working properly now.
Re: Multi-word Synonyms not working properly with Edismax
Hi,

Can you try to remove the RemoveDuplicatesTokenFilter?

Dominique
Multi-word Synonyms not working properly with Edismax
Hi,

We are using the following configuration:

*Schema:* [field type XML not preserved in the archive; surviving fragments, listed twice in the quoted copies: positionIncrementGap="100" autoGeneratePhraseQueries="true" omitNorms="true"; dictionary="../hunspell_dictionary/en_US.dic" affix="../hunspell_dictionary/en_US.aff" ignoreCase="true"]

*Managed Synonyms:* "abc implement", "bike", "xyz traders", "xyz transport"

*Query:* bike

*Parser type:* edismax

*Parsed query (from debug):* +DisjunctionMaxQueryfield1:"abc implement" field1:bike field1:"xyz traders" field1:"xyz trade"))

If you notice, there are 2 multi-word keywords starting with xyz, but only 1 of them is getting added to the query. If we change xyz transport to xy transport, then it works properly. The issue occurs only when the 2 multi-word keywords start with the same word. Though we are using graph synonyms, it is not working properly.

Are we doing anything wrong here?

Thanks,
Manish.
Multi-synonyms with sow=false, and Minimum match
Hi! Hope everyone is well.

I was looking at some old articles and came upon https://opensourceconnections.com/blog/2018/02/20/edismax-and-multiterm-synonyms-oddities/ .

Do we have a standard, robust way to handle fields with different analyzers (multi-word synonyms etc.) combined in one query with sow=false? Or does the recommendation by Doug T. still hold?

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
www.lucidworks.com
Re: Tokenizing managed synonyms
Hello, Solr Community:

Actually, you can set up a tokenizer for the managed synonyms. However, the configuration is not in the reference guide, and I do not know how to add a tokenizer via an API call, so you might need to manually edit a JSON file in the config directory.

In _schema_analysis_synonyms_.json under the config directory, you will see the JSON below.

    {
      "responseHeader":{
        "status":0,
        "QTime":3},
      "synonymMappings":{
        "initArgs":{
          "ignoreCase":true,
          "format":"solr"},
        "initializedOn":"2014-12-16T22:44:05.33Z",
        "managedMap":{
          "GB": ["GiB", "Gigabyte"],
          "TV": ["Television"],
          "happy": ["glad", "joyful"]}}}

To add a tokenizer, add the following key-value pair under the "initArgs" key:

    "tokenizerFactory":"solr.Factory"

Eventually, you will get the following JSON.

    {
      "responseHeader":{
        "status":0,
        "QTime":3},
      "synonymMappings":{
        "initArgs":{
          "ignoreCase":true,
          "format":"solr",
          "tokenizerFactory":"solr.Factory"},
        "initializedOn":"2014-12-16T22:44:05.33Z",
        "managedMap":{
          "GB": ["GiB", "Gigabyte"],
          "TV": ["Television"],
          "happy": ["glad", "joyful"]}}}

I would like to add this configuration to the Solr reference guide, but I have not created a JIRA issue yet.

--
Sincerely,
Kaya
github: https://github.com/28kayak
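The manual edit described above could also be scripted. The following is a minimal Python sketch that works on the JSON text only; `add_tokenizer_factory` is a hypothetical helper, the value `solr.Factory` is copied as it appears in the message (it may be a truncated class name), and locating the file and reloading the collection afterwards are left out.

```python
import json


def add_tokenizer_factory(managed_json_text, factory_name):
    """Return the managed-synonyms JSON with tokenizerFactory set in initArgs."""
    data = json.loads(managed_json_text)
    # setdefault keeps any initArgs that are already present.
    init_args = data["synonymMappings"].setdefault("initArgs", {})
    init_args["tokenizerFactory"] = factory_name
    return json.dumps(data, indent=2)


example = """
{"responseHeader": {"status": 0, "QTime": 3},
 "synonymMappings": {
   "initArgs": {"ignoreCase": true, "format": "solr"},
   "initializedOn": "2014-12-16T22:44:05.33Z",
   "managedMap": {"GB": ["GiB", "Gigabyte"],
                  "TV": ["Television"],
                  "happy": ["glad", "joyful"]}}}
"""

# "solr.Factory" is the placeholder value from the message, not a known class.
updated = json.loads(add_tokenizer_factory(example, "solr.Factory"))
print(updated["synonymMappings"]["initArgs"])
```

Parsing and re-serialising the file, rather than editing it as text, keeps the existing `initArgs` and `managedMap` entries intact.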
Re: Tokenizing managed synonyms
I think the question makes sense: as SynonymGraphFilterFactory accepts tokenizerFactory, he asked whether the managed version of SynonymGraphFilter could accept it as well.

https://lucene.apache.org/solr/guide/8_5/filter-descriptions.html#synonym-graph-filter

The answer seems to be NO.

Koji
Re: Tokenizing managed synonyms
This question doesn’t really make sense. You don’t specify tokenizers on filters, they’re specified at the _field_ level.

You can certainly define as many field(type)s as you want, each with a different analysis chain, and those chains can be made up of whatever you want to use, and there are lots of choices.

If you are asking to do _additional_ tokenization on the output of a synonym filter, no.

Perhaps if you defined the problem you’re trying to solve we could make some suggestions.

Best,
Erick
Re: Tokenizing managed synonyms
Please don’t hijack threads; start a new one when you switch topics.
Re: Tokenizing managed synonyms
How can I search for a term *except* when it's part of certain phrases?

For example, I might want to find documents mentioning "pepper" where it is not part of the phrases "chili pepper", "hot pepper", or "pepper sauce".

It does not work to search for [pepper NOT ("chili pepper" OR "hot pepper" OR "pepper sauce")] because that excludes all documents which mention "chili pepper" even if they *also* mention "black pepper" or the unmodified word "pepper". Maybe some way using synonyms?

Thanks!

-s
Tokenizing managed synonyms
Hi, Is it possible to specify a Tokenizer Factory on a Managed Synonym Graph Filter? I would like to use a Standard Tokenizer or Keyword Tokenizer on some fields. Best, Thomas
Re: Weird issues when using synonyms and stopwords together
Do not remove stopwords.

Stopword removal was a hack invented for 16-bit machines and multi-megabyte disks. That hack is not needed now.

tf.idf addresses the same problem as stopwords with a much better algorithm. Removing stopwords is an on/off decision for a guess at common words. tf.idf is a proportional weighting of common words based on the statistics of your documents.

Do not remove stopwords.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
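The "proportional weighting" point in the reply above can be made concrete with the classic inverse document frequency formula, log(N / df). This Python sketch uses a made-up four-document corpus and is not the formula of any particular Solr similarity; the `idf` helper is written for this illustration and assumes the term occurs in at least one document.

```python
import math

# Toy corpus invented for this example.
docs = [
    "the phone is on the table",
    "the iphone is a phone",
    "the apple is red",
    "the show is a comedy",
]


def idf(term, documents):
    """Classic inverse document frequency: log(N / df)."""
    df = sum(1 for doc in documents if term in doc.split())
    return math.log(len(documents) / df)


for term in ("the", "phone", "comedy"):
    print(term, round(idf(term, docs), 3))
# the 0.0
# phone 0.693
# comedy 1.386
```

A word that occurs in every document gets a weight of zero without ever being removed from the index, while rarer words are weighted progressively higher.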
Weird issues when using synonyms and stopwords together
I have a field title in my Solr schema:

[field definition XML not preserved in the archive; surviving attributes: required="true" stored="true"]

text_en is defined as follows:

[field type XML not preserved in the archive; surviving attributes, in order: positionIncrementGap="100" docValues="false" multiValued="false"; words="stopwords_en.txt"; preserveOriginal="true"; synonyms="synonyms_en.txt" ignoreCase="true" expand="true"; words="stopwords_en.txt"]

I'm encountering strange behaviour when using multi-word synonyms which contain stopwords.

If the stopwords appear in the middle, it works fine. For example, if I have the following in my synonyms file (where i is a stopword):

    iphone, apple i phone

And if I query:

    /select?q=iphone=title=edismax

The parsed query is:

    +DisjunctionMaxQuery(+title:appl +title:phone) title:iphon

Same for query:

    /select?q=apple i phone=title=edismax

But if stopwords appear at the start or end, then the behaviour is unpredictable.

In most cases, the entire synonym is dropped. For example, if I change my synonyms file to:

    iphone, i phone

and do the same query again (with iphone), I get:

    +DisjunctionMaxQuery(((title:iphon)))

I was expecting iphon and phone (as i would be dropped) in my dismax query.

In some cases, the behaviour is even weirder. For example, if my synonyms file is:

    between two ferns,netflix comedy,zach galifianakis show,netflix 2019 best

and I have ferns and best as my stopwords, and I do the following query:

    /select?q=netflix comedy=title=edismax

I get this:

    +DisjunctionMaxQuery+title:between +title:two +title:galifianaki +title:show) (+title:netflix +title:2019 +title:comedi

which is a very weird combination.

I'm not able to understand this behaviour and have not found anything related to it in the documentation or on the internet. Maybe I'm missing something. Any help or pointers would be highly appreciated.

Solr version: 8.4.1
Re: Re: Re: Re: Handling overlapping synonyms
Hm, I'm not sure what you mean, but I am pretty new to Solr. Apologies!
Re: Re: Re: Handling overlapping synonyms
> From my understanding, if you want regional sales manager to be indexed as both director of sales and area manager, you would have to type:
>
> Regional sales manager -> director of sales, area manager

That works for searching, but because everything is in the same position, searching for "director of sales" highlights the whole of "regional sales manager", while it should be indexed as (numbers indicate token positions):

    1 2 3  regional sales manager
    1      area manager
    2      director of sales

I guess I'll need to override SynonymGraphFilter to achieve that.
Re: Re: Re: Handling overlapping synonyms
From my understanding, if you want regional sales manager to be indexed as both director of sales and area manager, you would have to type:

    Regional sales manager -> director of sales, area manager

I do not believe you can chain synonyms.

Re: bigrams/trigrams, I was more interested in why you want to manually create them by inserting a "_" between the tokens. There is a bigram/trigram capability OOTB with Solr, so is there a reason you're manually coding these into your index instead of just using the OOTB function?
Re: Re: Handling overlapping synonyms
> what is the reasoning behind adding the bigrams and trigrams manually like that? Maybe if we knew the end goal, we could figure out a different strategy. Happy that at least the matching is working now!

I have a large number of synonyms and keep adding new ones; some of them partially overlap. It's the nature of a language that adding keywords to a phrase creates a distinct meaning. Another example:

    sales manager -> director of sales
    regional sales manager -> area manager

I'd expect "regional sales manager" to be indexed as both:

    regional sales manager
    ^^ -> director of sales
    ^^ -> area manager

so that searching for any of those terms matches and highlights the relevant part. However, when SynonymGraphFilter finds one synonym it ignores the other.
Re: Re: Handling overlapping synonyms
Hmm, what is the reasoning behind adding the bigrams and trigrams manually like that? Maybe if we knew the end goal, we could figure out a different strategy. Happy that at least the matching is working now!
Re: Handling overlapping synonyms
> Doing it the other way (new york city -> new_york_city, new_york) makes more sense,

Just checked it: that way does the matching as expected, but highlighting is wrong (a "new york" query matches "new york city" as it should, but also highlights all of it).
Re: Handling overlapping synonyms
> If you instead write "new york => new_york, new_york_city" it should work

I can't do that, as that would turn "new york" into "new_york_city", which is not what I want. Doing it the other way (new york city -> new_york_city, new_york) makes more sense, though I expect this to get positions wrong and mess with highlighting.
Re: Handling overlapping synonyms
If you instead write "new york => new_york, new_york_city" it should work (https://doc.lucidworks.com/fusion/3.1/Collections/Synonyms-Files.html).
Handling overlapping synonyms
Having synonyms defined for

new york -> new_york
new york city -> new_york_city

I'd like the phrase "new york city" to be indexed as both, but SynonymGraphFilter picks only one. Is there a way around that?

--
Maciej Dziardziel
fied...@gmail.com
Re: Query-time synonyms without indexing
Ah, thanks for letting us know.

Erick
Re: Query-time synonyms without indexing
The section without type is the one getting picked up for the index-time chain, so that wasn't my problem.

It turns out that because of https://issues.apache.org/jira/browse/LUCENE-8134, I needed to add omitTermFreqAndPositions="true" to the declaration. This has to do with the defaults for a string field being different from those for a text field; in Solr 8+, indexing fails because of the above ticket. Adding omitTermFreqAndPositions="true" ensures that the index field type and the schema field type agree on the settings, as I understand it.

Regards,
Bjarke
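A minimal sketch of why the extra attribute helps, as a toy model rather than Solr code. It assumes the old type was solr.StrField and the new one solr.TextField (the thread does not show the classes), and that their defaults for omitTermFreqAndPositions are true and false respectively, which is my understanding of the stock behaviour.

```python
# Toy model of how a field's index options follow from its type and attributes.
# Assumption: StrField omits term frequencies and positions by default,
# TextField does not. Function names are hypothetical.

DEFAULT_OMIT_TF_AND_POSITIONS = {
    "solr.StrField": True,
    "solr.TextField": False,
}


def index_options(field_class, omit_term_freq_and_positions=None):
    """Return the index options a field of this class would be written with."""
    if omit_term_freq_and_positions is None:
        omit_term_freq_and_positions = DEFAULT_OMIT_TF_AND_POSITIONS[field_class]
    return "DOCS" if omit_term_freq_and_positions else "DOCS_AND_FREQS_AND_POSITIONS"


def can_keep_indexing(existing_options, new_options):
    """Model of the consistency check: existing segments and new documents must agree."""
    return existing_options == new_options


existing = index_options("solr.StrField")
naive = index_options("solr.TextField")
fixed = index_options("solr.TextField", omit_term_freq_and_positions=True)

print(existing, "->", naive, can_keep_indexing(existing, naive))
print(existing, "->", fixed, can_keep_indexing(existing, fixed))
```

The first line of output corresponds to the "cannot change field ... from index options=DOCS to inconsistent index options=DOCS_AND_FREQS_AND_POSITIONS" error; the second to the state after adding the attribute.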
Re: Query-time synonyms without indexing
Not sure. You have an [...] section and a [...] section. Frankly I’m not sure which one will be used for the index-time chain.

Why don’t you just try it? Change [...] to [...], reload and go. It’d take you 5 minutes and you’d have your answer.

Best,
Erick
Re: Query-time synonyms without indexing
Yes, but isn't that what I am already doing in this case (look at the fieldType in the original mail)? Is there some other way to specify that field type and achieve what I want?

Thanks,
Bjarke
Re: Query-time synonyms without indexing
You can have separate index-time and query-time analysis chains; there are many examples in the stock Solr schemas.

Best,
Erick

> On Aug 27, 2019, at 8:48 AM, Bjarke Buur Mortensen wrote:
>
> We have a Solr field of type "string".
> It turns out that we need to do synonym expansion at query time in order to
> account for some changes over time in the values stored in that field.
>
> So we have tried introducing a custom fieldType that applies the synonym
> filter at query time only (see bottom of mail), but that requires us to
> change the field. But now, when we index new documents, Solr complains:
> 400 Bad Request
> Error: 'Exception writing document id someid to the index; possible
> analysis error: cannot change field "auth_country_code" from index
> options=DOCS to inconsistent index options=DOCS_AND_FREQS_AND_POSITIONS',
>
> Since we are only making query-time changes, I would really like to not
> have to reindex our entire collection. Is that possible somehow?
>
> Thanks,
> Bjarke
>
> sortMissingLast="true" positionIncrementGap="100">
> synonyms="country-synonyms.txt" ignoreCase="false" expand="true"/>
Query-time synonyms without indexing
We have a Solr field of type "string". It turns out that we need to do synonym expansion at query time in order to account for some changes over time in the values stored in that field.

So we have tried introducing a custom fieldType that applies the synonym filter at query time only (see bottom of mail), but that requires us to change the field. But now, when we index new documents, Solr complains:

400 Bad Request
Error: 'Exception writing document id someid to the index; possible analysis error: cannot change field "auth_country_code" from index options=DOCS to inconsistent index options=DOCS_AND_FREQS_AND_POSITIONS',

Since we are only making query-time changes, I would really like to not have to reindex our entire collection. Is that possible somehow?

Thanks,
Bjarke
Re: Re: Solr edismax parser with multi-word synonyms
Hi Erick,

Is there any way I can get it to match documents containing at least one of the words of the original query, i.e. 'frozen' or 'dinner' or both (but not partial matches of the synonyms)?

Thanks,
Sunil
Re: Solr edismax parser with multi-word synonyms
This is not a phrase query; rather, it’s requiring either pair of words to appear in the title. You’ve told it that “frozen dinner” and “microwave food” are synonyms. So it’s looking for both the words “microwave” and “food” in the title field, or “frozen” and “dinner” in the title field.

You’d see the same thing with single-word synonyms, albeit a little less confusingly.

Best,
Erick
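The difference between the parsed query Solr produced and the one the original poster hoped for can be checked with a small sketch. It treats a title as a set of tokens and ignores scoring and analysis; the function names and sample titles are invented for illustration.

```python
# Boolean semantics of the two parsed queries discussed in this thread,
# evaluated against a title treated as a bag of lowercase tokens.

def matches_actual(title_tokens):
    """+(((+title:microwave +title:food) (+title:frozen +title:dinner)))"""
    t = set(title_tokens)
    return {"microwave", "food"} <= t or {"frozen", "dinner"} <= t


def matches_desired(title_tokens):
    """+(((+title:microwave +title:food) (title:frozen title:dinner)))"""
    t = set(title_tokens)
    return {"microwave", "food"} <= t or "frozen" in t or "dinner" in t


for title in ["frozen dinner tray", "dinner plate", "microwave oven", "microwave food box"]:
    tokens = title.split()
    print(f"{title!r}: actual={matches_actual(tokens)} desired={matches_desired(tokens)}")
```

Only "dinner plate" separates the two: the produced query rejects it because "frozen" is missing, while the desired query accepts it. "microwave oven" is rejected by both, which is the "no partial matches of the synonyms" requirement.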
Re: Solr edismax parser with multi-word synonyms
Hi Sunil,

1. As you have added "microwave food" to the synonym file as a multi-word synonym of "frozen dinner", the edismax parser finds your synonym in the file and treats your query as a phrase query.

This is the reason you are seeing the parsed query as +(((+title:microwave +title:food) (+title:frozen +title:dinner))); "frozen dinner" is considered a phrase here.

If you want partial matches on your query, you can add

frozen dinner, microwave food, microwave, food

to your synonym file and you will see the parsed query as:

+(((+title:microwave +title:food) title:microwave title:food (+title:frozen +title:dinner)))

Another option is to write your own custom query parser and use it as a plugin.

Hope this helps!!

kshitij
Solr edismax parser with multi-word synonyms
I have enabled the SynonymGraphFilter in my field configuration in order to support multi-word synonyms (I am using Solr 7.6). Here is my field configuration:

And this is my synonyms.txt file:

frozen dinner,microwave food

Scenario 1: blue shirt (query with no synonyms)

Here is my first Solr query:

http://localhost:8983/solr/base/search?q=blue+shirt=title=edismax=on

And this is the parsed query I see in the debug output:

+((title:blue) (title:shirt))

Scenario 2: frozen dinner (query with synonyms)

Now, here is my second Solr query:

http://localhost:8983/solr/base/search?q=frozen+dinner=title=edismax=on

And this is the parsed query I see in the debug output:

+(((+title:microwave +title:food) (+title:frozen +title:dinner)))

I am wondering why the first query looks for documents containing at least one of the two query tokens, whereas the second query looks for documents with both of the query tokens. I would understand if it looked for both the tokens of the synonyms (i.e. both microwave and food) to avoid the sausagization problem. But I would like to get partial matches on the original query at least (i.e. it should also match documents containing just the token 'dinner').

Would anyone know why the behavior is different across queries with and without synonyms? And how could I work around this if I wanted partial matches on queries that also have synonyms?

Ideally, I would like the parsed query in the second case to be:

+(((+title:microwave +title:food) (title:frozen title:dinner)))

I'd appreciate any help with this. Thanks!
Re: How to use stopwords, synonyms along with fuzzy match in a SOLR
Ah, I didn’t read thoroughly enough. The problem is that stopwords don’t really count for fuzzy searching. By specifying “junk~” you’re not really searching for “junk” or variants. You’re telling Solr “find any term that is a fuzzy match” to “junk”. Under the covers, a search is being made for (“jank OR jack OR…”) for however many terms are within the edit distance specified for “junk”.

So Solr is behaving as expected. Imagine if it worked as you expect and stopwords were removed before applying the fuzzy logic. Then the complaint would be “Hey, I know I have words in my corpus ('jack' in this case) that should match the fuzzy term 'junk~’ but I don’t get any results back”.

Notice that no document with straight “junk” in the text will be returned absent other matching fuzzy terms.

Best,
Erick

> On May 9, 2019, at 11:17 AM, bbarani wrote:
>
> ignoreCase="true"/>
> ignoreCase="true"/>
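The expansion described above can be sketched in a few lines. This uses plain Levenshtein distance and a made-up list of indexed terms; Lucene's fuzzy matching is implemented differently (and, as far as I know, also counts a transposition as a single edit by default), so treat it only as an illustration of why `junk~` reaches `jack`.

```python
# Sketch: a fuzzy term is expanded against the terms that exist in the index.
# The term itself is never run through the stopword filter, and "junk" is
# absent from the index precisely because it was removed as a stopword.

def edit_distance(a, b):
    """Plain Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or keep
        prev = cur
    return prev[-1]


def fuzzy_expand(term, indexed_terms, max_edits=2):
    """Return the indexed terms within max_edits of the query term."""
    return [t for t in indexed_terms if edit_distance(term, t) <= max_edits]


# Hypothetical index vocabulary for the title field.
indexed_terms = ["headphone", "jack", "adapter", "cable", "jacket"]

print(edit_distance("junk", "jack"))        # 2: u->a, n->c
print(fuzzy_expand("junk", indexed_terms))  # ['jack']
```

The `max_edits=2` default mirrors the `junk~2` visible in the parsed query quoted in this thread.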
Re: How to use stopwords, synonyms along with fuzzy match in a SOLR
Thanks for your reply, Erick.

I created a simple field type as below for testing and added 'junk' to the stopwords, but it doesn't seem to honor it when using fuzzy search.

Btw, I am using qf along with edismax and pass the value in q (sample query below).

/solr/collection1/select?qf=title_autoComplete=false=productName=edismax=junk~=true=100%25=defaultMarketingSequence%20asc=1

Headphone *Jack* Adapter Cable

junk~
junk~
(+DisjunctionMaxQuery((title_autoComplete:junk~2)))/no_coord
+(title_autoComplete:junk~2)

1.5424817 = sum of:
  1.5424817 = weight(title_autoComplete:jack in 190) [SchemaSimilarity], result of:
    1.5424817 = score(doc=190,freq=1.0 = termFreq=1.0), product of:
      0.5 = boost
      3.0849633 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:
        37.0 = docFreq
        819.0 = docCount
      1.0 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1) from:
        1.0 = termFreq=1.0
        1.2 = parameter k1
        0.0 = parameter b (norms omitted for field)
Re: How to use stopwords, synonyms along with fuzzy match in a SOLR
Well, I’d start by adding debug=true; that’ll show you the parsed query as well as why certain documents scored the way they did.

But do note that q=junk~ will search against the default text field (the “df” parameter in the request handler definition in solrconfig.xml). Is that what you’re expecting? Or, I suppose, it’s searching against the fields defined if you’re using (e)dismax as your query parser. But the debug output (parsed query part) will show what the actual search is.

You should also look at the admin/analysis page. For instance, the way you have the field defined at index time, it’ll break on whitespace. But “junk.” won’t be found because your stopword doesn’t contain the period.

Plus, your EdgeNGramFilterFactory is pretty strange. A min gram size of 1 means you’re searching for single characters.

So what I’d do is back off the definition and build it up bit by bit to see if/when you have this problem. But if stopwords are working correctly at index time, then “junk” will not be _in_ the index, and therefore it’ll be impossible to find, fuzzy search or not.

So you’re making some assumptions that aren’t true, and the analysis process combined with looking at the parsed query should show you quite a lot.

Best,
Erick

> On May 8, 2019, at 4:43 PM, bbarani wrote:
>
> Hi,
> Is there a way to use stopwords and fuzzy match in a SOLR query?
>
> The below query matches 'jack' too and I added 'junk' to the stopwords (in
> query) to avoid returning results but looks like its not honoring the
> stopwords when using the fuzzy search.
>
> solr/collection1/select?app-qf=title_autoComplete=false=*=true=-1=marketingSequence%20asc=productId=true=on=categoryFilter=defaultMarketingSequence%20asc=junk~
>
> ignoreCase="true"/>
> synonyms="synonyms.txt"/>
> catenateNumbers="0" generateNumberParts="0" generateWordParts="0" preserveOriginal="1" catenateAll="0" catenateWords="1"/>
> minGramSize="1"/>
>
> ignoreCase="true"/>
> synonyms="synonyms.txt"/>
> catenateNumbers="0" generateNumberParts="0" generateWordParts="0" preserveOriginal="1" catenateAll="0" catenateWords="1"/>
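To make the remark about a min gram size of 1 concrete, here is a small sketch of what an edge n-gram filter emits for one token. The max gram size used below is an assumption, since that attribute did not survive in the archived field type.

```python
# Sketch of edge n-gram generation: prefixes of a token from min_gram to
# max_gram characters. max_gram=4 is a made-up value for illustration.

def edge_ngrams(token, min_gram, max_gram):
    """Return the leading substrings of token with lengths min_gram..max_gram."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


print(edge_ngrams("jack", 1, 4))  # ['j', 'ja', 'jac', 'jack']
print(edge_ngrams("jack", 2, 4))  # ['ja', 'jac', 'jack']
```

With a minimum of 1, the single character "j" is indexed for every word that starts with "j", so a one-letter term matches a very large share of documents.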
How to use stopwords, synonyms along with fuzzy match in a SOLR
Hi,

Is there a way to use stopwords and fuzzy match in a SOLR query?

The query below matches 'jack' too. I added 'junk' to the stopwords (in query) to avoid returning results, but it looks like it's not honoring the stopwords when using the fuzzy search.

solr/collection1/select?app-qf=title_autoComplete=false=*=true=-1=marketingSequence%20asc=productId=true=on=categoryFilter=defaultMarketingSequence%20asc=junk~
1969 vs 1960s: not-quite-synonyms in Solr
For a search like "1969 shirt" I would like to return items with either 1969 or 1960s but boost 1969 items higher. For the query "1960s shirt", 1960s and 1960, 1961, ... 1969 should all match equally. Is there a standard technique for this? I'm struggling to do this with eDisMax without adding new fields to the index. Thanks. Gregg
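No answer to this question appears in the archive. One possible approach that needs no new index fields is to rewrite the query on the client before sending it to Solr; the sketch below is only that, with a made-up function name and an arbitrary boost value. It uses the standard `term^boost` and `OR` query syntax.

```python
import re

# Client-side query rewrite (an assumption, not something from the thread):
#   single year  -> (year^boost OR decade)
#   decade       -> (decade OR each year in it), all unboosted

def expand_year_terms(query, exact_boost=2.0):
    out = []
    for term in query.split():
        if re.fullmatch(r"\d{3}0s", term):      # decade, e.g. 1960s
            start = int(term[:4])
            years = [str(start + i) for i in range(10)]
            out.append("(" + " OR ".join([term] + years) + ")")
        elif re.fullmatch(r"\d{4}", term):      # single year, e.g. 1969
            decade = term[:3] + "0s"
            out.append(f"({term}^{exact_boost} OR {decade})")
        else:
            out.append(term)
    return " ".join(out)


print(expand_year_terms("1969 shirt"))
print(expand_year_terms("1960s shirt"))
```

Two caveats: any four-digit token is treated as a year, and "match equally" is only approximate, because each expanded term still carries its own idf in the score.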
Re: Reload synonyms without reloading the multiple collections
Sorry, I see that it may have been confusing.

My webapp calls the reload of all the affected Collections (about a dozen of them) sequentially using the Collections API.

Ideally I would be able to write some QueryTimeSynonymFilterFactory that would reload the synonyms file from ZK, periodically or when told to; that file is what the system edits when a user changes some synonyms.

I understand that a Collection needs to be reloaded if the synonyms were to be used at index time, but this is not my case.

The managed API is in the same situation: basically it does what I am doing on my own right now. In the end, there has to be a reload of the affected Collections.

Regards,
Simón
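For reference, the sequential reload described above amounts to one Collections API RELOAD call per collection. The sketch below only builds the request URLs; the base URL and collection names are placeholders, and it is not the poster's actual webapp code.

```python
from urllib.parse import urlencode

# Builds Collections API RELOAD URLs, one per collection, in order.
# Base URL and collection names are hypothetical.

def reload_url(base_url, collection):
    params = urlencode({"action": "RELOAD", "name": collection})
    return f"{base_url}/admin/collections?{params}"


def reload_plan(base_url, collections):
    """Return the URLs to call one after another (sequential reload)."""
    return [reload_url(base_url, name) for name in collections]


for url in reload_plan("http://localhost:8983/solr", ["products", "articles"]):
    print(url)
    # To actually issue the request, something like
    # urllib.request.urlopen(url) could be used here.
```

Calling these one at a time rather than all together limits how many collections are reopening searchers at once; the thread reports that even this becomes unstable with several dozen large collections.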
Re: Reload synonyms without reloading the multiple collections
On 12/29/2018 5:55 AM, Simón de Frosterus Pokrzywnicki wrote:
> The problem is that when the user changes the synonyms, it automatically
> triggers a sequential reload of all the Collections.

What exactly is being done when you say "the user changes the synonyms"? Just uploading a new synonyms definition file to zookeeper would *NOT* result in a reload of *ANY* collection. As far as I am aware, collection reloads only happen when they are explicitly requested. Usage of the managed APIs to change aspects of the schema could cause a reload, but it's only going to happen on the collection where the API is used, not all collections.

Basically, I cannot imagine any situation that would cause a reload of all collections, other than explicitly asking Solr to do those reloads.

Thanks,
Shawn
Reload synonyms without reloading the multiple collections
Hello, I have a SolrCloud setup with multiple Collections based on the same configset. One of the features I have is that the user can define their own synonyms in order to improve their search experience, which has worked fine until recently. Lately the platform has grown and the user has several dozen Collections, most of them with 200k or more documents of non-trivial size. The problem is that when the user changes the synonyms, it automatically triggers a sequential reload of all the Collections. This now always causes problems, to the point where the platform becomes unstable and may need a restart of Solr, which means we have to access the platform and manually stabilize it. The synonyms are only used at query time, so there is no need to reindex anything and it seems like overkill to reload the Collections to change the synonyms. I have tried creating my own CustomSynonymGraphFilter and having it call the loadSynonyms() method as needed, but this causes some weird behavior where queries sometimes have the newly added synonyms working fine and sometimes not. I get the impression that there may be something like N "threads" handling the queries but I only change the SynonymMap for one of them, so when the query hits the right "thread" it works, but in most cases it does not. My custom fieldType looks like this: I would like to know if there is some other class I can redefine to make sure the new SynonymMap is used in all cases. Thanks, Simón PS: I have upgraded to Solr 7.6.
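The "N threads" impression is consistent with analysis components being cached and reused per thread, so that replacing the map inside one filter instance leaves the other instances untouched. The toy model below is not Solr or Lucene code; all class names are hypothetical. It only sketches one way around the problem: every per-thread filter reads from a single shared holder, so one swap is visible everywhere.

```python
import threading


class SharedSynonyms:
    """Single holder that all filter instances read from (hypothetical)."""

    def __init__(self, mapping):
        self._lock = threading.Lock()
        self._mapping = dict(mapping)

    def current(self):
        with self._lock:
            return self._mapping

    def swap(self, mapping):
        # Replace the whole map at once; readers see either the old or the new one.
        with self._lock:
            self._mapping = dict(mapping)


class ToyFilter:
    """Stand-in for a filter instance that is created once per thread."""

    def __init__(self, shared):
        self._shared = shared

    def expand(self, term):
        return [term] + list(self._shared.current().get(term, []))


class ToyAnalyzer:
    """Caches one ToyFilter per thread, mimicking per-thread reuse."""

    def __init__(self, shared):
        self._shared = shared
        self._per_thread = threading.local()

    def expand(self, term):
        if not hasattr(self._per_thread, "filter"):
            self._per_thread.filter = ToyFilter(self._shared)
        return self._per_thread.filter.expand(term)


def expand_in_new_thread(analyzer, term):
    """Run expand() on a separate thread and return its result."""
    out = []
    worker = threading.Thread(target=lambda: out.append(analyzer.expand(term)))
    worker.start()
    worker.join()
    return out[0]
```

Had each ToyFilter kept its own private copy of the map, a swap made through one instance would only be seen by queries served on that instance's thread, which matches the intermittent behavior described above.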
Re: MoreLikeThis & Synonyms
On Wed, Dec 26, 2018 at 09:09:02PM -0800, Erick Erickson wrote: > bq. However multiword synonyms are only compatible with queryTime synonyms > expansion. > > Why do you say this? What version of Solr? Query-time multi-word > synonyms were _added_, but AFAIK the capability of multi-word synonyms > was not taken away. From this blog post [1] I deduced that multi-word synonyms are only compatible with query-time expansion. > Or are you saying that MLT doesn't play nice at all with multi-word > synonyms? From my tests, MLT does not expand the query with synonyms. So neither query-time synonyms nor multi-word synonyms can be used with it. Only index-time expansion is possible, with the limitations it has [1]. > What version of Solr are you using? I am running Solr 7.6. [1] https://lucidworks.com/2017/04/18/multi-word-synonyms-solr-adds-query-time-support/ -- nicolas
Re: MoreLikeThis & Synonyms
bq. However multiword synonyms are only compatible with queryTime synonyms expansion. Why do you say this? What version of Solr? Query-time multi-word synonyms were _added_, but AFAIK the capability of multi-word synonyms was not taken away. Or are you saying that MLT doesn't play nice at all with multi-word synonyms? What version of Solr are you using? Best, Erick On Wed, Dec 26, 2018 at 5:25 AM Nicolas Paris wrote: > > Hi > > It turns out that the MoreLikeThis handler does not use queryTime synonyms > expansion. > > It is only compatible with indexTime synonyms. > > However multiword synonyms are only compatible with queryTime synonyms > expansion. > > For this reason, multiword synonyms cannot be used > together with the MoreLikeThis handler. > > Is there any reason the MoreLikeThis feature is not compatible with > multiword synonyms? > > Thanks > -- > nicolas
MoreLikeThis & Synonyms
Hi It turns out that the MoreLikeThis handler does not use queryTime synonyms expansion. It is only compatible with indexTime synonyms. However multiword synonyms are only compatible with queryTime synonyms expansion. For this reason, multiword synonyms cannot be used together with the MoreLikeThis handler. Is there any reason the MoreLikeThis feature is not compatible with multiword synonyms? Thanks -- nicolas
RE: KeywordRepeat, stemming, (single term) synonyms and minimum should match (edismax)
Hello, Sorry for trying this once more. Is there anyone around who can help me, and perhaps others, on this subject and the linked Jira ticket and failing test? I could really need some help from someone who is really familiar with edismax code and the underlying QueryBuilder parts that are used, and then get replaced by Solr code. Many thanks, Markus -Original message- > From:Markus Jelsma > Sent: Thursday 22nd November 2018 15:39 > To: solr-user@lucene.apache.org; solr-user > Subject: RE: KeywordRepeat, stemming, (single term) synonyms and minimum > should match (edismax) > > Hello, > > I have opened a SOLR-13009 describing the problem. The attached patch > contains a unit test proving the problem, i.e. the test fails. Any help would > be greatly appreciated. > > Many thanks, > Markus > > https://issues.apache.org/jira/browse/SOLR-13009
Re: Is reload necessary for updates to files referenced in schema, like synonyms, protwords, etc?
On 11/28/2018 6:37 AM, Vincenzo D'Amore wrote: Very likely I'm late to this party :) not sure with solr standalone, but with solrcloud (7.3.1) you have to reload the core every time synonyms referenced by a schema are changed. I have a 7.5.0 download on my workstation, so I fired that up, created a core, and tried it out. I did learn that a reload is required when changing files referenced by analysis components in the schema. That's what I had thought was probably the case, now I know for sure. Thanks, Shawn
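For a standalone (non-cloud) install like the one tested here, the reload in question is a CoreAdmin call with action=RELOAD. A minimal sketch, assuming a local Solr at the default port and a hypothetical core name:

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def core_reload_url(base_url, core):
    """Build the CoreAdmin URL that reloads one core."""
    query = urlencode({"action": "RELOAD", "core": core})
    return f"{base_url.rstrip('/')}/admin/cores?{query}"


def reload_core(base_url, core, timeout=60):
    """Ask Solr to reload the core so edited synonym/protwords files are re-read."""
    with urlopen(core_reload_url(base_url, core), timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # "mycore" is a placeholder name, not taken from the thread.
    print(core_reload_url("http://localhost:8983/solr", "mycore"))
```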
Re: Is reload necessary for updates to files referenced in schema, like synonyms, protwords, etc?
Very likely I'm late to this party :) not sure with solr standalone, but with solrcloud (7.3.1) you have to reload the core every time synonyms referenced by a schema are changed. On Mon, Nov 26, 2018 at 8:51 PM Walter Underwood wrote: > Should be easy to check with the analysis UI. Add a synonym and see if it > is used. > > I seem to remember some work on reloading synonyms on the fly without a > core reload. These seem related... > > https://issues.apache.org/jira/browse/SOLR-5200 > https://issues.apache.org/jira/browse/SOLR-5234 > > wunder > Walter Underwood > wun...@wunderwood.org > http://observer.wunderwood.org/ (my blog) > > > On Nov 26, 2018, at 11:43 AM, Shawn Heisey wrote: > > > > I know that changes to the schema require a reload. But do changes to > files referenced by a schema also require a reload? So if for instance I > were to change the contents of a synonym file, would I need to reload the > core before Solr would use the new file? Synonyms in this case are at > query time, but other files like protwords are used at index time. > > > > I *THINK* that a reload is required, but I can't be sure without > checking the code, and it would probably take me more than a couple of > hours to unravel the code enough to answer the question myself. > > > > It is not SolrCloud, so there's no ZK to worry about. > > > > Thanks, > > Shawn > > > > -- Vincenzo D'Amore
Re: Is reload necessary for updates to files referenced in schema, like synonyms, protwords, etc?
Should be easy to check with the analysis UI. Add a synonym and see if it is used. I seem to remember some work on reloading synonyms on the fly without a core reload. These seem related... https://issues.apache.org/jira/browse/SOLR-5200 https://issues.apache.org/jira/browse/SOLR-5234 wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Nov 26, 2018, at 11:43 AM, Shawn Heisey wrote: > > I know that changes to the schema require a reload. But do changes to files > referenced by a schema also require a reload? So if for instance I were to > change the contents of a synonym file, would I need to reload the core before > Solr would use the new file? Synonyms in this case are at query time, but > other files like protwords are used at index time. > > I *THINK* that a reload is required, but I can't be sure without checking the > code, and it would probably take me more than a couple of hours to unravel > the code enough to answer the question myself. > > It is not SolrCloud, so there's no ZK to worry about. > > Thanks, > Shawn >
Is reload necessary for updates to files referenced in schema, like synonyms, protwords, etc?
I know that changes to the schema require a reload. But do changes to files referenced by a schema also require a reload? So if for instance I were to change the contents of a synonym file, would I need to reload the core before Solr would use the new file? Synonyms in this case are at query time, but other files like protwords are used at index time. I *THINK* that a reload is required, but I can't be sure without checking the code, and it would probably take me more than a couple of hours to unravel the code enough to answer the question myself. It is not SolrCloud, so there's no ZK to worry about. Thanks, Shawn
RE: KeywordRepeat, stemming, (single term) synonyms and minimum should match (edismax)
Hello, I have opened a SOLR-13009 describing the problem. The attached patch contains a unit test proving the problem, i.e. the test fails. Any help would be greatly appreciated. Many thanks, Markus https://issues.apache.org/jira/browse/SOLR-13009 -Original message- > From:Markus Jelsma > Sent: Sunday 18th November 2018 23:21 > To: solr-user@lucene.apache.org; solr-user > Subject: RE: KeywordRepeat, stemming, (single term) synonyms and minimum > should match (edismax) > > Hello, > > Apologies for bothering you all again, but i really need some help in this > matter. How can we resolve this issue? Are we dealing with a bug here (then > i'll open a ticket), am i doing something wrong? > > Is here anyone who had the same issue or understand the problem? > > Many thanks, > Markus
RE: KeywordRepeat, stemming, (single term) synonyms and minimum should match (edismax)
Hello, Apologies for bothering you all again, but I really need some help in this matter. How can we resolve this issue? Are we dealing with a bug here (then I'll open a ticket), or am I doing something wrong? Is there anyone here who has had the same issue or understands the problem? Many thanks, Markus -Original message- > From:Markus Jelsma > Sent: Tuesday 13th November 2018 9:52 > To: solr-user > Subject: KeywordRepeat, stemming, (single term) synonyms and minimum should > match (edismax) > > Hello, apologies for this long-winded e-mail. > > Our fields have KeywordRepeat and language-specific filters such as a > stemmer; the final filter at query time is SynonymGraph. For those of you > wondering about the parsed queries below: we do not use > RemoveDuplicatesFilter, due to [1]. > > We use a custom QParser extending edismax and also extend > ExtendedSolrQueryParser, so we are able to override newFieldQuery in case we > have to. The problem also applies directly to Solr's vanilla edismax. The > file synonyms.txt contains the stemmed versions of the original terms. > > Consider this example synonym set [bier,brouw], where bier means beer and > brouw is the stemmed version of brouwsel (brewage, concoction), and consider > these parameters on /select: qf=content_nl&defType=edismax&mm=2<-1 5<-2 > 6<90%25. > > The queries q=bier and q=brouw both parse to the following query and give the > desired results (notice the missing RemoveDuplicates here): > +(((Synonym(content_nl:bier content_nl:brouw) Synonym(content_nl:bier > content_nl:brouw))~2)) > > However, for q=brouwsel something (partially) unexpected happens: > +(((content_nl:brouwsel Synonym(content_nl:bier content_nl:brouw))~2)) > > This results in a BooleanQuery where, due to mm=2, both clauses need to > match, giving very few matches. Removing KeywordRepeat or setting mm=1 of > course fixes the problem, but that is not what we want.
> > What is also unexpected, and may be related to the problem, is that when > checking the analyzer output via the GUI, we see the position incrementing > when KeywordRepeat and SynonymGraph are combined. When these filters are not > combined, the positions are always 1, as expected. When combined we get this > for 'brouw': > term: bier brouw bier brouw > pos: 1 1 2 2 > > or for 'brouwsel': > term: brouwsel bier brouw > pos: 1 2 2 > > ExtendedSolrQueryParser, and everything underneath, is a complicated piece of > code. In the end it extends Lucene's QueryBuilder, but it does not always rely > on its results, it seems. Edismax, for example, 'resets' minShouldMatch in > SolrPluginUtils.setMinShouldMatch(), so this is a complicated web of code, > I am a bit too deep in this unfamiliar area, and I am in need of help here. > > So, my question is: how to solve this problem? Or how to approach it? What > is the actual problem? How can I get the same stable results for both > queries? Does the odd position increment have anything to do with it (it seems > Lucene's QueryBuilder does something with it)? What do I need to do? > > Many thanks, > Markus > > PS: this is on Solr 7.2.1 and 7.5.0. > > [1] > http://lucene.472066.n3.nabble.com/Multiple-languages-boosting-and-stemming-and-KeywordRepeat-td4389086.html >
KeywordRepeat, stemming, (single term) synonyms and minimum should match (edismax)
Hello, apologies for this long-winded e-mail. Our fields have KeywordRepeat and language-specific filters such as a stemmer; the final filter at query time is SynonymGraph. For those of you wondering about the parsed queries below: we do not use RemoveDuplicatesFilter, due to [1]. We use a custom QParser extending edismax and also extend ExtendedSolrQueryParser, so we are able to override newFieldQuery in case we have to. The problem also applies directly to Solr's vanilla edismax. The file synonyms.txt contains the stemmed versions of the original terms. Consider this example synonym set [bier,brouw], where bier means beer and brouw is the stemmed version of brouwsel (brewage, concoction), and consider these parameters on /select: qf=content_nl&defType=edismax&mm=2<-1 5<-2 6<90%25. The queries q=bier and q=brouw both parse to the following query and give the desired results (notice the missing RemoveDuplicates here): +(((Synonym(content_nl:bier content_nl:brouw) Synonym(content_nl:bier content_nl:brouw))~2)) However, for q=brouwsel something (partially) unexpected happens: +(((content_nl:brouwsel Synonym(content_nl:bier content_nl:brouw))~2)) This results in a BooleanQuery where, due to mm=2, both clauses need to match, giving very few matches. Removing KeywordRepeat or setting mm=1 of course fixes the problem, but that is not what we want. What is also unexpected, and may be related to the problem, is that when checking the analyzer output via the GUI, we see the position incrementing when KeywordRepeat and SynonymGraph are combined. When these filters are not combined, the positions are always 1, as expected. When combined we get this for 'brouw': term: bier brouw bier brouw pos: 1 1 2 2 or for 'brouwsel': term: brouwsel bier brouw pos: 1 2 2 ExtendedSolrQueryParser, and everything underneath, is a complicated piece of code. In the end it extends Lucene's QueryBuilder, but it does not always rely on its results, it seems.
Edismax, for example, 'resets' minShouldMatch in SolrPluginUtils.setMinShouldMatch(), so this is a complicated web of code, I am a bit too deep in this unfamiliar area, and I am in need of help here. So, my question is: how to solve this problem? Or how to approach it? What is the actual problem? How can I get the same stable results for both queries? Does the odd position increment have anything to do with it (it seems Lucene's QueryBuilder does something with it)? What do I need to do? Many thanks, Markus PS: this is on Solr 7.2.1 and 7.5.0. [1] http://lucene.472066.n3.nabble.com/Multiple-languages-boosting-and-stemming-and-KeywordRepeat-td4389086.html
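To see why two clauses end up both being required, it can help to evaluate the mm expression by hand. The function below is a rough Python approximation of how a spec such as "2<-1 5<-2 6<90%" (the %25 in the request is just the URL-encoded percent sign) is turned into a number of required clauses; it is written from the documented semantics, is not Solr's own code, and the function name is made up.

```python
import re


def min_should_match(clause_count, spec):
    """Approximate the number of optional clauses that must match for an mm spec."""
    result = clause_count
    # Normalise "2 < -1" to "2<-1" so conditionals can be split on whitespace.
    spec = re.sub(r"\s*<\s*", "<", spec.strip())
    if "<" in spec:
        for part in spec.split():
            bound, value = part.split("<", 1)
            if clause_count <= int(bound):
                # This conditional does not apply; keep what earlier ones decided.
                return result
            result = min_should_match(clause_count, value)
        return result
    if spec.endswith("%"):
        # int() truncates toward zero, so 90% of 7 clauses gives 6.
        calc = int(clause_count * int(spec[:-1]) / 100)
    else:
        calc = int(spec)
    # Negative values mean "all but this many".
    result = clause_count + calc if calc < 0 else calc
    return max(0, min(clause_count, result))


if __name__ == "__main__":
    for n in (1, 2, 3, 6, 7):
        print(n, min_should_match(n, "2<-1 5<-2 6<90%"))
```

With two optional clauses, as in the parsed queries above, the first conditional does not kick in and both clauses stay required, which is the mm=2 behaviour described in the message.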
Re: Synonyms relationships
Synonyms in Solr are really a kind of "programmer's" tool, useful for mapping terms to other terms. This need not correspond to linguistic notions of a synonym or hypernymy/hyponymy. That being said, there are probably half a dozen approaches for modelling these kinds of taxonomical relationships in Solr on top of synonyms. Here are some resources / techniques we use at OpenSource Connections for clients: https://www.youtube.com/watch?v=90F30PS-884 https://opensourceconnections.com/blog/2017/11/21/solr-synonyms-mea-culpa/ https://opensourceconnections.com/blog/2016/12/23/elasticsearch-synonyms-patterns-taxonomies/ (last one is ES, but same ideas apply...) Best, -Doug On Wed, Oct 31, 2018 at 6:20 AM Nicolas Paris wrote: > Hi > > Does Solr provide a way to describe synonym relationships such as > "equivalent to", "narrower than", "broader than"? > > It turns out both postgres and oracle do, but I can't find any related > information in the documentation. > > This is useful for choosing whether or not to generalize the search terms. > > Thanks, > > > -- > nicolas > -- CTO, OpenSource Connections Author, Relevant Search http://o19s.com/doug
Synonyms relationships
Hi Does Solr provide a way to describe synonym relationships such as "equivalent to", "narrower than", "broader than"? It turns out both postgres and oracle do, but I can't find any related information in the documentation. This is useful for choosing whether or not to generalize the search terms. Thanks, -- nicolas
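One common way to approximate these relationships with the plain synonyms file format is to use comma-separated lists for "equivalent to" and one-way "=>" mappings for "narrower than". The toy parser below only illustrates that idea; it handles a small subset of the file format, the example rules are invented, and it is not how Solr itself loads synonyms.

```python
def parse_rules(text):
    """Parse a small subset of the synonyms file format into {term: [expansions]}."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "=>" in line:
            # Explicit one-way mapping: left-hand terms are replaced by the right-hand terms.
            lhs, rhs = line.split("=>", 1)
            sources = [t.strip() for t in lhs.split(",") if t.strip()]
            targets = [t.strip() for t in rhs.split(",") if t.strip()]
        else:
            # Equivalent terms: every term expands to the whole group.
            sources = targets = [t.strip() for t in line.split(",") if t.strip()]
        for source in sources:
            expansions = mapping.setdefault(source, [])
            for target in targets:
                if target not in expansions:
                    expansions.append(target)
    return mapping


def expand(mapping, term):
    """Return the expansions for a term, or the term itself if no rule applies."""
    return mapping.get(term, [term])


# Invented example rules: an equivalence and a narrower-to-broader mapping.
RULES = """
# equivalent to
sofa, couch
# narrower than: a poodle is also a dog, but a dog is not a poodle
poodle => poodle, dog
"""

if __name__ == "__main__":
    rules = parse_rules(RULES)
    for term in ("sofa", "poodle", "dog"):
        print(term, "->", expand(rules, term))
```

Applied at index time, the one-way rule lets a search for the broader term find documents that mention the narrower one, while a search for the narrower term stays specific.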
Re: synonyms for Solr Cloud -
You can keep the synonyms text file in the same config folder as the rest of the files, like solrconfig.xml, that you push to SolrCloud. When you push the config to SolrCloud, the synonyms text file will be pushed along with it. In your schema, you will need to add the SynonymFilterFactory to the field type's analyzer as usual. Regards, Edwin On Wed, 19 Sep 2018 at 11:58, Rathor, Piyush (US - Philadelphia) < prat...@deloitte.com> wrote: > Hi All, > > > > How can we add a synonyms text file to solr cloud. I have a text file with > comma separated synonyms. > > > > > > *Thanks & Regards* > > *Piyush Rathor*
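"Pushing" the config folder usually means uploading it to ZooKeeper, for example with the bin/solr zk upconfig command. The sketch below only assembles such a command line from Python; the ZooKeeper address, config name and directory are placeholders, and the affected collection still has to be reloaded afterwards before the new file is used.

```python
import subprocess


def upconfig_command(zk_host, config_name, config_dir, solr_bin="bin/solr"):
    """Build the argument list for uploading a config directory to ZooKeeper."""
    return [solr_bin, "zk", "upconfig", "-z", zk_host, "-n", config_name, "-d", config_dir]


def upload_config(zk_host, config_name, config_dir, solr_bin="bin/solr"):
    """Run the upload; raises CalledProcessError if the command fails."""
    subprocess.run(upconfig_command(zk_host, config_name, config_dir, solr_bin), check=True)


if __name__ == "__main__":
    # Placeholder values; the directory is expected to contain synonyms.txt
    # next to solrconfig.xml and the schema.
    print(" ".join(upconfig_command("localhost:2181", "myconfig", "/path/to/conf")))
```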
synonyms for Solr Cloud -
Hi All, How can we add a synonyms text file to solr cloud. I have a text file with comma separated synonyms. Thanks & Regards Piyush Rathor
Re: Multi-word Synonyms - how does sow parameter work?
Thanks Andrea for the tip. I wasn't aware of the autoGeneratePhraseQueries option for text fields, will definitely keep it in mind. But I question if this is related to the fix on the query parser which essentially introduces sow parameter and if false (looks like that is the default in Solr 7), multiwords should be sent as a 'single input' (see https://issues.apache.org/jira/browse/LUCENE-2605). That defect doesn't make mention of autoGeneratePhraseQueries. I think this is where my confusion lies: as a non-developer unfortunately I'm not clear what 'multiwords will be sent as a single input' means, should it mean that it is treated as a phrase query? Use AND? So far as mentioned I only observe that it is just OR clauses, which is no different than before the fix. Thanks again! On Thu, Aug 16, 2018 at 12:39 AM, Andrea Gazzarini wrote: > Hi Roy, I think you miss the autoGeneratePhraseQueries=true in the field > type definition. > I was on a slightly different use case when I met your same issue (I was > using synonyms expansion at query time) and honestly I didn't understand > why this is not the default and implicit behavior. In other words, like > you, I can't imagine a scenario where I would a multi-terms synonym be > destructured in multiple OR clauses. > > Best, > Andrea > > > On 16/08/18 02:07, Roy Lim wrote: > >> I am not using edismax (eventually I would like to get there) but I'm just >> testing with standard query right now. Original posting: >> >> I'm trying to figure out why the multi-word synonym expansion is not >> working correctly (or, at least what I'm misunderstanding). Specifically, >> when I test a standard query with Solr Admin it appears to still split on >> whitespace. 
>> >> Here is my setup: >> - Solr 7.2.1 >> - synonym example: LCD => liquid crystal display >> - q=myfield:LCD >> - added parameter: sow=false >> - myfield schema looks like (analyzer both applicable to index and query >> time): >> >> > positionIncrementGap="100"> >> >> >> > synonyms="synonyms.txt"/> >> ... >> >> >> When debugging the query, Solr Admin shows the parsed query as: >> >> myfield:liquid myfield:crystal myfield:display >> >> >> (default operator being OR), as you can see it would incorrectly match on >> any of those words, but not all, which is what I would expect... >> >> Should it not do a phrase query search for the exact translated synonym, >> "liquid crystal display"? >> >> >> >> On Wed, Aug 15, 2018 at 5:01 PM, Doug Turnbull < >> dturnb...@opensourceconnections.com> wrote: >> >> Also share your fieldType settings for myfield as well from your schema >>> On Wed, Aug 15, 2018 at 8:00 PM Doug Turnbull < >>> dturnb...@opensourceconnections.com> wrote: >>> >>> Aside from the screenshot issue, one thing to check: are you searching >>>> with defType=edismax ? >>>> >>>> As in >>>> q=lcd=myfield=false=edismax >>>> >>>> ? >>>> >>>> Also sow=false should the the default on Solr 7 and above >>>> >>>> Doug >>>> >>>> On Wed, Aug 15, 2018 at 6:27 PM Roy Lim wrote: >>>> >>>> I'm trying to figure out why the multi-word synonym expansion is not >>>>> working >>>>> correctly. Specifically, when I test a standard query with Solr Admin >>>>> >>>> it >>> >>>> is >>>>> still splitting on whitespace. >>>>> >>>>> Here is my setup: >>>>> - Solr 7.2.1 >>>>> - synonym LCD => liquid crystal display >>>>> - q=myfield:LCD >>>>> - added: sow=false >>>>> - myfield looks like: >>>>> >>>>> >>>>> Solr Admin shows the parsed query looks like: >>>>> >>>>> myfield:liquid myfield:crystal myfield:display >>>>> >>>>> (default operator being OR), which would incorrectly match documents >>>>> >>>> with >>> >>>> any of those words, but not all, which is what I would expect... 
>>>>> >>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html >>>>> >>>>> -- >>>> CTO, OpenSource Connections >>>> Author, Relevant Search >>>> http://o19s.com/doug >>>> >>>> -- >>> CTO, OpenSource Connections >>> Author, Relevant Search >>> http://o19s.com/doug >>> >>> >
Re: Multi-word Synonyms - how does sow parameter work?
Hi Roy, I think you are missing autoGeneratePhraseQueries=true in the field type definition. I was on a slightly different use case when I ran into the same issue (I was using synonym expansion at query time) and honestly I didn't understand why this is not the default and implicit behavior. In other words, like you, I can't imagine a scenario where I would want a multi-term synonym to be broken up into multiple OR clauses. Best, Andrea On 16/08/18 02:07, Roy Lim wrote: I am not using edismax (eventually I would like to get there) but I'm just testing with standard query right now. Original posting: I'm trying to figure out why the multi-word synonym expansion is not working correctly (or, at least what I'm misunderstanding). Specifically, when I test a standard query with Solr Admin it appears to still split on whitespace. Here is my setup: - Solr 7.2.1 - synonym example: LCD => liquid crystal display - q=myfield:LCD - added parameter: sow=false - myfield schema looks like (analyzer both applicable to index and query time): ... When debugging the query, Solr Admin shows the parsed query as: myfield:liquid myfield:crystal myfield:display (default operator being OR), as you can see it would incorrectly match on any of those words, but not all, which is what I would expect... Should it not do a phrase query search for the exact translated synonym, "liquid crystal display"? On Wed, Aug 15, 2018 at 5:01 PM, Doug Turnbull < dturnb...@opensourceconnections.com> wrote: Also share your fieldType settings for myfield as well from your schema On Wed, Aug 15, 2018 at 8:00 PM Doug Turnbull < dturnb...@opensourceconnections.com> wrote: Aside from the screenshot issue, one thing to check: are you searching with defType=edismax ? As in q=lcd=myfield=false=edismax ? Also sow=false should be the default on Solr 7 and above Doug On Wed, Aug 15, 2018 at 6:27 PM Roy Lim wrote: I'm trying to figure out why the multi-word synonym expansion is not working correctly.
Specifically, when I test a standard query with Solr Admin it is still splitting on whitespace. Here is my setup: - Solr 7.2.1 - synonym LCD => liquid crystal display - q=myfield:LCD - added: sow=false - myfield looks like: Solr Admin shows the parsed query looks like: myfield:liquid myfield:crystal myfield:display (default operator being OR), which would incorrectly match documents with any of those words, but not all, which is what I would expect... -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html -- CTO, OpenSource Connections Author, Relevant Search http://o19s.com/doug -- CTO, OpenSource Connections Author, Relevant Search http://o19s.com/doug
Re: Multi-word Synonyms - how does sow parameter work?
I am not using edismax (eventually I would like to get there), but I'm just testing with the standard query parser right now.

Original posting:

I'm trying to figure out why the multi-word synonym expansion is not working correctly (or, at least, what I'm misunderstanding). Specifically, when I test a standard query with Solr Admin it appears to still split on whitespace.

Here is my setup:
- Solr 7.2.1
- synonym example: LCD => liquid crystal display
- q=myfield:LCD
- added parameter: sow=false
- myfield schema looks like (analyzer applicable to both index and query time): ...

When debugging the query, Solr Admin shows the parsed query as:

myfield:liquid myfield:crystal myfield:display

(default operator being OR). As you can see, it would incorrectly match documents containing any of those words, whereas I would expect it to require all of them. Should it not do a phrase query search for the exact translated synonym, "liquid crystal display"?
Re: Multi-word Synonyms - how does sow parameter work?
Also share your fieldType settings for myfield as well from your schema -- CTO, OpenSource Connections Author, Relevant Search http://o19s.com/doug
Re: Multi-word Synonyms - how does sow parameter work?
Aside from the screenshot issue, one thing to check: are you searching with defType=edismax? As in q=lcd&…=myfield&sow=false&defType=edismax? Also, sow=false should be the default on Solr 7 and above. Doug -- CTO, OpenSource Connections Author, Relevant Search http://o19s.com/doug
Re: Multi-word Synonyms - how does sow parameter work?
Yes please. That way we’ll see the whole thing. -- Steve www.lucidworks.com
Re: Multi-word Synonyms - how does sow parameter work?
I've subscribed, shall I re-post it then via email? -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: Multi-word Synonyms - how does sow parameter work?
Roy, Not sure of the point of Nabble when it strips content before passing messages on to the mailing list. I’ve emailed them about this problem in the past but they have done nothing about it. Updating a post on Nabble will never make it to the mailing list. If you want us to be able to read your post in full, you should subscribe to the mailing list instead of using Nabble. Instructions here: http://lucene.apache.org/solr/community.html#solr-user-list-solr-userluceneapacheorg -- Steve www.lucidworks.com
Re: Multi-word Synonyms - how does sow parameter work?
Thanks, updated original post. It just removed what I surrounded with the raw text markup, I've added it back without markup. Not sure of the point of raw text if it's always removed -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: Multi-word Synonyms - how does sow parameter work?
The mail server strips pretty much all screenshots and attachments, so I think some of the data you're trying to provide is missing from the e-mail. Best, Erick
Multi-word Synonyms - how does sow parameter work?
I'm trying to figure out why the multi-word synonym expansion is not working correctly. Specifically, when I test a standard query with Solr Admin it is still splitting on whitespace.

Here is my setup:
- Solr 7.2.1
- synonym LCD => liquid crystal display
- q=myfield:LCD
- added: sow=false
- myfield looks like:

Solr Admin shows the parsed query looks like:

myfield:liquid myfield:crystal myfield:display

(default operator being OR), which would incorrectly match documents containing any of those words, whereas I would expect it to require all of them.

-- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
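To make the complaint concrete, here is a minimal sketch in plain Python of why three OR'ed term clauses are looser than a phrase query. It is a toy model and not Solr code: the whitespace tokenizer, the function names and the two sample documents are all made up for illustration.

```python
def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for a real analyzer)."""
    return text.lower().split()


def matches_any(doc, terms):
    """OR semantics: the document matches if it contains at least one term."""
    tokens = set(tokenize(doc))
    return any(term in tokens for term in terms)


def matches_phrase(doc, terms):
    """Phrase semantics: the terms must appear adjacent and in order."""
    tokens = tokenize(doc)
    n = len(terms)
    return any(tokens[i:i + n] == terms for i in range(len(tokens) - n + 1))


# The expansion of LCD from the synonym rule in the post.
expansion = ["liquid", "crystal", "display"]

# Hypothetical documents, invented for this example.
docs = ["liquid soap dispenser", "27 inch liquid crystal display"]

for doc in docs:
    print(doc, "| OR:", matches_any(doc, expansion),
          "| phrase:", matches_phrase(doc, expansion))
```

The first document matches under OR semantics only because it contains "liquid", which is the over-matching described in the post; only the second document satisfies the phrase.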
Unable To Delete Managed Synonyms Containing a "/" In Solr 7.2
I THINK this might be a bug? I've had trouble with how the Solr Managed Synonym endpoint handles URL encoding of synonyms. It seems to be impossible to delete a synonym that has a forward slash in it. I have a synonym with a key of "Hot/Cold Pack" (that's the key that shows up when I GET the managed synonyms, as it appears in the JSON response). I've tried DELETE on several URLs, none of which work. Here are the sorts of URLs I've tried:

1. /synonyms/english/Hot%2FCold%20Pack - returns "Illegal character in path at index 84: http://10.74.222.14:8983/solr/ca_gm_search/schema/analysis/synonyms/english/Hot/Cold Pack"
2. /synonyms/english/Hot%252FCold%20Pack - returns "Illegal character in path at index 86: http://10.74.222.14:8983/solr/ca_gm_search/schema/analysis/synonyms/english/Hot%2FCold Pack"
3. /synonyms/english/Hot%252FCold%2520Pack - returns "No REST managed resource registered for path /schema/analysis/synonyms/english/Hot/Cold Pack"

My blind guess is that the Solr Managed Synonym endpoint is not properly decoding the request path. Either it stops decoding at %2F and complains because no synonym matches "Hot%2FCold Pack", or it decodes the term to "Hot/Cold Pack" and fails because it interprets "Hot" as a separate request path node.

Should this be filed in the issue tracker, or am I missing something? There doesn't appear to be a workaround for this. Once you insert a synonym with a forward slash, it's stuck for good (you can't delete the endpoint and re-create it because that is not allowed while it is in use, and there is no bulk delete method).

-- Kyle Hipke, Software Engineer, Search and CMS Practices, CIRRUS10
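For reference, the three path variants above correspond to different levels of percent-encoding. The sketch below uses only Python's standard urllib.parse to show how each one decodes; it does not call Solr, and how many decoding passes the server actually applies is an assumption inferred from the error messages, not something confirmed here.

```python
from urllib.parse import quote, unquote

key = "Hot/Cold Pack"

# safe="" forces "/" to be encoded as well (quote() leaves it alone by default).
encoded_once = quote(key, safe="")
encoded_twice = quote(encoded_once, safe="")

print(encoded_once)   # Hot%2FCold%20Pack
print(encoded_twice)  # Hot%252FCold%2520Pack

# Attempt 2 in the post mixes the two levels: a double-encoded slash with a
# single-encoded space. One decoding pass yields the string in its error.
mixed = "Hot%252FCold%20Pack"
print(unquote(mixed))  # Hot%2FCold Pack

# Attempt 3 is fully double-encoded; two decoding passes restore the raw key,
# slash included, which is consistent with the "No REST managed resource" path.
print(unquote(unquote(encoded_twice)))  # Hot/Cold Pack
```

Read this way, attempts 1 and 2 appear to fail on the literal space left after one decoding pass, while attempt 3 survives that step but ends up with a real "/" in the path.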
Re: synonyms question
Vincenzo, Thank you for the tip. I restarted Solr and it worked. -Ennio -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: synonyms question
Have you reloaded the core (or restarted Solr) after the change in the synonyms file? Ciao, Vincenzo -- mobile: 3498513251 skype: free.dev
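As an alternative to a full restart, a core reload on a standalone (non-SolrCloud) install can be requested through the CoreAdmin API. The sketch below only builds the request URL; the host, port and core name are placeholders, not values from this thread.

```python
from urllib.parse import urlencode


def core_reload_url(base_url, core):
    """Build a CoreAdmin RELOAD request URL for a standalone Solr core."""
    params = urlencode({"action": "RELOAD", "core": core})
    return f"{base_url.rstrip('/')}/admin/cores?{params}"


# "mycore" and the localhost address are assumptions for illustration.
url = core_reload_url("http://localhost:8983/solr", "mycore")
print(url)

# To actually send it against a running instance:
# from urllib.request import urlopen
# print(urlopen(url).read().decode("utf-8"))
```

A reload makes query-time synonym changes visible; synonyms applied at index time would additionally need the affected documents to be reindexed.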
Re: synonyms question
No not using SolrCloud. -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: synonyms question
Ennio, do you know if you have SolrCloud?

On Tue, Jul 17, 2018 at 7:19 PM ennio wrote:
> Erick,
>
> I'm invoking the synonym at query time.
>
> Here is my fieldType definition.
>
> positionIncrementGap="100">
> ignoreCase="true"
> words="lang/stopwords_en.txt"
> />
> protected="protwords.txt"/>
> synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
> ignoreCase="true"
> words="lang/stopwords_en.txt"
> />
> protected="protwords.txt"/>

-- Vincenzo D'Amore
Re: synonyms question
Erick, I'm invoking the synonym at query time. Here is my fieldType definition. -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: synonyms question
Hi Ennio, could you please share:

* your configuration (specifically the field type declaration in your schema)
* the query (please add debug=true) and the corresponding query response

Best, Andrea
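A request with debugging switched on can be put together as below. This is a small sketch that only builds the URL: the host and the core name "mycore" are placeholders, and the query term is taken from the original question.

```python
from urllib.parse import urlencode


def debug_select_url(base_url, core, query):
    """Build a /select URL that asks Solr to include debug information."""
    params = urlencode({"q": query, "debug": "true", "wt": "json"})
    return f"{base_url.rstrip('/')}/{core}/select?{params}"


# The base URL and core name are assumptions for illustration.
print(debug_select_url("http://localhost:8983/solr", "mycore", "fiber"))
```

In the response, the "parsedquery" entry of the debug section shows whether both fiber and fibre made it into the final query.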
Re: synonyms question
You have to look at the analysis chain for text_en. Is the synonym factory being invoked? If so at indexing time or query time? If indexing time, did you have the synonym defined when you indexed the data originally? If in cloud mode did you push the configs to Zookeeper and reload the collection before indexing and/or querying? The admin UI>>collection>>analysis page is very helpful. Best, Erick
synonyms question
I'm trying to get my synonyms to work, but for this one keyword I cannot get it to work. I added the following to my synonyms file:

fiber,fibre

But when I search for fiber or fibre, it does not work. Fiber is the American English spelling and fibre is the British English spelling. My field type is set to text_en; would that be why?

Thanks,
Ennio Bozzetti, Senior Web Programmer, THORLABS, (973) 300-2561, www.thorlabs.com
Re: Problem with synonyms containing whitespace
Thanks for the solution; it's working fine for me. I had the same configuration but missed tokenizerFactory="solr.KeywordTokenizerFactory" in the filter tag. That's great. -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: Highlighter throwing InvalidTokenOffsetsException for field with large number of synonyms
Yay! I'm glad the UnifiedHighlighter is serving you well. I was about to suggest it. If you think the fragmentation/snippeting could be improved in a general way then post a JIRA for consideration. Note: identical results with the original Highlighter is a non-goal. -- Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker LinkedIn: http://linkedin.com/in/davidwsmiley | Book: http://www.solrenterprisesearchserver.com
RE: Highlighter throwing InvalidTokenOffsetsException for field with large number of synonyms
Finally got back to looking at this, and found that the solution was to switch to the unified highlighter (https://lucene.apache.org/solr/guide/7_2/highlighting.html#choosing-a-highlighter), which doesn't seem to have the same problem with my complex synonyms. This required some tweaking of the highlighting parameters and my code, as it doesn't highlight exactly the same as the default highlighter, but all is working now. Thanks again for the assistance. David -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: PF, PF2, PF3 clauses missing in solr7 with query-time synonyms?
An update on this: The problem occurs on phrase queries, using edismax, where the term in the nested query contains a multi-word synonym. In the example above, dog has a multiterm synonym "canis familiaris", and aspirin has "acetylsalicylic acid". Creating a JIRA ticket.

Thank you,
Elizabeth

On Wed, Apr 18, 2018 at 12:38 PM, Elizabeth Haubert <ehaub...@opensourceconnections.com> wrote:
> I'm seeing pf and pf3 clauses fail to generate in long queries containing synonyms. Wondering if anyone else has run into this, or if it needs to be submitted as a bug in Jira. It is a showstopper problem for the current project, as the pf and pf3 were pretty heavily tuned.
>
> Using Solr 7.1; all fields are using the following type:
>
> With query-time synonyms:
> positionIncrementGap="100" autoGeneratePhraseQueries="true">
> pattern="(?i)\b(anti|hypo|hyper|non)[-\\/ ](\w+)\b" replacement="$1$2"/>
> generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0" stemEnglishPossessive="1" protected="protwords_wdff.txt"/>
> words="stopwords.txt" />
> protected="protwords_nostem.txt"/>
> pattern="(?i)\b(anti|hypo|hyper|non)[-\\/ ](\w+)\b" replacement="$1$2"/>
> generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0" stemEnglishPossessive="1" protected="protwords_wdff.txt"/>
> words="stopwords.txt" />
> managed="synonyms_all" />
> protected="protwords_nostem.txt"/>
>
> Without query-time synonyms:
> positionIncrementGap="100" autoGeneratePhraseQueries="true">
> pattern="(?i)\b(anti|hypo|hyper|non)[-\\/ ](\w+)\b" replacement="$1$2"/>
> generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0" stemEnglishPossessive="1" protected="protwords_wdff.txt"/>
> words="stopwords.txt" />
> managed="synonyms_all" />
> protected="protwords_nostem.txt"/>
> pattern="(?i)\b(anti|hypo|hyper|non)[-\\/ ](\w+)\b" replacement="$1$2"/>
> generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0" stemEnglishPossessive="1" protected="protwords_wdff.txt"/>
> words="stopwords.txt" />
> protected="protwords_nostem.txt"/>
>
> Synonyms file is pretty long, so I'll just include the relevant bits for an example:
>
> allergic, hypersensitive
> aspirin, acetylsalicylic acid
> dog, canine, canis familiris, k 9
> rat, rattus
>
> The problem seems to occur when part of the query has a synonym, but the whole phrase is not. Whitespace added to piece out what is going on; believe any parentheses errors are due to my tinkering around. Beyond that though, this is as from Solr. Slop has been tinkered with to identify PF/PF2/PF3 clauses where PF fields have a slop ending in 0, pf2 ending in 1, pf3 ending in 2 eg ~10, ~11, ~12, etc.
>
> Example 1: "aspirin dose in rats"
>
> With query-time synonyms:
>
> /// Q terms generate as expected ///
> +kw1:\"acetylsalicylic acid\" kw1:aspirin)^100.0 | (species:\"acetylsalicylic acid\" species:aspirin) | (keywords_bm25_no_norms:\"acetylsalicylic acid\" keywords_bm25_no_norms:aspirin)^50.0 | (description:\"acetylsalicylic acid\" description:aspirin) | (kw1ranked:\"acetylsalicylic acid\" kw1ranked:aspirin)^100.0 | (text:\"acetylsalicylic acid\" text:aspirin) | (title:\"acetylsalicylic acid\" title:aspirin)^100.0 | (keywordsranked_bm25_no_norms:\"acetylsalicylic acid\" keywordsranked_bm25_no_norms:aspirin)^50.0 | (authors:\"acetylsalicylic acid\" authors:aspirin))~0.4 ((Synonym(kw1:dosage kw1:dose kw1:dose kw1:dose))^100.0 | Synonym(species:d
PF, PF2, PF3 clauses missing in solr7 with query-time synonyms?
I'm seeing pf and pf3 clauses fail to generate in long queries containing synonyms. Wondering if anyone else has run into this, or if it needs to be submitted as a bug in Jira. It is a showstopper problem for the current project, as the pf and pf3 were pretty heavily tuned.

Using Solr 7.1; all fields are using the following type:

With query-time synonyms:

Without query-time synonyms:

The synonyms file is pretty long, so I'll just include the relevant bits for an example:

allergic, hypersensitive
aspirin, acetylsalicylic acid
dog, canine, canis familiris, k 9
rat, rattus

The problem seems to occur when part of the query has a synonym, but the whole phrase does not. Whitespace has been added to piece out what is going on; I believe any parenthesis errors are due to my tinkering around. Beyond that, though, this is as returned from Solr. Slop has been tinkered with to identify PF/PF2/PF3 clauses: PF fields have a slop ending in 0, pf2 ending in 1, pf3 ending in 2, e.g. ~10, ~11, ~12, etc.

Example 1: "aspirin dose in rats"

With query-time synonyms:

/// Q terms generate as expected ///
+kw1:\"acetylsalicylic acid\" kw1:aspirin)^100.0 | (species:\"acetylsalicylic acid\" species:aspirin) | (keywords_bm25_no_norms:\"acetylsalicylic acid\" keywords_bm25_no_norms:aspirin)^50.0 | (description:\"acetylsalicylic acid\" description:aspirin) | (kw1ranked:\"acetylsalicylic acid\" kw1ranked:aspirin)^100.0 | (text:\"acetylsalicylic acid\" text:aspirin) | (title:\"acetylsalicylic acid\" title:aspirin)^100.0 | (keywordsranked_bm25_no_norms:\"acetylsalicylic acid\" keywordsranked_bm25_no_norms:aspirin)^50.0 | (authors:\"acetylsalicylic acid\" authors:aspirin))~0.4 ((Synonym(kw1:dosage kw1:dose kw1:dose kw1:dose))^100.0 | Synonym(species:dosage species:dose species:dose species:dose) | (Synonym(keywords_bm25_no_norms:dosage keywords_bm25_no_norms:dose keywords_bm25_no_norms:dose keywords_bm25_no_norms:dose))^50.0 | Synonym(description:dosage description:dose description:dose description:dose) |
(Synonym(kw1ranked:dosage kw1ranked:dose kw1ranked:dose kw1ranked:dose))^100.0 | Synonym(text:dosage text:dose text:dose text:dose) | (Synonym(title:dosage title:dose title:dose title:dose))^100.0 | (Synonym(keywordsranked_bm25_no_norms:dosage keywordsranked_bm25_no_norms:dose keywordsranked_bm25_no_norms:dose keywordsranked_bm25_no_norms:dose))^50.0 | Synonym(authors:dosage authors:dose authors:dose authors:dose))~0.4 ((Synonym(kw1:rat kw1:rattu))^100.0 | Synonym(species:rat species:rattu) | (Synonym(keywords_bm25_no_norms:rat keywords_bm25_no_norms:rattu))^50.0 | Synonym(description:rat description:rattu) | (Synonym(kw1ranked:rat kw1ranked:rattu))^100.0 | Synonym(text:rat text:rattu) | (Synonym(title:rat title:rattu))^100.0 | (Synonym(keywordsranked_bm25_no_norms:rat keywordsranked_bm25_no_norms:rattu))^50.0 | Synonym(authors:rat authors:rattu))~0.4)~3)

/// PF and PF2 are missing. ///
() () () () ()

/// This is actually PF3 with a missing ? where the stopword 'in' belonged. ///
((title:\"(dosage dose dose dose) (rattu rat)\"~22)^1000.0 | (keywordsranked_bm25_no_norms:\"(dosage dose dose dose) (rattu rat)\"~22)^1000.0 | (text:\"(dosage dose dose dose) (rattu rat)\"~22)^100.0)~0.4 ((keywords_bm25_no_norms:\"(dosage dose dose dose) (rattu rat)\"~12)^500.0 | (kw1ranked:\"(dosage dose dose dose) (rattu rat)\"~12)^100.0 | (kw1:\"(dosage dose dose dose) (rattu rat)\"~12)^100.0)~0.4,product(max(10.0/(3.16E-11*float(ms(const(14560),date(dateint)))+6.0),int(documentdatefix)),scale(map(int(rank),-1.0,-1.0,const(0.5),null),0.5,2.0)))",

With index-time synonyms:

/// Q ///
"boost(+kw1:aspirin)^100.0 | species:aspirin | (keywords_bm25_no_norms:aspirin)^50.0 | description:aspirin | (kw1ranked:aspirin)^100.0 | text:aspirin | (title:aspirin)^100.0 | (keywordsranked_bm25_no_norms:aspirin)^50.0 | authors:aspirin)~0.4 ((kw1:dose)^100.0 | species:dose | (keywords_bm25_no_norms:dose)^50.0 | description:dose | (kw1ranked:dose)^100.0 | text:dose | (title:dose)^100.0 | (keywordsranked_bm25_no_norms:dose)^50.0 | authors:dose)~0.4 ((kw1:rats)^100.0 | species:rats | (keywords_bm25_no_norms:rats)^50.0 | description:rats | (kw1ranked:rats)^100.0 | text:rats | (title:rats)^100.0 | (keywordsranked_bm25_no_norms:rats)^50.0 | authors:rats)~0.4)~3)

/// PF ///
((title:\"aspirin dose ? rats\"~20)^5000.0 | (keywordsranked_bm25_no_norms:\"aspirin dose ? rats\"~20)^5000.0 | (keywords_bm25_no_norms:\"aspirin dose ? rats\"~20)^1500.0 | (text:\"aspirin dose ? rats\"~20)^1000.0)~0.4 ((kw1ranked:\"aspirin dose ? rats\"~10)^5000.0 | (kw1:\"aspirin dose ? rats\"~10)^500.0)~0.4 ((authors:\"aspirin dose
RE: Highlighter throwing InvalidTokenOffsetsException for field with large number of synonyms
David Yes, highlighting is tricky, especially with synonyms. Sorry, I would need to see a bit more of your config before saying more about it. Thanks -- Rick -- Sorry for being brief. Alternate email is rickleir at yahoo dot com
RE: Highlighter throwing InvalidTokenOffsetsException for field with large number of synonyms
Hi Rick, Thanks for your response. The reason that we do it like this is that the localities are also part of another indexed field that contains the entire address. We actually do the search over that field, and we are only using the highlighting on the problematic field so that we can tell which parts of the address that we matched to. We never search for wildcards like "*cannum*". As an example, we might have an address that we index which is "19 some st cannum vic 3456". When we index the address, we actually index the text "19 some st lcx__balmoral__cannum__clear_lake__lower_norton vic 3456" into a Solr field that has our custom synonym filter. This then causes the synonyms for the locality "cannum" to be generated, and if we search for "19 some st balmoral" we will still get a match on the locality component of the address. Using this method, the searching for addresses is working fine. We have a requirement once we have a match to know which part of the address that we matched to, which is where the highlighting comes in. By loading just the locality part of the address into a separate field and applying the same synonym filter, through the highlighting we can see if we get a hit on the locality. We do this with the other components of the address, like the number, the street name, the street type, the post code etc. so that we can return to the caller what bits of their input matched to the address we are returning. I could load them as a multi-valued field for just the highlighting, but that means I need to extract them in a different format to what I am using for the whole address which I would like to avoid if possible. We are loading these addresses from a database table using the data import handler. 
Regards, David Howe, Java Domain Architect, Postal Systems, Australia Post
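The locality token described in this message can be pictured with a short sketch. The function name, the "lcx" default prefix argument and the alphabetical ordering are assumptions read off the single example in the message ("lcx__balmoral__cannum__clear_lake__lower_norton"); the real indexing code is not shown in the thread.

```python
def locality_token(localities, prefix="lcx"):
    """Join locality names into one synthetic token.

    Assumed rules, inferred from the one example in the message: lowercase
    each name, replace spaces inside a name with a single underscore, sort
    the names, and join everything with double underscores after the prefix.
    """
    parts = sorted(name.strip().lower().replace(" ", "_") for name in localities)
    return "__".join([prefix] + parts)


token = locality_token(["Cannum", "Balmoral", "Clear Lake", "Lower Norton"])
print(token)

# The address text that gets indexed then carries the token in place of
# the single locality name.
print(f"19 some st {token} vic 3456")
```

A custom synonym filter, as described above, would then expand this single token back into the individual locality names at analysis time.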
Re: Highlighter throwing InvalidTokenOffsetsException for field with large number of synonyms
David When you have "lcx__balmoral__cannum__clear_lake__lower_norton" in a field, would you search for *cannum* ? That might not perform well. Why not have a multivalue field for this information? It could be that you have a good reason for this, and I just do not understand. Cheers -- Rick -- Sorry for being brief. Alternate email is rickleir at yahoo dot com
Using dynamic synonyms file
Hi, Is it possible to specify the synonyms file as a variable, set a default synonyms file, and pass the file name in the request? If so, is there an example of this? Such as, Thanks, Roopa
Re: Using Synonyms as a feature with LTR
I see okay, thank you.
Re: Using Synonyms as a feature with LTR
I see. As far as I know, it is not possible to run different query-time analysis for the same field. Not sure if anyone is working on that. Regards -- Alessandro Benedetti, Search Consultant, R&D Software Engineer, Director, Sease Ltd. - www.sease.io -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: Using Synonyms as a feature with LTR
So, I would end up with ~6 copy fields and ~8 synonym files, which would be about 48 field/synonym combinations. Would that be significant in terms of index size? What would be the best way to measure this?

Custom parser:
This would take the file name and the field to run the analysis on. This field need not be a copy field that holds data, since we would use it only for getting the analysis.
Get the synonyms for the user query as tokens.
Create an edismax query based on the query tokens.
Return the score.

This custom parser would be called in LTR as a scalar feature.

I am at the stage where I can get the synonyms from the analysis chain; however, the tokens are individual tokens and not phrases. So, I am stuck on how to construct a correct query based on the synonym tokens and positions.

Thank you,
Roopa
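One way to think about the "tokens and positions" problem is to treat the analyzed tokens as a graph and enumerate the paths through it. The sketch below is plain Python and does not use any Solr or Lucene API: it assumes each token is available as a (term, position, position length) triple, that positions start at 0, and it reuses the LCD example from an earlier thread as sample data. The function names are made up for illustration.

```python
from collections import defaultdict


def expand_paths(tokens):
    """Enumerate every phrase spelled out by a token graph.

    tokens: iterable of (term, position, position_length) triples. A token
    starting at position p with length n leads to position p + n.
    """
    by_start = defaultdict(list)
    end = 0
    for term, pos, length in tokens:
        by_start[pos].append((term, pos + length))
        end = max(end, pos + length)

    def walk(pos):
        # Reaching the final position completes one path.
        if pos == end:
            return [[]]
        paths = []
        for term, nxt in by_start[pos]:
            for rest in walk(nxt):
                paths.append([term] + rest)
        return paths

    return [" ".join(path) for path in walk(0)]


def phrase_query(field, tokens):
    """OR together one quoted phrase per path through the token graph."""
    return " OR ".join(f'{field}:"{phrase}"' for phrase in expand_paths(tokens))


# Hypothetical analysis output for the query "lcd tv" with the rule
# LCD => liquid crystal display and the original token preserved.
tokens = [
    ("lcd", 0, 3),
    ("liquid", 0, 1),
    ("crystal", 1, 1),
    ("display", 2, 1),
    ("tv", 3, 1),
]
print(expand_paths(tokens))
print(phrase_query("title", tokens))
```

Each path becomes a whole phrase, so the multi-word synonym stays together instead of being split into independent terms. Whether the position length is actually exposed by the analysis chain in use would need to be checked.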
Re: Using Synonyms as a feature with LTR
So, I would end up with ~6 copy fields and ~8 synonym files, which would be about 48 field/synonym combinations. Would that be significant in terms of index size? I guess that depends on the thesaurus size; what would be the best way to measure this?

Custom parser:
This would take the synonym file name and the field to run the analysis on. This field need not be a copy field that holds data, since we would use it only for getting the analysis.
Get the synonyms for the user query as tokens.
Create an edismax query based on the query tokens.
Return the score.

This custom parser would be called in LTR as a scalar feature.

I am at the stage where I can get the synonyms from the analysis chain; however, the tokens are individual tokens and not phrases. So I am stuck on how to construct a correct query based on the synonym tokens and positions.

Thank you,
Roopa
Re: Using Synonyms as a feature with LTR
"I can go with the "title" field and have that include the synonyms in analysis. Only problem is that the number of fields and number of synonyms files are quite a lot (~ 8 synonyms files) due to different weightage and type of expansion (exact vs partial) based on these. Hence going with this approach would mean creating more fields for all these synonyms (synonyms.txt) So, I am looking to build a custom parser for which I could supply the file and the field and that would expand the synonyms and return a score. " Having a binary or scalar feature is completely up to you and the way you configure the Solr feature. If you have 8 (copy?)fields with same content but different expansion, that is still ok. You can have 8 features, one per type of expansion. LTR will take care of the weight to be assigned to those features. "So, I am looking to build a custom parser for which I could supply the file and the field and that would expand the synonyms and return a score. "" I don't get this , can you elaborate ? Regards - --- Alessandro Benedetti Search Consultant, R Software Engineer, Director Sease Ltd. - www.sease.io -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: Using Synonyms as a feature with LTR
Thank you, Alessandro. I was trying these options before replying.

Yes, I am looking to generate a score for a query with synonym expansion (not a binary feature).

I can go with the "title" field and have it include the synonyms in analysis. The only problem is that the number of fields and the number of synonym files are quite large (~8 synonym files), due to the different weightage and type of expansion (exact vs. partial) based on these. Hence, going with this approach would mean creating more fields for all these synonyms (synonyms.txt).

So I am looking to build a custom parser to which I could supply the file and the field, and which would expand the synonyms and return a score.

Thanks,
Roopa
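For comparison with the custom-parser idea, the "more fields" route can at least be scripted. The following Python sketch generates one scalar `SolrFeature` definition per field/expansion pair. The copy-field naming scheme (`title_syn1`, ...) and the feature names are hypothetical, and the use of the `q` parameter rests on the assumption that this is what makes `SolrFeature` return the query score rather than a match/no-match value, which is worth verifying against the LTR documentation for your Solr version.

```python
import json
from itertools import product

FEATURE_CLASS = "org.apache.solr.ltr.feature.SolrFeature"


def synonym_features(store, base_fields, expansions):
    """Return one scalar SolrFeature definition per (field, expansion) pair."""
    features = []
    for base, expansion in product(base_fields, expansions):
        field = f"{base}_{expansion}"  # hypothetical copy-field naming scheme
        features.append({
            "store": store,
            "name": f"{field}_score",
            "class": FEATURE_CLASS,
            # ${query} is filled in from external feature information at request time
            "params": {"q": "{!field f=" + field + "}${query}"},
        })
    return features


# Stand-ins for the ~8 synonym files mentioned above.
EXPANSIONS = [f"syn{i}" for i in range(1, 9)]

if __name__ == "__main__":
    features = synonym_features("featureStore", ["title"], EXPANSIONS)
    print(len(features))
    print(json.dumps(features[0], indent=2))
```

With a single base field this yields 8 definitions; with 6 base fields it yields the 48 combinations discussed in the thread, which makes it easy to see how quickly the feature store grows.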
Re: Using Synonyms as a feature with LTR
In the end, a feature will just be a numerical value.
How do you plan to use synonyms in a field to generate a numerical feature?

Are you planning to define a binary feature for a field, in case there is a match on the synonyms?
Or a feature which contains a score for a query (with synonym expansion)?

I would start from the SolrFeature. Let's assume the "title" field has a field type that includes synonyms (query time):

{
  "store" : "featureStore",
  "name" : "hasTitleMatch",
  "class" : "org.apache.solr.ltr.feature.SolrFeature",
  "params" : {
    "fq": [ "{!field f=title}${query}" ]
  }
}

Query-time analysis will be applied and synonyms expanded.
So the feature will have a value, which is the score returned for the query and the document (under scoring).
You can play with that and design the feature that best fits your idea.

Regards

-----
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: Multi words query time synonyms
Steve,

Following your comment, I ran this test:

1/ put the SynonymGraphFilterFactory after the StopFilterFactory in the query-time analysis chain
2/ removed the stop word from the synonyms file:

om, olympique marseille

The parsed query strings are:

for "om maillot"
"parsedquery_toString":"+(+name_text_gp:olympiqu +name_text_gp:marseil) name_text_gp:om)) (name_text_gp:maillot))~1)",

for "olympique de marseille maillot"
"parsedquery_toString":"+name_text_gp:om (+name_text_gp:olympiqu +name_text_gp:marseil))) (name_text_gp:maillot))~1)",

for "maillot om"
"parsedquery_toString":"+(((name_text_gp:maillot) (((+name_text_gp:olympiqu +name_text_gp:marseil) name_text_gp:om)))~1)",

for "maillot olympique de marseille"
"parsedquery_toString":"+(((name_text_gp:maillot) ((name_text_gp:om (+name_text_gp:olympiqu +name_text_gp:marseil~1)",

The query results are the same for all queries. It looks like this could be an acceptable workaround.

Thank you

Dominique

--
Dominique Béjean
06 08 46 12 43
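Step 2 of the workaround above (removing stop words from the synonyms file) could be scripted rather than done by hand. This is a minimal Python sketch under stated assumptions: the function names are made up for illustration, it only handles the comma-separated and `=>` forms of a synonyms file line, and it matches stop words case-insensitively on whitespace-separated words, which may not mirror the analyzer exactly.

```python
def _clean_side(side, stop):
    """Drop stop words from every comma-separated entry on one side of a rule."""
    entries = []
    for entry in side.split(","):
        words = [w for w in entry.split() if w.lower() not in stop]
        if words:  # skip entries that consisted only of stop words
            entries.append(" ".join(words))
    return ", ".join(entries)


def strip_stopwords(line, stopwords):
    """Return one synonyms.txt line with stop words removed from its entries."""
    stop = {w.lower() for w in stopwords}
    text = line.strip()
    if not text or text.startswith("#"):
        return text  # blank lines and comments are left alone
    return " => ".join(_clean_side(side, stop) for side in text.split("=>"))


if __name__ == "__main__":
    print(strip_stopwords("om, olympique de marseille", ["de"]))
```

Running it on the line from the thread with a stop list containing only "de" gives the modified rule used in the test above.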
Re: Multi words query time synonyms
Hi Steve,

Thank you for your response.
The Jira issue has been created: SOLR-11968

I will let you add your comments.

Regards.

Dominique

--
Dominique Béjean
06 08 46 12 43
Re: Multi words query time synonyms
Hi Dominique,

Looks like it’s a bug, not sure where exactly though. Can you please create a JIRA?

I can see the same behavior on master too, not just on the releases/lucene-solr/6.6.2 tag.

One interesting thing I found is that if I remove the stop filter from the query analyzer, I get the following for qq=“maillot om”:

+((name_text_gp:maillot) (((+name_text_gp:olympiqu +name_text_gp:de +name_text_gp:marseil) name_text_gp:om)))

(btw my stop list only has “de” on it)

Thanks,

--
Steve
www.lucidworks.com
Re: Multi words query time synonyms
Hi,

More info.

When I test the analysis for the field type, the synonyms are correctly expanded for all of these expressions:

om maillot
maillot om
olympique de marseille maillot
maillot olympique de marseille

The resulting outputs always include the following terms (obviously not always in the same order):

olympiqu om marseil maillot

So I suspect an issue with the edismax query parser.

Regards.

Dominique

On Fri, Feb 9, 2018 at 18:25, Dominique Bejean <dominique.bej...@eolya.fr> wrote:

> Hi,
>
> I am trying multi-word query-time synonyms with Solr 6.6.2 and the
> SynonymGraphFilterFactory filter, as explained in this article:
>
> https://lucidworks.com/2017/04/18/multi-word-synonyms-solr-adds-query-time-support/
>
> My field type is:
>
> positionIncrementGap="100">
>
> articles="lang/contractions_fr.txt"/>
>
> ignoreCase="true"/>
>
> articles="lang/contractions_fr.txt"/>
>
> synonyms="synonyms.txt"
> ignoreCase="true" expand="true"/>
>
> ignoreCase="true"/>
>
> synonyms.txt contains the line
>
> om, olympique de marseille
>
> The order of words in my query has an impact on the query generated by
> edismax:
>
> q={!edismax qf='name_text_gp' v=$qq}
> =false
> =...
>
> With "qq=om maillot" or "qq=olympique de marseille maillot", I can see the
> synonym expansion. It is working as expected.
>
> "parsedquery_toString":"+(((+name_text_gp:olympiqu +name_text_gp:marseil
> +name_text_gp:maillot) name_text_gp:om))",
> "parsedquery_toString":"+((name_text_gp:om (+name_text_gp:olympiqu
> +name_text_gp:marseil +name_text_gp:maillot)))",
>
> With "qq=maillot om" or "qq=maillot olympique de marseille", I can see the
> same generated query:
>
> "parsedquery_toString":"+((name_text_gp:maillot) (name_text_gp:om))",
> "parsedquery_toString":"+((name_text_gp:maillot) (name_text_gp:om))",
>
> I don't understand these generated queries. The first one looks like the
> synonym expansion is ignored, but the second one shows it is not ignored
> and only the synonym term is used.
>
> What is wrong in the way I am doing this?
>
> Regards
>
> Dominique

--
Dominique Béjean
06 08 46 12 43
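To compare the parsed queries for the four word orders side by side, the requests can be generated programmatically. The Python sketch below only builds the URLs and sends nothing. The base URL and core name are hypothetical, `sow=false` is an assumption here (it is the setting the linked article relies on for multi-word synonyms), and `debugQuery=true` is added so that the response includes `parsedquery_toString`.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; replace host and core name with your own.
BASE = "http://localhost:8983/solr/mycore/select"

QUERIES = [
    "om maillot",
    "maillot om",
    "olympique de marseille maillot",
    "maillot olympique de marseille",
]


def debug_url(qq, base=BASE):
    """Build a select URL that runs the edismax query with debug output enabled."""
    params = {
        "q": "{!edismax qf='name_text_gp' v=$qq}",
        "qq": qq,
        "sow": "false",        # assumption: do not split on whitespace
        "debugQuery": "true",  # include the parsed query in the response
        "rows": "0",           # only the debug section is of interest
    }
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    for qq in QUERIES:
        print(debug_url(qq))
```

Fetching each URL and diffing the `parsedquery_toString` values makes the order dependence easy to reproduce on other Solr versions.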