Re: Multiword synonyms and term wildcards/substring matching

2021-03-02 Thread Martin Graney

Hi Alex

Thanks for the reply.
We are not using the 'copyField bucket' approach as it is inflexible. Our
textual fields are all multivalued dynamic fields, which allows us to craft
a list of `pf` (phrase fields) with associated weighting boosts that are
meant to be used in the search on a *per-collection* basis. This allows us
to have all of the textual fields indexed independently and then simply
change the query when we want to include/exclude a field from the search
without the need to reindex the entire collection. e/dismax makes this more
flexible approach possible.

I'll take a look at the Complex Phrase Query Parser and see if it is a good
fit. We use a lot of the e/dismax params, though, such as `bf` (boost
functions), `bq` (boost queries), and `pf` (phrase fields), to influence the
relevance score.

FYI: We are using Solr 8.3.

-- 
Martin Graney
Lead Developer

http://sooqr.com <http://www.sooqr.com/>
http://twitter.com/sooqrcom

Office: +31 (0) 88 766 7700
Mobile: +31 (0) 64 660 8543

-- 
 <https://www.linkedin.com/company/sooqr-com/>

Re: Multiword synonyms and term wildcards/substring matching

2021-03-02 Thread Alexandre Rafalovitch

I admit to not fully understanding the examples, but the Complex Phrase Query
Parser looks like something worth at least reviewing:

https://lucene.apache.org/solr/guide/8_8/other-parsers.html#complex-phrase-query-parser

Also, I did not see any mention of trying copyField to process the same
content in different ways. If the copyField target is not stored, the
overhead is not as large.

Regards,
Alex



Multiword synonyms and term wildcards/substring matching

2021-03-02 Thread Martin Graney

Hi All

I have been trying to implement multi-word synonyms using `sow=false` in
a pre-existing system that pre-processes the phrase to add wildcards
around the terms, e.g. `bread stick` => `*bread* *stick*`.

I got the synonyms expansion working perfectly, after discovering the
`preserveOriginal` filter param, but then I needed to re-implement the
existing wildcard behaviour.
I tried using the edge-ngram filter, but found that when searching for the
phrase `bread stick` on a field containing the word `breadstick` and
`q.op=AND` it returns no results, as the content `breadstick` does not
_start with_ `stick`. The previous wildcard behaviour would return all
documents that contain the substrings `bread` AND `stick`, which is the
desired behaviour.
I tried using the ngram filter, but it does not support `preserveOriginal`,
so it loses a lot of relevance for exact matches. It also produces matches
that are far too broad: with `minGramSize=3` and `maxGramSize=5` it creates
21 tokens from `breadstick`, which in practice match essentially all of the
documents. As a result, boosts applied to other fields, such as 'in stock',
push irrelevant documents to the top.
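The token counts above can be reproduced with a small, simplified model of what the two filters emit for a single indexed token. This is plain Python rather than Lucene code. It assumes the same gram sizes (3 to 5) for both filters and ignores positions, offsets and `preserveOriginal`:

```python
def edge_ngrams(token: str, min_gram: int, max_gram: int) -> list[str]:
    """Prefixes of `token` with lengths min_gram..max_gram (edge n-grams)."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


def ngrams(token: str, min_gram: int, max_gram: int) -> list[str]:
    """Every substring of `token` with a length between min_gram and max_gram."""
    grams = []
    for n in range(min_gram, max_gram + 1):
        grams.extend(token[i:i + n] for i in range(len(token) - n + 1))
    return grams


indexed = "breadstick"
edge = edge_ngrams(indexed, 3, 5)
full = ngrams(indexed, 3, 5)

print(edge)       # ['bre', 'brea', 'bread']
print(len(full))  # 21 (8 trigrams + 7 four-grams + 6 five-grams)

# In this model a query term matches only if it equals one of the indexed grams.
for term in ("bread", "stick"):
    print(term, "edge:", term in edge, "ngram:", term in full)
```

Under `q.op=AND`, `stick` has no edge gram to match, which is the failure described above. The full n-gram list does contain it, at the cost of 21 indexed tokens for a word of this length.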

Finally, I tried to strip out ngrams entirely and use the subquery/local params
syntax, a Solr feature that is not very well documented.
I created something like `q={!edismax sow=true v=$wildcards} OR {!edismax
sow=false v=$plain}` to effectively create a union of results, one with
wildcard support and one with multi-word synonym support.
But then I had to implement the other edismax params and immediately
stumbled.
Each query in production normally has a slew of `bf` and `bq` params, and I
cannot see a way to pass these into the nested queries using local params.
If I have 3 different `bf` params, how can I pass them into the local params
subqueries?

Also, as the search in production spans multiple fields, I found that passing
`qf` to both subqueries using dereferencing failed: the parser saw it as
a single field and threw a 'number format exception'.
For example:
q={!edismax sow=true v=$tw tf=$tqf} OR {!edismax sow=false v=$tp tf=$tqf}
$tw=*bread* *stick*
$tp=bread stick
$tqf=title^2 description^0.5
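For reference, here is a minimal sketch of how such a request might be assembled with Python's standard `urllib.parse`. It assumes `qf` was intended where the example shows `tf`, and it sends `tw`, `tp` and `tqf` as ordinary request parameters: in Solr's parameter dereferencing, `$tqf` refers to a request parameter named `tqf`, without the dollar sign. Whether the nested `{!edismax}` parsers also pick up top-level `bf`/`bq` values is not settled here, so the sketch adds `debugQuery=true` to inspect the parsed query rather than assuming it:

```python
from urllib.parse import parse_qs, urlencode

# Parameter names tw, tp and tqf come from the example above; "qf=$tqf" is
# assumed to be what the "tf=$tqf" local param was meant to be.
params = [
    ("q", "{!edismax sow=true qf=$tqf v=$tw} OR {!edismax sow=false qf=$tqf v=$tp}"),
    ("tw", "*bread* *stick*"),
    ("tp", "bread stick"),
    ("tqf", "title^2 description^0.5"),  # spaces are fine in a dereferenced value
    ("debugQuery", "true"),  # show how each subquery was actually parsed
]

query_string = urlencode(params)
print(query_string)

# Round trip: every value, including the multi-field qf, survives encoding intact.
decoded = parse_qs(query_string)
print(decoded["tqf"])  # ['title^2 description^0.5']
```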

As you can guess, I have spent quite some time going down this rabbit hole
trying to reproduce the existing functionality alongside multi-word
synonyms.
Is there a way to get multi-word synonyms working effectively with substring
matching?
I suspect there is a much simpler way than any of my attempts so far, and I
am just missing it.

Solr: 8.3

Thanks
Martin Graney


Re: SOLR 8.6 Synonyms search and out of context results

2021-01-22 Thread Colvin Cowie

Hello,

Do you mean that you want searches for "gain" to match documents with
"revenue" on them, but do *not* want searches for "revenue" to match
documents with "gain" on them?

If that's what you mean, how have you defined your synonyms? If you're
using the SynonymGraphFilterFactory
https://lucene.apache.org/core/8_6_0/analyzers-common/org/apache/lucene/analysis/synonym/SynonymGraphFilterFactory.html
then the default parser is
https://lucene.apache.org/core/8_6_0/analyzers-common/org/apache/lucene/analysis/synonym/SolrSynonymParser.html
which by default will treat comma-separated entries as equivalent
(bi-directional), while an explicit mapping (=>) only goes in one direction.

e.g. given "revenue,gain", revenue is a synonym of gain and gain is a
synonym of revenue. However, given "gain => revenue", revenue will be a
synonym of gain (if you search for gain it will match revenue, but revenue
won't turn into gain).
When using the synonymGraphFilter at query time I believe (though this may
be wrong) that for directional mappings I needed to include the term from
the left hand side on the right hand side as well in order for it to still
match the original term.
So if I've understood your question, I would define that as "gain =>
gain,revenue".
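To make the direction of each rule concrete, here is a toy model of the two rule shapes in the Solr synonym format. It is not the Lucene parser: it assumes `expand=true`, ignores multi-word entries and case handling, and only answers "which terms does a query term turn into?":

```python
def parse_rules(lines: list[str]) -> dict[str, set[str]]:
    """Toy parser: 'a,b' is bidirectional, 'a => b,c' maps a to b and c only."""
    mapping: dict[str, set[str]] = {}
    for line in lines:
        if "=>" in line:
            left, right = line.split("=>", 1)
            targets = {t.strip() for t in right.split(",")}
            for source in left.split(","):
                mapping.setdefault(source.strip(), set()).update(targets)
        else:
            group = {t.strip() for t in line.split(",")}
            for term in group:
                mapping.setdefault(term, set()).update(group)
    return mapping


def expand(term: str, mapping: dict[str, set[str]]) -> set[str]:
    """Terms a query term is rewritten to; unmapped terms pass through."""
    return mapping.get(term, {term})


for rules in (["revenue,gain"], ["gain => revenue"], ["gain => gain,revenue"]):
    m = parse_rules(rules)
    print(rules, "gain ->", sorted(expand("gain", m)),
          "revenue ->", sorted(expand("revenue", m)))
```

The middle case shows why `gain` is repeated on the right-hand side: without it, a query for `gain` is rewritten to `revenue` only and no longer matches documents containing `gain` itself.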

If that doesn't solve it, feel free to share your config and someone might
be able to make a suggestion.

SOLR 8.6 Synonyms search and out of context results

2021-01-22 Thread Iram Tariq

Hi All,

Using Solr's default synonym search, I am able to search with synonyms, but in
some cases it gives ambiguous results.

For example, one of the synonyms of "Revenue" is "Gain".
Input keywords for search: Revenue and Company
Irrelevant Output: Our company doesn't want to gain success through
shortcuts.
Solr version I am using: 8.6.3

Any help is very much appreciated here.

Regards,


Iram Tariq | Associate Architect

NorthBay

Direct:  +92-333-3636333

iram.ta...@northbaysolutions.net

www.northbaysolutions.com

Re: Multi-word Synonyms not working properly with Edismax

2020-09-08 Thread Manish Bafna

Yes, we tried that and it worked. We removed it only from the query analyzer,
and it is working properly now.
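The thread does not explain why removing RemoveDuplicatesTokenFilter from the query analyzer helped. One plausible explanation, offered as an assumption rather than a confirmed diagnosis, is that a duplicate filter comparing only term text and start position cannot tell apart two synonym paths that both begin with `xyz`. The toy model below uses hypothetical node numbers, not Lucene's actual token attributes:

```python
from typing import NamedTuple


class Token(NamedTuple):
    text: str
    start: int  # graph node where the token begins (its position)
    end: int    # graph node where it ends (position + position length)


def remove_duplicates(tokens: list[Token]) -> list[Token]:
    """Toy dedupe: drop a token whose text already appeared at the same start
    position. The end node is not compared, so distinct graph paths can collide."""
    seen: set[tuple[str, int]] = set()
    kept = []
    for tok in tokens:
        key = (tok.text, tok.start)
        if key not in seen:
            seen.add(key)
            kept.append(tok)
    return kept


def paths(tokens: list[Token], node: int = 0, final: int = 4) -> list[str]:
    """Every phrase spelled by a token path from `node` to the `final` node."""
    if node == final:
        return [""]
    found = []
    for tok in tokens:
        if tok.start == node:
            for rest in paths(tokens, tok.end, final):
                found.append((tok.text + " " + rest).strip())
    return found


# Hypothetical graph for the query "bike" with the synonym group
# "abc implement", "bike", "xyz traders", "xyz transport".
graph = [
    Token("abc", 0, 1), Token("implement", 1, 4),
    Token("bike", 0, 4),
    Token("xyz", 0, 2), Token("traders", 2, 4),
    Token("xyz", 0, 3), Token("transport", 3, 4),
]

print(sorted(paths(graph)))
print(sorted(paths(remove_duplicates(graph))))
```

In this model the second `xyz` is discarded, leaving `transport` unreachable, so the `xyz transport` alternative disappears. That resembles the reported symptom, although the model does not account for every clause in the parsed query.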



Re: Multi-word Synonyms not working properly with Edismax

2020-09-08 Thread Dominique Bejean

Hi,

Can you try to remove the RemoveDuplicatesTokenFilter ?

Dominique

On Tue, 8 Sep 2020 at 13:52, Manish Bafna
wrote:

> Hi,
>
> We are using the following configuration:
>
>
>
> --
>
> *Schema: *
>
> 
> positionIncrementGap="100"  autoGeneratePhraseQueries="true"
>
> omitNorms="true">
>
>  
>
> 
>
> 
>
> 
>
> 
>
> 
> dictionary="../hunspell_dictionary/en_US.dic"
>
> affix="../hunspell_dictionary/en_US.aff" ignoreCase="true" />
>
> 
> 
>
>  
>
> 
>
> 
>
> 
>
> 
>
> 
>
> 
> dictionary="../hunspell_dictionary/en_US.dic"
>
> affix="../hunspell_dictionary/en_US.aff" ignoreCase="true" />
>
> 
>
> 
>
> 
>
> 
>
> *Managed Synonyms:* "abc implement",  "bike", "xyz traders", "xyz
> transport"
>
> -
>
> *Query*: bike
>
> *parser Type:* edismax
>
> -
>
> *Parsed query (from debug)*: +DisjunctionMaxQuery((field1:"abc
> implement" field1:bike field1:"xyz traders" field1:"xyz trade"))
> -
>
> If you notice, there are 2 multi-word keywords starting with xyz, but only
> 1 of them is getting added to the query. If we change xyz transport to xy
> transport, then it works properly. The issue is only when the 2 multi-word
> keywords start with the same word. Though we are using graph synonyms, it
> is not working properly.
>
>
>
> Are we doing anything wrong here?
>
>
>
> Thanks,
>
> Manish.
>
>

Multi-word Synonyms not working properly with Edismax

2020-09-08 Thread Manish Bafna

Hi,
We are using the following configuration:

--
*Schema: *

*Managed Synonyms:* "abc implement",  "bike", "xyz traders", "xyz transport"
-
*Query*: bike
*Parser type:* edismax
-
*Parsed query (from debug)*: +DisjunctionMaxQuery((field1:"abc
implement" field1:bike field1:"xyz traders" field1:"xyz trade"))
-

As you can see, there are 2 multi-word keywords starting with xyz, but only
1 of them is added to the query. If we change xyz transport to xy
transport, it works properly. The issue occurs only when the 2 multi-word
keywords start with the same word. Even though we are using graph synonyms,
it is not working properly.

Are we doing anything wrong here?

Thanks,
Manish.

Multi-synonyms with sow=false, and Minimum match

2020-07-26 Thread Amrit Sarkar

Hi! hope everyone is well.

I was looking at some old articles and came across
https://opensourceconnections.com/blog/2018/02/20/edismax-and-multiterm-synonyms-oddities/

Is there a standard, robust way to handle fields with different analyzers
(multi-word synonyms, etc.) queried together with sow=false? Or does Doug T.'s
recommendation still hold?

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

Re: Tokenizing managed synonyms

2020-07-08 Thread Kayak28

Hello, Solr Community:

Actually, you can set up a tokenizer for the managed synonyms.
However, the configuration is not in the Reference Guide, and I do not know
how to add a tokenizer via an API call.
So you might need to manually edit a JSON file under the config directory.

In the _schema_analysis_synonyms_.json file under the config
directory, you will see the JSON below.

{
  "responseHeader":{
    "status":0,
    "QTime":3},
  "synonymMappings":{
    "initArgs":{
      "ignoreCase":true,
      "format":"solr"},
    "initializedOn":"2014-12-16T22:44:05.33Z",
    "managedMap":{
      "GB":
        ["GiB",
         "Gigabyte"],
      "TV":
        ["Television"],
      "happy":
        ["glad",
         "joyful"]}}}


To add a tokenizer, add the following key-value pair under the "initArgs" key:
 "tokenizerFactory":"solr.Factory"

You will end up with the following JSON:
{
  "responseHeader":{
    "status":0,
    "QTime":3},
  "synonymMappings":{
    "initArgs":{
      "ignoreCase":true,
      "format":"solr",
      "tokenizerFactory":"solr.Factory"},
    "initializedOn":"2014-12-16T22:44:05.33Z",
    "managedMap":{
      "GB":["GiB", "Gigabyte"],
      "TV":["Television"],
      "happy":["glad", "joyful"]}}}
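If you prefer to script the edit rather than make it by hand, here is a minimal sketch using Python's `json` module. It assumes the file has the structure shown above; because a copy on disk may lack the `responseHeader`/`synonymMappings` wrapper of the API response, the sketch accepts either shape. The factory name passed in is only an example, and Solr will presumably need a core or collection reload before it sees the change:

```python
import json


def add_tokenizer_factory(raw: str, factory: str) -> str:
    """Return the managed-synonyms JSON with initArgs.tokenizerFactory set."""
    data = json.loads(raw)
    # Accept both the API-response shape shown above and a bare resource object.
    resource = data.get("synonymMappings", data)
    resource.setdefault("initArgs", {})["tokenizerFactory"] = factory
    return json.dumps(data, indent=2)


original = """
{"responseHeader": {"status": 0, "QTime": 3},
 "synonymMappings": {
   "initArgs": {"ignoreCase": true, "format": "solr"},
   "initializedOn": "2014-12-16T22:44:05.33Z",
   "managedMap": {"GB": ["GiB", "Gigabyte"], "TV": ["Television"],
                  "happy": ["glad", "joyful"]}}}
"""

# Example value only; use the tokenizer factory your field actually needs.
print(add_tokenizer_factory(original, "solr.StandardTokenizerFactory"))
```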


I would like to add this configuration to the Solr Reference Guide, but I have
not created a JIRA issue yet.


-- 

Sincerely,
Kaya
github: https://github.com/28kayak




Re: Tokenizing managed synonyms

2020-07-06 Thread Koji Sekiguchi


I think the question makes sense, as SynonymGraphFilterFactory accepts
tokenizerFactory; he asked whether the managed version of SynonymGraphFilter
could accept it as well.

https://lucene.apache.org/solr/guide/8_5/filter-descriptions.html#synonym-graph-filter

The answer seems to be NO.

Koji



Re: Tokenizing managed synonyms

2020-07-06 Thread Erick Erickson

This question doesn’t really make sense. You don’t specify tokenizers on
filters, they’re specified at the _field_ level.

You can certainly define as many field(type)s as you want, each with a different
analysis chain and those chains can be made up of whatever you want to use, and
there are lots of choices.

If you are asking to do _additional_ tokenization on the output of a synonym
filter, no.

Perhaps if you defined the problem you’re trying to solve we could make some
suggestions.

Best,
Erick


Re: Tokenizing managed synonyms

2020-07-06 Thread Erick Erickson

Please don’t hijack threads, start a new one when you switch topics.


Re: Tokenizing managed synonyms

2020-07-06 Thread Stavros Macrakis

How can I search for a term *except* when it's part of certain phrases?

For example, I might want to find documents mentioning "pepper" where it is
not part of the phrases "chili pepper", "hot pepper", or "pepper sauce".

It does not work to search for [pepper NOT ("chili pepper" OR "hot pepper"
OR "pepper sauce")] because that excludes all documents which mention
"chili pepper" even if they *also* mention "black pepper" or the unmodified
word "pepper". Maybe some way using synonyms?
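The difference between the two readings can be shown with a toy model over a tokenized document. A document-level NOT rejects any document containing an excluded phrase, whereas the desired test asks whether at least one occurrence of "pepper" lies outside every excluded phrase. The latter is the position-level idea behind Lucene's SpanNotQuery; the sketch below is plain Python, not a Solr query:

```python
EXCLUDED = [("chili", "pepper"), ("hot", "pepper"), ("pepper", "sauce")]


def phrase_starts(tokens: list[str], phrase: tuple[str, ...]) -> list[int]:
    """Start positions where `phrase` occurs in `tokens`."""
    n = len(phrase)
    return [i for i in range(len(tokens) - n + 1) if tuple(tokens[i:i + n]) == phrase]


def document_level_not(tokens: list[str]) -> bool:
    """pepper NOT ("chili pepper" OR "hot pepper" OR "pepper sauce")."""
    excluded = any(phrase_starts(tokens, p) for p in EXCLUDED)
    return "pepper" in tokens and not excluded


def occurrence_level(tokens: list[str]) -> bool:
    """True if some 'pepper' is not covered by any excluded phrase."""
    covered = set()
    for phrase in EXCLUDED:
        for start in phrase_starts(tokens, phrase):
            covered.update(range(start, start + len(phrase)))
    return any(t == "pepper" and i not in covered for i, t in enumerate(tokens))


for text in ("add chili pepper and black pepper", "hot pepper only", "salt and pepper"):
    tokens = text.split()
    print(text, document_level_not(tokens), occurrence_level(tokens))
```

The first document is the problem case: the document-level NOT rejects it, while the occurrence-level test keeps it because of "black pepper".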

Thanks!

 -s


Tokenizing managed synonyms

2020-07-06 Thread Thomas Corthals

Hi,

Is it possible to specify a Tokenizer Factory on a Managed Synonym Graph
Filter? I would like to use a Standard Tokenizer or Keyword Tokenizer on
some fields.

Best,

Thomas

Re: Weird issues when using synonyms and stopwords together

2020-03-20 Thread Walter Underwood

Do not remove stopwords.

Stopword removal was a hack invented for 16-bit machines and multi-megabyte disks.
That hack is not needed now.

tf.idf addresses the same problem as stopwords with a much better algorithm.
Removing stopwords is an on/off decision for a guess at common words.
tf.idf is a proportional weighting of common words based on the statistics of
your documents.
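As a rough illustration of that proportional weighting, here is the textbook idf, log(N/df), computed over a tiny made-up corpus. Solr 8 actually scores with BM25 by default, whose idf formula differs, but it shares the property shown here: a term that occurs in every document gets little or no weight without any stopword list:

```python
import math

# A tiny made-up corpus; each document is reduced to its set of terms.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the bird sang",
    "the iphone and the phone",
]
doc_terms = [set(d.split()) for d in docs]


def idf(term: str) -> float:
    """Textbook inverse document frequency: log(N / df)."""
    df = sum(term in terms for terms in doc_terms)
    if df == 0:
        raise ValueError(f"{term!r} does not occur in the corpus")
    return math.log(len(doc_terms) / df)


for term in ("the", "cat", "iphone"):
    print(f"{term:>7}: {idf(term):.3f}")
```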

Do not remove stopwords.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Mar 20, 2020, at 7:52 AM, Vikas Kumar  wrote:
> 
> I have a field title in my solr schema:
> 
>  required="true" stored="true" />
> 
> text_en is defined as follows:
> 
> positionIncrementGap="100" docValues="false" multiValued="false">
>
>
> words="stopwords_en.txt" />
>
> preserveOriginal="true" />
>
>
>
>
> synonyms="synonyms_en.txt" ignoreCase="true" expand="true" />
> words="stopwords_en.txt" />
>
>
>
>
> 
> I'm encountering strange behaviour when using multi-word synonyms which
> contain stopwords.
> 
> If the stopwords appear in the middle, it works fine. For example, if I
> have the following in my synonyms file (where i is a stopword):
> 
> iphone, apple i phone
> 
> And if I query: /select?q=iphone=title=edismax
> 
> The parsed query is: +DisjunctionMaxQuery(+title:appl +title:phone)
> title:iphon
> 
> Same for query: /select?q=apple i phone=title=edismax
> 
> But if stopwords appear at the start or end, then behaviour is
> unpredictable.
> 
> In most cases, the entire synonym is dropped. For example, if I
> change my synonyms file to:
> 
> iphone, i phone
> 
> and do the same query again (with iphone), I get:
> 
> +DisjunctionMaxQuery(((title:iphon)))
> 
> I was expecting iphon and phone (as i would be dropped) in my dismax query.
> 
> In some cases, the behaviour is even weirder.
> 
> For example, if my synonyms file is:
> 
> between two ferns,netflix comedy,zach galifianakis show,netflix 2019 best
> 
> and I have ferns and best as my stopwords. If I do the following query:
> 
> /select?q=netflix comedy=title=edismax
> 
> I get this:
> 
> +DisjunctionMaxQuery+title:between +title:two +title:galifianaki
> +title:show) (+title:netflix +title:2019 +title:comedi
> 
> which is a very weird combination.
> 
> I'm not able to understand this behaviour and have not found anything
> related to it in the documentation or on the internet. Maybe I'm missing
> something. Any help or pointers would be highly appreciated.
> 
> Solr version: 8.4.1

Weird issues when using synonyms and stopwords together

2020-03-20 Thread Vikas Kumar

I have a field title in my Solr schema:

text_en is defined as follows:

I'm encountering strange behaviour when using multi-word synonyms which
contain stopwords.

If the stopwords appear in the middle, it works fine. For example, if I
have the following in my synonyms file (where i is a stopword):

iphone, apple i phone

And if I query: /select?q=iphone=title=edismax

The parsed query is: +DisjunctionMaxQuery(+title:appl +title:phone)
title:iphon

Same for query: /select?q=apple i phone=title=edismax

But if stopwords appear at the start or end, then behaviour is
unpredictable.

In most cases, the entire synonym is dropped. For example, if I
change my synonyms file to:

iphone, i phone

and do the same query again (with iphone), I get:

+DisjunctionMaxQuery(((title:iphon)))

I was expecting iphon and phone (as i would be dropped) in my dismax query.

In some cases, the behaviour is even weirder.

For example, if my synonyms file is:

between two ferns,netflix comedy,zach galifianakis show,netflix 2019 best

and I have ferns and best as my stopwords. If I do the following query:

/select?q=netflix comedy=title=edismax

I get this:

+DisjunctionMaxQuery+title:between +title:two +title:galifianaki
+title:show) (+title:netflix +title:2019 +title:comedi

which is a very weird combination.

I'm not able to understand this behaviour and have not found anything
related to it in the documentation or on the internet. Maybe I'm missing
something. Any help or pointers would be highly appreciated.

Solr version: 8.4.1

Re: Re: Re: Re: Handling overlapping synonyms

2020-01-20 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com

Hm, I'm not sure what you mean, but I am pretty new to Solr. Apologies!


Re: Re: Re: Handling overlapping synonyms

2020-01-20 Thread fiedzia

> From my understanding, if you want regional sales manager to be indexed as
> both director of sales and area manager, you would have to type:
>
> Regional sales manager -> director of sales, area manager

That works for searching, but because everything is in the same position,
searching for "director of sales" highlights the whole "regional sales manager",

while it should be indexed as (numbers indicate token positions):

regional (1) sales (2) manager (3)
area manager: starting at position 1
director of sales: starting at position 2


I guess I'll need to override SynonymGraphFilter to achieve that.



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Re: Re: Handling overlapping synonyms

2020-01-20 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com

From my understanding, if you want regional sales manager to be indexed as both
director of sales and area manager, you would have to type:

Regional sales manager -> director of sales, area manager

I do not believe you can chain synonyms.

Re: bigrams/trigrams, I was more interested in why you want to manually create
them by inserting a "_" between the tokens. Solr has bigram/trigram
capability out of the box (OOTB), so is there a reason you're manually coding
these into your index instead of just using the OOTB function?
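The out-of-the-box option referred to here is presumably shingling (word n-grams), which Solr provides through ShingleFilterFactory with parameters such as `minShingleSize`, `maxShingleSize` and `tokenSeparator`. The sketch below only mimics the idea in plain Python, using "_" as the separator to match the manual approach; it does not reproduce the filter's exact output order and it omits the unigrams the filter emits by default:

```python
def shingles(tokens: list[str], min_size: int = 2, max_size: int = 3,
             sep: str = "_") -> list[str]:
    """Word n-grams of min_size..max_size tokens, joined with `sep`."""
    out = []
    for start in range(len(tokens)):
        for size in range(min_size, max_size + 1):
            if start + size <= len(tokens):
                out.append(sep.join(tokens[start:start + size]))
    return out


print(shingles("new york city".split()))
```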

On 1/20/20, 6:58 AM, "fiedzia" wrote:

> what is the reasoning behind adding the bigrams and trigrams manually like
that? Maybe if we knew the end goal, we could figure out a different
strategy. Happy that at least the matching is working now!

I have large amount of synonyms and keep adding new ones, some of them
partially overlap. Its the nature of a language that adding keywords to a
phrase creates distinctive meaning. Another example:

sales manager -> director of sales
regional sales manager -> area manager

I'd expect "regional sales manager" to be indexed as both.

regional sales manager
^^ -> director of sales
^^ -> area manager

so that searching for any of those terms matches and highlights relevant
part.
However when SynonymGraphFilter finds one synonym it will ignore the other.


Re: Re: Handling overlapping synonyms

2020-01-20 Thread fiedzia

> what is the reasoning behind adding the bigrams and trigrams manually like
that? Maybe if we knew the end goal, we could figure out a different
strategy. Happy that at least the matching is working now! 

I have a large number of synonyms and keep adding new ones; some of them
partially overlap. It's the nature of language that adding keywords to a
phrase creates a distinct meaning. Another example:


sales manager -> director of sales
regional sales manager -> area manager

I'd expect "regional sales manager" to be indexed as both.

regional sales manager
         ^^^^^^^^^^^^^ -> director of sales
^^^^^^^^^^^^^^^^^^^^^^ -> area manager

so that searching for any of those terms matches and highlights relevant
part.
However when SynonymGraphFilter finds one synonym it will ignore the other.




Re: Re: Handling overlapping synonyms

2020-01-17 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com

Hmm, what is the reasoning behind adding the bigrams and trigrams manually
like that? Maybe if we knew the end goal, we could figure out a different 
strategy. Happy that at least the matching is working now!

On 1/17/20, 10:28 AM, "fiedzia"  wrote:

> Doing it the other way (new york city -> new_york_city, new_york) makes more sense,

Just checked it: that way does the matching as expected, but the highlighting
is wrong (a "new york" query matches "new york city" as it should, but also
highlights all of it).




Re: Handling overlapping synonyms

2020-01-17 Thread fiedzia

> Doing it the other way (new york city -> new_york_city, new_york) makes more sense,

Just checked it: that way does the matching as expected, but the highlighting
is wrong (a "new york" query matches "new york city" as it should, but also
highlights all of it).




Re: Handling overlapping synonyms

2020-01-17 Thread fiedzia

> If you instead write "new york => new_york, new_york_city" it should work

I can't do that, as that would turn "new york" into "new york_city", which
is not what I want.
Doing it the other way (new york city -> new_york_city, new_york) makes more
sense, though I expect this to get positions wrong and mess with
highlighting.




Re: Handling overlapping synonyms

2020-01-17 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com

If you instead write "new york => new_york, new_york_city" it should work 
(https://doc.lucidworks.com/fusion/3.1/Collections/Synonyms-Files.html)
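The two rule styles in this thread can be told apart with a simplified parser. Comma-separated lines are equivalence groups. With `expand=true`, every entry maps to every entry. Lines containing `=>` are explicit mappings, where only the left-hand side is rewritten to the right-hand side.

This is a hypothetical Python sketch, not Solr's own synonym parser. It ignores escaping and treats only lines that start with `#` as comments.

```python
def parse_synonym_line(line):
    """Parse one synonyms.txt-style line into (inputs, outputs), or None."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None  # blank line or whole-line comment

    def split_terms(part):
        return [t.strip() for t in part.split(",") if t.strip()]

    if "=>" in line:
        # Explicit mapping: inputs are replaced by the outputs.
        lhs, rhs = line.split("=>", 1)
        return split_terms(lhs), split_terms(rhs)
    # Equivalence group: with expand=true each entry maps to all entries.
    terms = split_terms(line)
    return terms, list(terms)


SYNONYMS_TXT = """\
# explicit mapping
new york => new_york, new_york_city
# equivalence group
frozen dinner,microwave food
"""

for raw in SYNONYMS_TXT.splitlines():
    parsed = parse_synonym_line(raw)
    if parsed:
        print(parsed)
```

Under an explicit mapping, "new york" itself is not kept unless it also appears on the right-hand side.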

On 1/17/20, 6:29 AM, "fiedzia"  wrote:

Having synonyms defined for

new york  -> new_york
new york city -> new_york_city

I'd like the phrase
new york city
to be indexed as both, but SynonymGraphFilter picks only one. Is there a way
around that?

-- 
Maciej Dziardziel
fied...@gmail.com




Handling overlapping synonyms

2020-01-17 Thread fiedzia

Having synonyms defined for

new york  -> new_york
new york city -> new_york_city

I'd like the phrase
new york city
to be indexed as both, but SynonymGraphFilter picks only one. Is there a way
around that?

-- 
Maciej Dziardziel
fied...@gmail.com
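The "picks only one" behaviour is consistent with a greedy longest-match strategy. At each position the longest matching rule wins, and its tokens are consumed. The toy Python model below contrasts that with reporting every overlapping match together with its token span (0-based start, exclusive end).

It is an illustration only, not Lucene's `SynonymGraphFilter` code. The rule table and function names are hypothetical.

```python
# Toy rule table in the spirit of this thread (hypothetical).
RULES = {
    ("new", "york"): "new_york",
    ("new", "york", "city"): "new_york_city",
}


def greedy_longest_match(tokens, rules):
    """At each position keep only the longest matching rule, then skip past it."""
    max_len = max(len(key) for key in rules)
    found, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in rules:
                found.append((rules[key], i, i + n))
                i += n
                break
        else:  # no rule starts at this position
            i += 1
    return found


def all_overlapping_matches(tokens, rules):
    """Report every rule that matches anywhere, even when matches overlap."""
    found = []
    for i in range(len(tokens)):
        for n in range(1, len(tokens) - i + 1):
            key = tuple(tokens[i:i + n])
            if key in rules:
                found.append((rules[key], i, i + n))
    return found


tokens = "new york city".split()
print(greedy_longest_match(tokens, RULES))     # [('new_york_city', 0, 3)]
print(all_overlapping_matches(tokens, RULES))  # [('new_york', 0, 2), ('new_york_city', 0, 3)]
```

Applied to the "regional sales manager" example with rules for "sales manager" and "regional sales manager", `all_overlapping_matches` reports spans (0, 3) and (1, 3). These are the 1-3 and 2-3 positions described later in the thread.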




Re: Query-time synonyms without indexing

2019-08-29 Thread Erick Erickson

Ah, thanks for letting us know. 

Erick

> On Aug 29, 2019, at 9:20 AM, Bjarke Buur Mortensen  
> wrote:
> 
> The  section without type is the one getting picked up for the
> index-time chain, so that wasn't my problem.
> 
> It turns out that because of
> https://issues.apache.org/jira/browse/LUCENE-8134, I needed to add
> an omitTermFreqAndPositions="true" to the  declaration.
> This has to do with defaults for a string field being different from a text
> field, and in Solr 8+ indexing fails because of the above ticket.
> Adding omitTermFreqAndPositions="true" ensures that index field type and
> the schema field type agree on the settings, as I understand it.
> 
> Regards,
> Bjarke
> 
> 
> 
> Den ons. 28. aug. 2019 kl. 13.26 skrev Erick Erickson <
> erickerick...@gmail.com>:
> 
>> Not sure. You have an
>> 
>> section and
>> 
>> 
>> section. Frankly I’m not sure which one will be used for the index-time
>> chain.
>> 
>> Why don’t you just try it?
>> change
>> 
>> to
>> 
>> 
>> reload and go. It’d take you 5 minutes and you’d have your answer.
>> 
>> Best,
>> Erick
>> 
>> 
>>> On Aug 28, 2019, at 1:57 AM, Bjarke Buur Mortensen <
>> morten...@eluence.com> wrote:
>>> 
>>> Yes, but isn't that what I am already doing in this case (look at the
>>> fieldType in the original mail)?
>>> Is there some other way to specify that field type and achieve what I
>> want?
>>> 
>>> Thanks,
>>> Bjarke
>>> 
>>> On Tue, Aug 27, 2019, 17:32 Erick Erickson 
>> wrote:
>>> 
>>>> You can have separate index and query time analysis chains, there are
>> many
>>>> examples in the stock Solr schemas.
>>>> 
>>>> Best,
>>>> Erick
>>>> 
>>>>> On Aug 27, 2019, at 8:48 AM, Bjarke Buur Mortensen <
>>>> morten...@eluence.com> wrote:
>>>>> 
>>>>> We have a Solr field of type "string".
>>>>> It turns out that we need to do synonym expansion at query time in
>> order
>>>> to
>>>>> account for some changes over time in the values stored in that field.
>>>>> 
>>>>> So we have tried introducing a custom fieldType that applies the
>> synonym
>>>>> filter at query time only (see bottom of mail), but that requires us to
>>>>> change the field. But now, when we index new documents, Solr complains:
>>>>> 400 Bad Request
>>>>> Error: 'Exception writing document id someid to the index; possible
>>>>> analysis error: cannot change field "auth_country_code" from index
>>>>> options=DOCS to inconsistent index
>> options=DOCS_AND_FREQS_AND_POSITIONS',
>>>>> 
>>>>> Since we are only making query time changes, I would really like to not
>>>>> have to reindex our entire collection. Is that possible somehow?
>>>>> 
>>>>> Thanks,
>>>>> Bjarke
>>>>> 
>>>>> 
>>>>> >>>> sortMissingLast="true" positionIncrementGap="100">
>>>>>  
>>>>>
>>>>>  
>>>>>  
>>>>>  
>>>>>  >>>> synonyms="country-synonyms.txt" ignoreCase="false" expand="true"/>
>>>>>  
>>>>> 
>>>> 
>>>> 
>> 
>>

Re: Query-time synonyms without indexing

2019-08-29 Thread Bjarke Buur Mortensen

The  section without type is the one getting picked up for the
index-time chain, so that wasn't my problem.

It turns out that because of
https://issues.apache.org/jira/browse/LUCENE-8134, I needed to add
an omitTermFreqAndPositions="true" to the  declaration.
This has to do with defaults for a string field being different from a text
field, and in Solr 8+ indexing fails because of the above ticket.
Adding omitTermFreqAndPositions="true" ensures that index field type and
the schema field type agree on the settings, as I understand it.
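Here is a minimal sketch of such a field type, written as a Schema API `add-field-type` request body built in Python. The key setting from this message is `omitTermFreqAndPositions`, which keeps the field's index options at DOCS, like the original `string` field.

Assumptions:
- The type name `string_syn` is hypothetical, since the original name was stripped from the archive.
- Both `KeywordTokenizerFactory` entries stand in for analyzer details that were also lost.

```python
import json

# Hypothetical field type name; the original name was stripped from the archive.
FIELD_TYPE_NAME = "string_syn"

request_body = {
    "add-field-type": {
        "name": FIELD_TYPE_NAME,
        "class": "solr.TextField",
        "sortMissingLast": True,
        "positionIncrementGap": "100",
        # Index docs only (no freqs/positions), matching the existing "string"
        # field's DOCS index options so new documents do not conflict with them.
        "omitTermFreqAndPositions": True,
        # Index-time chain (assumed): keep the whole value as one token.
        "indexAnalyzer": {
            "tokenizer": {"class": "solr.KeywordTokenizerFactory"},
        },
        # Query-time chain: same tokenizer plus synonym expansion.
        "queryAnalyzer": {
            "tokenizer": {"class": "solr.KeywordTokenizerFactory"},
            "filters": [
                {
                    "class": "solr.SynonymGraphFilterFactory",
                    "synonyms": "country-synonyms.txt",
                    "ignoreCase": "false",
                    "expand": "true",
                }
            ],
        },
    }
}

payload = json.dumps(request_body, indent=2)
print(payload)  # body for a POST to /solr/<collection>/schema
```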

Regards,
Bjarke



Den ons. 28. aug. 2019 kl. 13.26 skrev Erick Erickson <
erickerick...@gmail.com>:

> Not sure. You have an
> 
> section and
> 
>
> section. Frankly I’m not sure which one will be used for the index-time
> chain.
>
> Why don’t you just try it?
> change
> 
> to
> 
>
> reload and go. It’d take you 5 minutes and you’d have your answer.
>
> Best,
> Erick
>
>
> > On Aug 28, 2019, at 1:57 AM, Bjarke Buur Mortensen <
> morten...@eluence.com> wrote:
> >
> > Yes, but isn't that what I am already doing in this case (look at the
> > fieldType in the original mail)?
> > Is there some other way to specify that field type and achieve what I
> want?
> >
> > Thanks,
> > Bjarke
> >
> > On Tue, Aug 27, 2019, 17:32 Erick Erickson 
> wrote:
> >
> >> You can have separate index and query time analysis chains, there are
> many
> >> examples in the stock Solr schemas.
> >>
> >> Best,
> >> Erick
> >>
> >>> On Aug 27, 2019, at 8:48 AM, Bjarke Buur Mortensen <
> >> morten...@eluence.com> wrote:
> >>>
> >>> We have a Solr field of type "string".
> >>> It turns out that we need to do synonym expansion at query time in
> order
> >> to
> >>> account for some changes over time in the values stored in that field.
> >>>
> >>> So we have tried introducing a custom fieldType that applies the
> synonym
> >>> filter at query time only (see bottom of mail), but that requires us to
> >>> change the field. But now, when we index new documents, Solr complains:
> >>> 400 Bad Request
> >>> Error: 'Exception writing document id someid to the index; possible
> >>> analysis error: cannot change field "auth_country_code" from index
> >>> options=DOCS to inconsistent index
> options=DOCS_AND_FREQS_AND_POSITIONS',
> >>>
> >>> Since we are only making query time changes, I would really like to not
> >>> have to reindex our entire collection. Is that possible somehow?
> >>>
> >>> Thanks,
> >>> Bjarke
> >>>
> >>>
> >>>  >>> sortMissingLast="true" positionIncrementGap="100">
> >>>   
> >>> 
> >>>   
> >>>   
> >>>   
> >>>>>> synonyms="country-synonyms.txt" ignoreCase="false" expand="true"/>
> >>>   
> >>> 
> >>
> >>
>
>

Re: Query-time synonyms without indexing

2019-08-28 Thread Erick Erickson

Not sure. You have an

section and 


section. Frankly I’m not sure which one will be used for the index-time chain.

Why don’t you just try it?
change

to 


reload and go. It’d take you 5 minutes and you’d have your answer.

Best,
Erick


> On Aug 28, 2019, at 1:57 AM, Bjarke Buur Mortensen  
> wrote:
> 
> Yes, but isn't that what I am already doing in this case (look at the
> fieldType in the original mail)?
> Is there some other way to specify that field type and achieve what I want?
> 
> Thanks,
> Bjarke
> 
> On Tue, Aug 27, 2019, 17:32 Erick Erickson  wrote:
> 
>> You can have separate index and query time analysis chains, there are many
>> examples in the stock Solr schemas.
>> 
>> Best,
>> Erick
>> 
>>> On Aug 27, 2019, at 8:48 AM, Bjarke Buur Mortensen <
>> morten...@eluence.com> wrote:
>>> 
>>> We have a Solr field of type "string".
>>> It turns out that we need to do synonym expansion at query time in order
>> to
>>> account for some changes over time in the values stored in that field.
>>> 
>>> So we have tried introducing a custom fieldType that applies the synonym
>>> filter at query time only (see bottom of mail), but that requires us to
>>> change the field. But now, when we index new documents, Solr complains:
>>> 400 Bad Request
>>> Error: 'Exception writing document id someid to the index; possible
>>> analysis error: cannot change field "auth_country_code" from index
>>> options=DOCS to inconsistent index options=DOCS_AND_FREQS_AND_POSITIONS',
>>> 
>>> Since we are only making query time changes, I would really like to not
>>> have to reindex our entire collection. Is that possible somehow?
>>> 
>>> Thanks,
>>> Bjarke
>>> 
>>> 
>>> >> sortMissingLast="true" positionIncrementGap="100">
>>>   
>>> 
>>>   
>>>   
>>>   
>>>   >> synonyms="country-synonyms.txt" ignoreCase="false" expand="true"/>
>>>   
>>> 
>> 
>>

Re: Query-time synonyms without indexing

2019-08-27 Thread Bjarke Buur Mortensen

Yes, but isn't that what I am already doing in this case (look at the
fieldType in the original mail)?
Is there some other way to specify that field type and achieve what I want?

Thanks,
Bjarke

On Tue, Aug 27, 2019, 17:32 Erick Erickson  wrote:

> You can have separate index and query time analysis chains, there are many
> examples in the stock Solr schemas.
>
> Best,
> Erick
>
> > On Aug 27, 2019, at 8:48 AM, Bjarke Buur Mortensen <
> morten...@eluence.com> wrote:
> >
> > We have a Solr field of type "string".
> > It turns out that we need to do synonym expansion at query time in order
> to
> > account for some changes over time in the values stored in that field.
> >
> > So we have tried introducing a custom fieldType that applies the synonym
> > filter at query time only (see bottom of mail), but that requires us to
> > change the field. But now, when we index new documents, Solr complains:
> > 400 Bad Request
> > Error: 'Exception writing document id someid to the index; possible
> > analysis error: cannot change field "auth_country_code" from index
> > options=DOCS to inconsistent index options=DOCS_AND_FREQS_AND_POSITIONS',
> >
> > Since we are only making query time changes, I would really like to not
> > have to reindex our entire collection. Is that possible somehow?
> >
> > Thanks,
> > Bjarke
> >
> >
> >   > sortMissingLast="true" positionIncrementGap="100">
> >
> >  
> >
> >
> >
> > > synonyms="country-synonyms.txt" ignoreCase="false" expand="true"/>
> >
> >  
>
>

Re: Query-time synonyms without indexing

2019-08-27 Thread Erick Erickson

You can have separate index and query time analysis chains, there are many 
examples in the stock Solr schemas.

Best,
Erick

> On Aug 27, 2019, at 8:48 AM, Bjarke Buur Mortensen  
> wrote:
> 
> We have a Solr field of type "string".
> It turns out that we need to do synonym expansion at query time in order to
> account for some changes over time in the values stored in that field.
> 
> So we have tried introducing a custom fieldType that applies the synonym
> filter at query time only (see bottom of mail), but that requires us to
> change the field. But now, when we index new documents, Solr complains:
> 400 Bad Request
> Error: 'Exception writing document id someid to the index; possible
> analysis error: cannot change field "auth_country_code" from index
> options=DOCS to inconsistent index options=DOCS_AND_FREQS_AND_POSITIONS',
> 
> Since we are only making query time changes, I would really like to not
> have to reindex our entire collection. Is that possible somehow?
> 
> Thanks,
> Bjarke
> 
> 
>   sortMissingLast="true" positionIncrementGap="100">
>
>  
>
>
>
> synonyms="country-synonyms.txt" ignoreCase="false" expand="true"/>
>
>

Query-time synonyms without indexing

2019-08-27 Thread Bjarke Buur Mortensen

We have a Solr field of type "string".
It turns out that we need to do synonym expansion at query time in order to
account for some changes over time in the values stored in that field.

So we have tried introducing a custom fieldType that applies the synonym
filter at query time only (see bottom of mail), but that requires us to
change the field. But now, when we index new documents, Solr complains:
400 Bad Request
Error: 'Exception writing document id someid to the index; possible
analysis error: cannot change field "auth_country_code" from index
options=DOCS to inconsistent index options=DOCS_AND_FREQS_AND_POSITIONS',

Since we are only making query time changes, I would really like to not
have to reindex our entire collection. Is that possible somehow?

Thanks,
Bjarke

Re: Re: Solr edismax parser with multi-word synonyms

2019-07-18 Thread Sunil Srinivasan

Hi Erick, 
Is there any way I can get it to match documents containing at least one of the
words of the original query, i.e. 'frozen' or 'dinner' or both (but not
partial matches of the synonyms)?
Thanks,
Sunil


-Original Message-
From: Erick Erickson 
To: solr-user 
Sent: Thu, Jul 18, 2019 04:42 AM
Subject: Re: Solr edismax parser with multi-word synonyms


This is not a phrase query, rather it’s requiring either pair of words
to appear in the title.

You’ve told it that “frozen dinner” and “microwave food” are synonyms.
So it’s looking for both the words “microwave” and “food” in the title field,
or “frozen” and “dinner” in the title field.

You’d see the same thing with single-word synonyms, albeit a little less
confusingly.


Best,
Erick


> On Jul 18, 2019, at 1:01 AM, kshitij tyagi  
> wrote:
> 
> Hi sunil,
> 
> 1. as you have added "microwave food" in synonym as a multiword synonym to
> "frozen dinner", the edismax parser finds your synonym in the file and is
> treating your query as a phrase query.
> 
> This is the reason you are seeing parsed query as  +(((+title:microwave
> +title:food) (+title:frozen +title:dinner))), frozen dinner is considered
> as a phrase here.
> 
> If you want partial match on your query then you can add frozen dinner,
> microwave food, microwave, food to your synonym file and you will see the
> parsed query as:
> "+(((+title:microwave +title:food) title:microwave title:food
> (+title:frozen +title:dinner)))"
> Another option is to write your own custom query parser and use it as a
> plugin.
> 
> Hope this helps!!
> 
> kshitij
> 
> 
> On Thu, Jul 18, 2019 at 9:14 AM Sunil Srinivasan  wrote:
> 
>> 
>> I have enabled the SynonymGraphFilter in my field configuration in order
>> to support multi-word synonyms (I am using Solr 7.6). Here is my field
>> configuration:
>> 
>>    
>>      
>>    
>> 
>>    
>>      
>>      > synonyms="synonyms.txt"/>
>>    
>> 
>> 
>> 
>> 
>> And this is my synonyms.txt file:
>> frozen dinner,microwave food
>> 
>> Scenario 1: blue shirt (query with no synonyms)
>> 
>> Here is my first Solr query:
>> 
>> http://localhost:8983/solr/base/search?q=blue+shirt=title=edismax=on
>> 
>> And this is the parsed query I see in the debug output:
>> +((title:blue) (title:shirt))
>> 
>> Scenario 2: frozen dinner (query with synonyms)
>> 
>> Now, here is my second Solr query:
>> 
>> http://localhost:8983/solr/base/search?q=frozen+dinner=title=edismax=on
>> 
>> And this is the parsed query I see in the debug output:
>> +(((+title:microwave +title:food) (+title:frozen +title:dinner)))
>> 
>> I am wondering why the first query looks for documents containing at least
>> one of the two query tokens, whereas the second query looks for documents
>> with both of the query tokens? I would understand if it looked for both the
>> tokens of the synonyms (i.e. both microwave and food) to avoid the
>> sausagization problem. But I would like to get partial matches on the
>> original query at least (i.e. it should also match documents containing
>> just the token 'dinner').
>> 
>> Would anyone know why the behavior is different across queries with and
>> without synonyms? And how could I work around this if I wanted partial
>> matches on queries that also have synonyms?
>> 
>> Ideally, I would like the parsed query in the second case to be:
>> +(((+title:microwave +title:food) (title:frozen title:dinner)))
>> 
>> I'd appreciate any help with this. Thanks!
>>

Re: Solr edismax parser with multi-word synonyms

2019-07-18 Thread Erick Erickson

This is not a phrase query, rather it’s requiring either pair of words
to appear in the title.

You’ve told it that “frozen dinner” and “microwave food” are synonyms.
So it’s looking for both the words “microwave” and “food” in the title field,
or “frozen” and “dinner” in the title field.

You’d see the same thing with single-word synonyms, albeit a little less
confusingly.
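Erick's reading of the parsed query can be checked with a toy evaluator of Lucene-style boolean logic, where `+` marks a required (MUST) clause and an unprefixed clause is optional (SHOULD). This Python sketch ignores scoring and `mm`, and it flattens one level of the redundant parentheses. The helper names and the sample document are hypothetical.

```python
def term(text, field_name="title"):
    return ("term", field_name, text)


def boolean(must=(), should=()):
    return ("bool", tuple(must), tuple(should))


def matches(query, doc):
    """True if doc (dict: field -> set of tokens) satisfies the query."""
    if query[0] == "term":
        _, field_name, text = query
        return text in doc.get(field_name, set())
    _, must, should = query
    if not all(matches(q, doc) for q in must):
        return False
    # With no required clauses, at least one optional clause must match.
    return bool(must) or any(matches(q, doc) for q in should)


# +((title:blue) (title:shirt))
blue_shirt = boolean(must=[boolean(should=[term("blue"), term("shirt")])])

# +(((+title:microwave +title:food) (+title:frozen +title:dinner)))
actual = boolean(must=[boolean(should=[
    boolean(must=[term("microwave"), term("food")]),
    boolean(must=[term("frozen"), term("dinner")]),
])])

# +(((+title:microwave +title:food) (title:frozen title:dinner)))
desired = boolean(must=[boolean(should=[
    boolean(must=[term("microwave"), term("food")]),
    boolean(should=[term("frozen"), term("dinner")]),
])])

doc = {"title": {"dinner", "rolls"}}
print(matches(actual, doc), matches(desired, doc))  # False True
```

A title containing only "dinner" fails the actual query because each synonym path requires both of its words. The same title passes the query Sunil wanted.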


Best,
Erick


> On Jul 18, 2019, at 1:01 AM, kshitij tyagi  
> wrote:
> 
> Hi sunil,
> 
> 1. as you have added "microwave food" in synonym as a multiword synonym to
> "frozen dinner", the edismax parser finds your synonym in the file and is
> treating your query as a phrase query.
> 
> This is the reason you are seeing parsed query as  +(((+title:microwave
> +title:food) (+title:frozen +title:dinner))), frozen dinner is considered
> as a phrase here.
> 
> If you want partial match on your query then you can add frozen dinner,
> microwave food, microwave, food to your synonym file and you will see the
> parsed query as:
> "+(((+title:microwave +title:food) title:microwave title:food
> (+title:frozen +title:dinner)))"
> Another option is to write your own custom query parser and use it as a
> plugin.
> 
> Hope this helps!!
> 
> kshitij
> 
> 
> On Thu, Jul 18, 2019 at 9:14 AM Sunil Srinivasan  wrote:
> 
>> 
>> I have enabled the SynonymGraphFilter in my field configuration in order
>> to support multi-word synonyms (I am using Solr 7.6). Here is my field
>> configuration:
>> 
>>
>>  
>>
>> 
>>
>>  
>>  > synonyms="synonyms.txt"/>
>>
>> 
>> 
>> 
>> 
>> And this is my synonyms.txt file:
>> frozen dinner,microwave food
>> 
>> Scenario 1: blue shirt (query with no synonyms)
>> 
>> Here is my first Solr query:
>> 
>> http://localhost:8983/solr/base/search?q=blue+shirt=title=edismax=on
>> 
>> And this is the parsed query I see in the debug output:
>> +((title:blue) (title:shirt))
>> 
>> Scenario 2: frozen dinner (query with synonyms)
>> 
>> Now, here is my second Solr query:
>> 
>> http://localhost:8983/solr/base/search?q=frozen+dinner=title=edismax=on
>> 
>> And this is the parsed query I see in the debug output:
>> +(((+title:microwave +title:food) (+title:frozen +title:dinner)))
>> 
>> I am wondering why the first query looks for documents containing at least
>> one of the two query tokens, whereas the second query looks for documents
>> with both of the query tokens? I would understand if it looked for both the
>> tokens of the synonyms (i.e. both microwave and food) to avoid the
>> sausagization problem. But I would like to get partial matches on the
>> original query at least (i.e. it should also match documents containing
>> just the token 'dinner').
>> 
>> Would anyone know why the behavior is different across queries with and
>> without synonyms? And how could I work around this if I wanted partial
>> matches on queries that also have synonyms?
>> 
>> Ideally, I would like the parsed query in the second case to be:
>> +(((+title:microwave +title:food) (title:frozen title:dinner)))
>> 
>> I'd appreciate any help with this. Thanks!
>>

Re: Solr edismax parser with multi-word synonyms

2019-07-18 Thread kshitij tyagi

Hi sunil,

1. as you have added "microwave food" in synonym as a multiword synonym to
"frozen dinner", the edismax parser finds your synonym in the file and is
treating your query as a phrase query.

This is the reason you are seeing parsed query as  +(((+title:microwave
+title:food) (+title:frozen +title:dinner))), frozen dinner is considered
as a phrase here.

If you want partial match on your query then you can add frozen dinner,
microwave food, microwave, food to your synonym file and you will see the
parsed query as:
"+(((+title:microwave +title:food) title:microwave title:food
(+title:frozen +title:dinner)))"
Another option is to write your own custom query parser and use it as a
plugin.

Hope this helps!!

kshitij


On Thu, Jul 18, 2019 at 9:14 AM Sunil Srinivasan  wrote:

>
> I have enabled the SynonymGraphFilter in my field configuration in order
> to support multi-word synonyms (I am using Solr 7.6). Here is my field
> configuration:
> 
> 
>   
> 
>
> 
>   
>synonyms="synonyms.txt"/>
> 
> 
>
> 
>
> And this is my synonyms.txt file:
> frozen dinner,microwave food
>
> Scenario 1: blue shirt (query with no synonyms)
>
> Here is my first Solr query:
>
> http://localhost:8983/solr/base/search?q=blue+shirt=title=edismax=on
>
> And this is the parsed query I see in the debug output:
> +((title:blue) (title:shirt))
>
> Scenario 2: frozen dinner (query with synonyms)
>
> Now, here is my second Solr query:
>
> http://localhost:8983/solr/base/search?q=frozen+dinner=title=edismax=on
>
> And this is the parsed query I see in the debug output:
> +(((+title:microwave +title:food) (+title:frozen +title:dinner)))
>
> I am wondering why the first query looks for documents containing at least
> one of the two query tokens, whereas the second query looks for documents
> with both of the query tokens? I would understand if it looked for both the
> tokens of the synonyms (i.e. both microwave and food) to avoid the
> sausagization problem. But I would like to get partial matches on the
> original query at least (i.e. it should also match documents containing
> just the token 'dinner').
>
> Would anyone know why the behavior is different across queries with and
> without synonyms? And how could I work around this if I wanted partial
> matches on queries that also have synonyms?
>
> Ideally, I would like the parsed query in the second case to be:
> +(((+title:microwave +title:food) (title:frozen title:dinner)))
>
> I'd appreciate any help with this. Thanks!
>

Solr edismax parser with multi-word synonyms

2019-07-17 Thread Sunil Srinivasan


I have enabled the SynonymGraphFilter in my field configuration in order to 
support multi-word synonyms (I am using Solr 7.6). Here is my field 
configuration:

And this is my synonyms.txt file:
frozen dinner,microwave food

Scenario 1: blue shirt (query with no synonyms)

Here is my first Solr query:
http://localhost:8983/solr/base/search?q=blue+shirt=title=edismax=on

And this is the parsed query I see in the debug output:
+((title:blue) (title:shirt))

Scenario 2: frozen dinner (query with synonyms)

Now, here is my second Solr query:
http://localhost:8983/solr/base/search?q=frozen+dinner=title=edismax=on

And this is the parsed query I see in the debug output:
+(((+title:microwave +title:food) (+title:frozen +title:dinner)))

I am wondering why the first query looks for documents containing at least one 
of the two query tokens, whereas the second query looks for documents with both 
of the query tokens? I would understand if it looked for both the tokens of the 
synonyms (i.e. both microwave and food) to avoid the sausagization problem. But 
I would like to get partial matches on the original query at least (i.e. it 
should also match documents containing just the token 'dinner').

Would anyone know why the behavior is different across queries with and
without synonyms? And how could I work around this if I wanted partial matches 
on queries that also have synonyms?

Ideally, I would like the parsed query in the second case to be:
+(((+title:microwave +title:food) (title:frozen title:dinner)))

I'd appreciate any help with this. Thanks!

Re: How to use stopwords, synonyms along with fuzzy match in a SOLR

2019-05-09 Thread Erick Erickson

Ah, I didn’t read thoroughly enough. The problem is that stopwords don’t really
count for fuzzy searching. By specifying “junk~” you’re not really searching
for “junk” or variants. You’re telling Solr “find any term that is a fuzzy
match to ‘junk’”. Under the covers, a search is being made for (jank OR jack
OR …) for however many terms are within the edit distance specified for “junk”.

So Solr is behaving as expected. Imagine if it worked as you expect and 
stopwords were removed before applying the fuzzy logic. Then the complaint 
would be “Hey, I know I have words in my corpus ('jack' in this case) that 
should match the fuzzy term 'junk~', but I don’t get any results back”.

Notice that no document with straight “junk” in the text will be returned 
absent other matching fuzzy terms.
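Erick's point can be made concrete. The fuzzy term `junk~` (shown as `junk~2` in the parsed query, i.e. up to two edits) is rewritten into whichever indexed terms fall within that edit distance.

The Python sketch below uses plain Levenshtein distance over a hypothetical vocabulary. Lucene's FuzzyQuery by default also counts a transposition of two adjacent characters as one edit, and this sketch does not model that.

```python
def levenshtein(a, b):
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


vocabulary = ["jack", "junk", "jank", "adapter", "cable"]  # hypothetical index terms
expanded = [t for t in vocabulary if levenshtein("junk", t) <= 2]
print(expanded)  # ['jack', 'junk', 'jank']
```

"jack" is two substitutions away from "junk", so it is part of the expansion even when "junk" itself was removed as a stopword.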

Best,
Erick

> On May 9, 2019, at 11:17 AM, bbarani  wrote:
> 
> 
>
>
>
> ignoreCase="true"/>
>
>
>
>
> ignoreCase="true"/>
>
>

Re: How to use stopwords, synonyms along with fuzzy match in a SOLR

2019-05-09 Thread bbarani

Thanks for your reply Erick.

I created a simple field type as below for testing and added 'junk' to the
stopwords, but it doesn't seem to honor it when using fuzzy search.

Btw, I am using qf along with edismax and pass the value in q (sample query
below).

/solr/collection1/select?qf=title_autoComplete=false=productName=edismax=junk~=true=100%25=defaultMarketingSequence%20asc=1

Headphone *Jack* Adapter Cable

junk~
junk~

(+DisjunctionMaxQuery((title_autoComplete:junk~2)))/no_coord

+(title_autoComplete:junk~2)

1.5424817 = sum of:
  1.5424817 = weight(title_autoComplete:jack in 190) [SchemaSimilarity], result of:
    1.5424817 = score(doc=190,freq=1.0 = termFreq=1.0), product of:
      0.5 = boost
      3.0849633 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:
        37.0 = docFreq
        819.0 = docCount
      1.0 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1) from:
        1.0 = termFreq=1.0
        1.2 = parameter k1
        0.0 = parameter b (norms omitted for field)
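The explain output above can be reproduced by hand. This Python sketch plugs the numbers from the explain tree into the formulas it prints. That is BM25 with `b = 0`, because norms are omitted for the field. Nothing here comes from Solr beyond those values.

```python
import math

# Values copied from the explain output for the matching term "jack".
doc_count, doc_freq = 819.0, 37.0
freq, k1, b = 1.0, 1.2, 0.0
boost = 0.5

idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
# With b = 0 the document-length normalisation drops out of tfNorm.
tf_norm = (freq * (k1 + 1)) / (freq + k1)
score = boost * idf * tf_norm
print(round(idf, 4), tf_norm, round(score, 4))
```

The whole score comes from the indexed term `jack`, which matched via the fuzzy expansion of `junk~2`.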






Re: How to use stopwords, synonyms along with fuzzy match in a SOLR

2019-05-08 Thread Erick Erickson

Well, I’d start by adding debug=true, that’ll show you the parsed query as well 
as why certain documents scored the way they did. But do note that q=junk~ will 
search against the default text field (the ”df” parameter in the request 
handler definition in solrconfig.xml). Is that what you’re expecting?

Or, I suppose, it’s searching against the fields defined if you’re using 
(e)dismax as your query parser. But the debug output (parsed query part) will
show what the actual search is.

You should also look at the admin/analysis page. For instance, the way you have 
the field defined at index time, it’ll break on whitespace. But “junk.” won’t 
be found because your stopword doesn’t contain the period.
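Here is a toy Python model of that analysis chain. It assumes a whitespace tokenizer, a lowercase step, and a stop filter that compares whole tokens exactly; the stopword set is hypothetical. It shows why `junk.` survives: the token still carries the period, so it is not equal to the stopword `junk`.

```python
STOPWORDS = {"junk"}  # hypothetical stopword list


def analyze(text):
    """Whitespace tokenize, lowercase, then drop tokens that exactly equal a stopword."""
    tokens = [t.lower() for t in text.split()]
    return [t for t in tokens if t not in STOPWORDS]


print(analyze("Junk drawer junk."))  # ['drawer', 'junk.']
```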

Plus, your EdgeNGramFilterFactory is pretty strange. A min gram size of 1 means 
you’re searching for single characters.
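To see why a minimum gram size of 1 is unusual, here is a hypothetical Python model of front edge n-grams for a single token. With `min_gram=1`, the single letter `j` becomes an indexed term, so a one-character query would match every token starting with that letter. The `max_gram` default (full token length) is an assumption, since the original `maxGramSize` value was lost from the archive.

```python
def edge_ngrams(token, min_gram=1, max_gram=None):
    """Front edge n-grams of one token, from min_gram up to max_gram characters."""
    max_gram = len(token) if max_gram is None else min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, max_gram + 1)]


print(edge_ngrams("jack"))  # ['j', 'ja', 'jac', 'jack']
```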

So what I’d do is back off the definition and build it up bit by bit to see 
if/when you have this problem. But if stopwords are working correctly at index 
time, the “junk” will not be _in_ the index, therefore it’ll be impossible to 
find, fuzzy search or not. So you’re making some assumptions that aren’t true,
and the analysis process combined with looking at the parsed query should show 
you quite a lot.

Best,
Erick

> On May 8, 2019, at 4:43 PM, bbarani  wrote:
> 
> Hi,
> Is there a way to use stopwords and fuzzy match in a SOLR query?
> 
> The below query matches 'jack' too and I added 'junk' to the stopwords (in
> query) to avoid returning results, but it looks like it's not honoring the
> stopwords when using the fuzzy search. 
> 
> solr/collection1/select?app-qf=title_autoComplete=false=*=true=-1=marketingSequence%20asc=productId=true=on=categoryFilter=defaultMarketingSequence%20asc=junk~
> 
> 
>
>
> ignoreCase="true"/>
>
>
>
>
> synonyms="synonyms.txt"/>
> catenateNumbers="0" generateNumberParts="0" generateWordParts="0"
> preserveOriginal="1" catenateAll="0" catenateWords="1"/>
> minGramSize="1"/>
>
>
> ignoreCase="true"/>
>
>
>
>
> synonyms="synonyms.txt"/>
> catenateNumbers="0" generateNumberParts="0" generateWordParts="0"
> preserveOriginal="1" catenateAll="0" catenateWords="1"/>
>
>
> 
> 
> 

How to use stopwords, synonyms along with fuzzy match in a SOLR

2019-05-08 Thread bbarani

Hi,
Is there a way to use stopwords and fuzzy match in a SOLR query?

The below query matches 'jack' too and I added 'junk' to the stopwords (in
query) to avoid returning results, but it looks like it's not honoring the
stopwords when using the fuzzy search. 

solr/collection1/select?app-qf=title_autoComplete=false=*=true=-1=marketingSequence%20asc=productId=true=on=categoryFilter=defaultMarketingSequence%20asc=junk~

1969 vs 1960s: not-quite-synonyms in Solr

2019-03-06 Thread Gregg Donovan

For a search like "1969 shirt" I would like to return items with either
1969 or 1960s but boost 1969 items higher. For the query "1960s shirt",
1960s and 1960, 1961, ... 1969 should all match equally.

Is there a standard technique for this? I'm struggling to do this with
eDisMax without adding new fields to the index.
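One possible approach, sketched here under assumptions that are not in the original post, is to rewrite the query before it reaches eDisMax: a specific year becomes itself (boosted) OR its decade, and a decade becomes itself OR each of its ten years. This assumes the indexed text keeps tokens such as 1969 and 1960s intact (for example, that no word-delimiter filter splits 1960s into 1960 and s). The helper names and the boost of 2 are hypothetical choices.

```python
import re

YEAR = re.compile(r"^(1[89]|20)\d\d$")        # a year 1800-2099 (arbitrary range)
DECADE = re.compile(r"^((?:1[89]|20)\d)0s$")  # e.g. 1960s -> group(1) == "196"


def expand_era_token(token: str, year_boost: int = 2) -> str:
    """Rewrite one query token into a Lucene-syntax clause (hypothetical helper)."""
    if YEAR.match(token):
        decade = token[:3] + "0s"
        # The exact year is boosted above the decade it belongs to.
        return f"({token}^{year_boost} OR {decade})"
    m = DECADE.match(token)
    if m:
        prefix = m.group(1)
        years = [f"{prefix}{d}" for d in range(10)]
        # The decade and every year in it are unboosted alternatives.
        return "(" + " OR ".join([token] + years) + ")"
    return token


def rewrite_query(q: str) -> str:
    return " ".join(expand_era_token(t) for t in q.split())


print(rewrite_query("1969 shirt"))
# (1969^2 OR 1960s) shirt
print(rewrite_query("1960s shirt"))
# (1960s OR 1960 OR 1961 OR 1962 OR 1963 OR 1964 OR 1965 OR 1966 OR 1967 OR 1968 OR 1969) shirt
```

Note that "equally" only holds in the sense that every year is an equal alternative in the query; term statistics such as IDF will still make individual years score differently.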

Thanks.

Gregg

Re: Reload synonyms without reloading the multiple collections

2018-12-30 Thread Simón de Frosterus Pokrzywnicki

Sorry, I see that it may have been confusing.

My webapp calls the reload of all the affected Collections (about a dozen
of them) in sequential mode using the Collections API.

Ideally I would be able to write some QueryTimeSynonymFilterFactory that
would reload the synonyms file from ZK periodically or on request; that
file is what the system edits when a user changes some synonyms.

I understand that a Collection needs to be reloaded if the synonyms are
used at index time, but that is not my case.

The managed API is in the same situation: it basically does what I am
already doing on my own. In the end, the affected Collections have to be
reloaded.

Regards,
Simón

On Sun, Dec 30, 2018 at 5:01 AM Shawn Heisey  wrote:

> On 12/29/2018 5:55 AM, Simón de Frosterus Pokrzywnicki wrote:
> > The problem is that when the user changes the synonyms, it automatically
> > triggers a sequential reload of all the Collections.
>
> What exactly is being done when you say "the user changes the
> synonyms"?  Just uploading a new synonyms definition file to zookeeper
> would *NOT* result in a reload of *ANY* collection.  As far as I am
> aware, collection reloads only happen when they are explicitly
> requested.  Usage of the managed APIs to change aspects of the schema
> could cause a reload, but it's only going to happen on the collection
> where the API is used, not all collections.
>
> Basically, I cannot imagine any situation that would cause a reload of
> all collections, other than explicitly asking Solr to do those reloads.
>
> Thanks,
> Shawn
>
>

Re: Reload synonyms without reloading the multiple collections

2018-12-29 Thread Shawn Heisey


On 12/29/2018 5:55 AM, Simón de Frosterus Pokrzywnicki wrote:

The problem is that when the user changes the synonyms, it automatically
triggers a sequential reload of all the Collections.


What exactly is being done when you say "the user changes the 
synonyms"?  Just uploading a new synonyms definition file to zookeeper 
would *NOT* result in a reload of *ANY* collection.  As far as I am 
aware, collection reloads only happen when they are explicitly 
requested.  Usage of the managed APIs to change aspects of the schema 
could cause a reload, but it's only going to happen on the collection 
where the API is used, not all collections.


Basically, I cannot imagine any situation that would cause a reload of 
all collections, other than explicitly asking Solr to do those reloads.


Thanks,
Shawn

Reload synonyms without reloading the multiple collections

2018-12-29 Thread Simón de Frosterus Pokrzywnicki

Hello,

I have a solrcloud setup with multiple Collections based on the same
configset.

One of the features I have is that the user can define their own synonyms
in order to improve their search experience which has worked fine until
recently.

Lately the platform has grown and the user has several dozen Collections,
most of them with 200k or more documents of non-trivial size.

The problem is that when the user changes the synonyms, it automatically
triggers a sequential reload of all the Collections. This now consistently
causes problems, to the point where the platform becomes unstable and Solr
may need a restart, which means we have to access the platform and
stabilize it manually.

The synonyms are only used at query time, so there is no need to reindex
anything and it seems like overkill to reload the Collections to change the
synonyms.

I have tried creating my own CustomSynonymGraphFilter and having it call
the loadSynonyms() method as needed, but this causes some weird behavior:
queries sometimes pick up the newly added synonyms and sometimes do not. I
get the impression that there may be N "threads" handling the queries and
I only change the SynonymMap for one of them, so when a query hits the
right "thread" it works, but in most cases it does not.

My custom fieldType looks like this:

[fieldType XML not preserved in the archive]

I would like to know if there is some other class I can redefine to make
sure the new SynonymMap is used in all cases.
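The symptom is consistent with each consumer holding its own private copy of the map. As far as I understand Lucene's Analyzer, it caches token stream components per thread, so a filter instance created before the update keeps the map it captured, and Solr also has separate factory instances per core and per analyzer. A generic pattern, sketched in Python rather than as Lucene code, is to keep one shared holder per synonyms resource, have every consumer look the map up from it on each use, and replace the whole map with a single reference assignment when it changes. `SynonymRegistry` and `ExpandingFilter` are hypothetical names, not a Solr or Lucene API; in Java the equivalent would typically use a volatile field or an AtomicReference.

```python
import threading
from typing import Dict, List


class SynonymRegistry:
    """Hypothetical process-wide holder: one current map per resource name."""

    _lock = threading.Lock()  # serializes writers only
    _maps: Dict[str, Dict[str, List[str]]] = {}

    @classmethod
    def get(cls, resource: str) -> Dict[str, List[str]]:
        # Readers take no lock: they just read the current reference.
        return cls._maps.get(resource, {})

    @classmethod
    def publish(cls, resource: str, new_map: Dict[str, List[str]]) -> None:
        # Build the new map completely, then swap it in with one assignment,
        # so readers see either the old map or the new one, never a half-built one.
        frozen = {k: list(v) for k, v in new_map.items()}
        with cls._lock:
            cls._maps = {**cls._maps, resource: frozen}


class ExpandingFilter:
    """Stands in for a per-thread filter instance; it never caches the map."""

    def __init__(self, resource: str):
        self.resource = resource

    def expand(self, token: str) -> List[str]:
        synonyms = SynonymRegistry.get(self.resource)  # looked up on every call
        return [token] + synonyms.get(token, [])


f1, f2 = ExpandingFilter("synonyms.txt"), ExpandingFilter("synonyms.txt")
SynonymRegistry.publish("synonyms.txt", {"tv": ["television"]})
print(f1.expand("tv"), f2.expand("tv"))  # ['tv', 'television'] ['tv', 'television']
SynonymRegistry.publish("synonyms.txt", {"tv": ["telly"]})
print(f2.expand("tv"))                   # ['tv', 'telly']
```

The important property is that no filter keeps its own copy. The sketch does not model distributing the change across Solr nodes; each JVM would still have to be told to re-read the file from ZK.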

Thanks,
Simón

PS: I have upgraded to Solr 7.6.

Re: MoreLikeThis & Synonyms

2018-12-27 Thread Nicolas Paris

On Wed, Dec 26, 2018 at 09:09:02PM -0800, Erick Erickson wrote:
> bq. However multiword synonyms are only compatible with queryTime synonyms
> expansion.
> 
> Why do you say this? What version of Solr? Query-time multi-word
> synonyms were _added_, but AFAIK the capability of multi-word synonyms
> was not taken away.

From this blog post [1], I deduced that multi-word synonyms are only
compatible with query-time expansion.

> Or are you saying that MLT doesn't play nice at all with multi-word
> synonyms?

From my tests, MLT does not expand the query with synonyms, so neither
query-time synonyms nor multi-word synonyms can be used. Only index-time
synonyms work, with the limitations they have [1].

> What version of Solr are you using?

I am running Solr 7.6.

[1] 
https://lucidworks.com/2017/04/18/multi-word-synonyms-solr-adds-query-time-support/

-- 
nicolas

Re: MoreLikeThis & Synonyms

2018-12-26 Thread Erick Erickson

bq. However multiword synonyms are only compatible with queryTime synonyms
expansion.

Why do you say this? What version of Solr? Query-time multi-word
synonyms were _added_, but AFAIK the capability of multi-word synonyms
was not taken away. Or are you saying that MLT doesn't play nice at all
with multi-word synonyms?

What version of Solr are you using?

Best,
Erick

On Wed, Dec 26, 2018 at 5:25 AM Nicolas Paris  wrote:
>
> Hi
>
> It turns out that MoreLikeThis handler does not use queryTime synonyms
> expansion.
>
> It is only compatible with indexTime synonyms.
>
> However multiword synonyms are only compatible with queryTime synonyms
> expansion.
>
> For this reason, multiword synonyms cannot be used together with the
> MoreLikeThis handler.
>
> Is there any reason why the MoreLikeThis feature is not compatible with
> multiword synonyms?
>
> Thanks
> --
> nicolas

MoreLikeThis & Synonyms

2018-12-26 Thread Nicolas Paris

Hi

It turns out that MoreLikeThis handler does not use queryTime synonyms
expansion.

It is only compatible with indexTime synonyms.

However multiword synonyms are only compatible with queryTime synonyms
expansion.

For this reason, multiword synonyms cannot be used together with the
MoreLikeThis handler.

Is there any reason why the MoreLikeThis feature is not compatible with
multiword synonyms?

Thanks
-- 
nicolas

RE: KeywordRepeat, stemming, (single term) synonyms and minimum should match (edismax)

2018-11-29 Thread Markus Jelsma

Hello,

Sorry for asking once more. Is there anyone around who can help me, and
perhaps others, with this subject, the linked Jira ticket and the failing test?

I could really use some help from someone who is familiar with the edismax
code and the underlying QueryBuilder parts that it uses and that then get
replaced by Solr code.

Many thanks,
Markus

Re: Is reload necessary for updates to files referenced in schema, like synonyms, protwords, etc?

2018-11-28 Thread Shawn Heisey


On 11/28/2018 6:37 AM, Vincenzo D'Amore wrote:

Very likely I'm late to this party :) I'm not sure about Solr standalone, but
with SolrCloud (7.3.1) you have to reload the core every time synonyms
referenced by a schema are changed.


I have a 7.5.0 download on my workstation, so I fired that up, created a 
core, and tried it out.  I did learn that a reload is required when 
changing files referenced by analysis components in the schema.  That's 
what I had thought was probably the case, now I know for sure.
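For completeness, the two documented ways to trigger that reload are the CoreAdmin API for a standalone core and the Collections API for SolrCloud. The sketch below only builds the request URLs (the base URL, core name and collection name are placeholders); actually sending them, e.g. with urllib.request.urlopen, is left out so the example runs without a Solr server.

```python
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"  # placeholder base URL


def core_reload_url(core: str) -> str:
    # Standalone Solr: CoreAdmin API, action=RELOAD&core=<name>
    return f"{SOLR}/admin/cores?" + urlencode({"action": "RELOAD", "core": core})


def collection_reload_url(collection: str) -> str:
    # SolrCloud: Collections API, action=RELOAD&name=<collection>
    return f"{SOLR}/admin/collections?" + urlencode({"action": "RELOAD", "name": collection})


print(core_reload_url("mycore"))
# http://localhost:8983/solr/admin/cores?action=RELOAD&core=mycore
print(collection_reload_url("products"))
# http://localhost:8983/solr/admin/collections?action=RELOAD&name=products
```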


Thanks,
Shawn

Re: Is reload necessary for updates to files referenced in schema, like synonyms, protwords, etc?

2018-11-28 Thread Vincenzo D'Amore

Very likely I'm late to this party :) I'm not sure about Solr standalone, but
with SolrCloud (7.3.1) you have to reload the core every time synonyms
referenced by a schema are changed.

-- 
Vincenzo D'Amore

Re: Is reload necessary for updates to files referenced in schema, like synonyms, protwords, etc?

2018-11-26 Thread Walter Underwood

Should be easy to check with the analysis UI. Add a synonym and see if it is 
used.

I seem to remember some work on reloading synonyms on the fly without a core 
reload. These seem related...

https://issues.apache.org/jira/browse/SOLR-5200
https://issues.apache.org/jira/browse/SOLR-5234

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Nov 26, 2018, at 11:43 AM, Shawn Heisey  wrote:
> 
> I know that changes to the schema require a reload.  But do changes to files 
> referenced by a schema also require a reload?  So if for instance I were to 
> change the contents of a synonym file, would I need to reload the core before 
> Solr would use the new file?  Synonyms in this case are at query time, but 
> other files like protwords are used at index time.
> 
> I *THINK* that a reload is required, but I can't be sure without checking the 
> code, and it would probably take me more than a couple of hours to unravel 
> the code enough to answer the question myself.
> 
> It is not SolrCloud, so there's no ZK to worry about.
> 
> Thanks,
> Shawn
>

Is reload necessary for updates to files referenced in schema, like synonyms, protwords, etc?

2018-11-26 Thread Shawn Heisey

I know that changes to the schema require a reload.  But do changes to 
files referenced by a schema also require a reload?  So if for instance 
I were to change the contents of a synonym file, would I need to reload 
the core before Solr would use the new file?  Synonyms in this case are 
at query time, but other files like protwords are used at index time.


I *THINK* that a reload is required, but I can't be sure without 
checking the code, and it would probably take me more than a couple of 
hours to unravel the code enough to answer the question myself.


It is not SolrCloud, so there's no ZK to worry about.

Thanks,
Shawn

RE: KeywordRepeat, stemming, (single term) synonyms and minimum should match (edismax)

2018-11-22 Thread Markus Jelsma

Hello,

I have opened SOLR-13009 describing the problem. The attached patch contains
a unit test demonstrating the problem, i.e. the test fails. Any help would be
greatly appreciated.

Many thanks,
Markus

https://issues.apache.org/jira/browse/SOLR-13009

RE: KeywordRepeat, stemming, (single term) synonyms and minimum should match (edismax)

2018-11-18 Thread Markus Jelsma

Hello,

Apologies for bothering you all again, but I really need some help with this
matter. How can we resolve this issue? Are we dealing with a bug here (then
I'll open a ticket), or am I doing something wrong?

Is there anyone here who has had the same issue or understands the problem?

Many thanks,
Markus 

KeywordRepeat, stemming, (single term) synonyms and minimum should match (edismax)

2018-11-13 Thread Markus Jelsma

Hello, apologies for this long-winded e-mail.

Our fields have KeywordRepeat and language-specific filters such as a stemmer;
the final filter at query time is SynonymGraph. For those of you wondering why
the parsed queries below contain duplicates: we do not use
RemoveDuplicatesFilter, due to [1].

We use a custom QParser extending edismax and also extend 
ExtendedSolrQueryParser, so we are able to override newFieldQuery in case we 
have to. The problem also directly applies to Solr's vanilla edismax. The file 
synonyms.txt contains the stemmed versions of the original terms.

Consider this example synonym set [bier,brouw], where bier means beer and brouw
is the stemmed version of brouwsel (brewage, concoction), and consider these
parameters on /select: qf=content_nl&defType=edismax&mm=2<-1 5<-2 6<90%25.

The queries q=bier and q=brouw both parse to the following query and give the 
desired results (notice the missing RemoveDuplicates here):
+(((Synonym(content_nl:bier content_nl:brouw) Synonym(content_nl:bier content_nl:brouw))~2))

However, for q=brouwsel something (partially) unexpected happens:
+(((content_nl:brouwsel Synonym(content_nl:bier content_nl:brouw))~2))

This results in a BooleanQuery where, due to mm=2, both clauses need to match, 
giving very few matches. Removing KeywordRepeat or setting mm=1 of course fixes 
the problem, but that is not what we want.
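To see why both clauses end up required, it can help to evaluate the mm spec from these parameters, 2<-1 5<-2 6<90% (the %25 is just the URL encoding of %), against the number of clauses. The function below is a simplified Python re-implementation of the documented mm rules, not Solr's own code: with N<spec, clause counts up to N are all required and larger counts use the spec; a negative number means "all but" that many; percentages are truncated. It assumes no spaces around the < signs.

```python
def _apply(count: int, spec: str) -> int:
    """Apply one unconditional spec such as '3', '-1', '75%' or '-25%'."""
    if spec.endswith("%"):
        pct = int(spec[:-1])
        share = abs(count * pct) // 100              # truncated percentage
        return count - share if pct < 0 else share   # negative: "all but" that share
    n = int(spec)
    return count + n if n < 0 else n                 # negative: "all but n"


def min_should_match(count: int, spec: str) -> int:
    """Number of optional clauses required for `count` clauses under an mm spec."""
    spec = spec.strip()
    if "<" in spec:
        result = count  # up to the first bound, every clause is required
        for cond in spec.split():
            bound, sub = cond.split("<", 1)
            if count <= int(bound):
                break
            result = _apply(count, sub)
    else:
        result = _apply(count, spec)
    return max(0, min(count, result))


spec = "2<-1 5<-2 6<90%"
for n in (1, 2, 3, 5, 6, 7, 10):
    print(n, min_should_match(n, spec))
# prints one pair per line: 1 1, 2 2, 3 2, 5 4, 6 4, 7 6, 10 9
```

With two clauses the spec therefore requires both, which is the ~2 in both parsed queries; what differs between q=brouw and q=brouwsel is what those two clauses contain, not the mm value.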

What is also unexpected, and may be related to the problem, is that when
checking the analyzer output via the GUI, we see the position incrementing when
KeywordRepeat and SynonymGraph are combined. When these filters are not
combined, the positions are always 1, as expected. When combined, we get this
for 'brouw':
term: bier brouw bier brouw
pos:  1    1     2    2

or for 'brouwsel':
term: brouwsel bier brouw
pos:  1        2    2
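The parsed queries line up with these token positions if each position is treated as one clause and the terms stacked at one position are combined into a Synonym clause. The toy model below reproduces that grouping from the (term, position) pairs in the analysis output; it is a simplification for illustration, not Lucene's QueryBuilder code, and the function name is hypothetical.

```python
from itertools import groupby
from typing import List, Tuple


def clauses_by_position(field: str, tokens: List[Tuple[str, int]]) -> List[str]:
    """Group analyzed (term, position) pairs into one clause per position."""
    clauses = []
    for _, group in groupby(sorted(tokens, key=lambda t: t[1]), key=lambda t: t[1]):
        terms = [f"{field}:{term}" for term, _ in group]
        # Several terms at the same position become a single Synonym clause.
        clauses.append(terms[0] if len(terms) == 1 else f"Synonym({' '.join(terms)})")
    return clauses


brouw = [("bier", 1), ("brouw", 1), ("bier", 2), ("brouw", 2)]
brouwsel = [("brouwsel", 1), ("bier", 2), ("brouw", 2)]
print(clauses_by_position("content_nl", brouw))
# ['Synonym(content_nl:bier content_nl:brouw)', 'Synonym(content_nl:bier content_nl:brouw)']
print(clauses_by_position("content_nl", brouwsel))
# ['content_nl:brouwsel', 'Synonym(content_nl:bier content_nl:brouw)']
```

For q=brouw both positions hold the same Synonym clause, so any document with bier or brouw satisfies both; for q=brouwsel a document needs brouwsel and also bier or brouw. That suggests the extra position increment, rather than mm itself, is what changes the result.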

ExtendedSolrQueryParser, and everything underneath, is a complicated piece of
code. In the end it extends Lucene's QueryBuilder, but it does not always rely
on its results, it seems. Edismax, for example, 'resets' minShouldMatch in
SolrPluginUtils.setMinShouldMatch(). This is a complicated web of code, I am a
bit too deep in this unfamiliar area, and I need help here.

So, my question is: how do I solve this problem, or how should I approach it?
What is the actual problem? How can I get the same stable results for both
queries? Does the odd position increment have anything to do with it (it seems
Lucene's QueryBuilder does something with it)? What do I need to do?

Many thanks,
Markus

ps. this is on Solr 7.2.1 and 7.5.0.

[1] 
http://lucene.472066.n3.nabble.com/Multiple-languages-boosting-and-stemming-and-KeywordRepeat-td4389086.html

Re: Synonyms relationships

2018-10-31 Thread Doug Turnbull

Synonyms in Solr are really a kind of "programmer's" tool, useful for
mapping terms to other terms. This need not correspond to linguistic
notions of a synonym or hypernymy/hyponymy.

That being said, there are probably half a dozen approaches for building these
kinds of taxonomic relationships in Solr on top of synonyms.

Here are some resources / techniques we use at OpenSource Connections for
clients:
https://www.youtube.com/watch?v=90F30PS-884
https://opensourceconnections.com/blog/2017/11/21/solr-synonyms-mea-culpa/
https://opensourceconnections.com/blog/2016/12/23/elasticsearch-synonyms-patterns-taxonomies/

(last one is ES, but same ideas apply...)
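One pattern that can be built on top of plain synonyms, offered here as an illustration rather than as a summary of the linked material, is to flatten a taxonomy into explicit one-way mappings applied at index time only, so that each document also indexes the broader terms of what it mentions. A search for "dog" then finds documents about beagles, while a search for "beagle" stays narrow. The taxonomy is made up; the "term => term, broader" line syntax is Solr's explicit-mapping format for synonyms.txt.

```python
from typing import Dict, List, Optional

# Made-up taxonomy: each term -> its broader term (None at the root).
BROADER: Dict[str, Optional[str]] = {
    "beagle": "dog",
    "dog": "animal",
    "animal": None,
}


def ancestors(term: str, taxonomy: Dict[str, Optional[str]]) -> List[str]:
    """All broader terms of `term`, nearest first (stops if a cycle is found)."""
    chain: List[str] = []
    parent = taxonomy.get(term)
    while parent is not None and parent not in chain and parent != term:
        chain.append(parent)
        parent = taxonomy.get(parent)
    return chain


def index_time_synonym_lines(taxonomy: Dict[str, Optional[str]]) -> List[str]:
    """One explicit mapping per non-root term: 'term => term, broader, ...'."""
    lines = []
    for term in taxonomy:
        broader = ancestors(term, taxonomy)
        if broader:
            # Keep the original term on the right-hand side so it is still indexed.
            lines.append(f"{term} => " + ", ".join([term] + broader))
    return lines


print("\n".join(index_time_synonym_lines(BROADER)))
# beagle => beagle, dog, animal
# dog => dog, animal
```

This only makes sense with a query-time analyzer that does not apply the same file; otherwise a query for "beagle" would also be broadened.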

Best,
-Doug

On Wed, Oct 31, 2018 at 6:20 AM Nicolas Paris 
wrote:

> Hi
>
> Does Solr provide a way to describe synonym relationships such as
> "equivalent to", "narrower than", "broader than"?
>
> It turns out both Postgres and Oracle do, but I can't find any related
> information in the documentation.
>
> This would be useful for deciding whether or not to generalize the search
> terms.
>
> Thanks,
>
>
> --
> nicolas
>
-- 
CTO, OpenSource Connections
Author, Relevant Search
http://o19s.com/doug

Synonyms relationships

2018-10-31 Thread Nicolas Paris

Hi

Does Solr provide a way to describe synonym relationships such as
"equivalent to", "narrower than", "broader than"?

It turns out both Postgres and Oracle do, but I can't find any related
information in the documentation.

This would be useful for deciding whether or not to generalize the search
terms.

Thanks,


-- 
nicolas

Re: synonyms for Solr Cloud -

2018-09-30 Thread Zheng Lin Edwin Yeo

You can keep the synonyms text file in the same config folder as the rest
of your files, like solrconfig.xml, that you push to Solr Cloud.
When you push the config to Solr Cloud, the synonyms text file will be
pushed to Solr Cloud together with it.
In your schema (managed-schema or schema.xml), you will need to add the
SynonymFilterFactory to the field type's analyzer as usual.
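Before uploading, it can be worth sanity-checking the file. The sketch below parses the basic synonyms.txt format (blank lines and # comments ignored, comma-separated equivalents, and "a => b" explicit mappings); it deliberately ignores backslash escaping and is not Solr's own parser. Once the file sits in the configset directory, the configset can be uploaded, for example with bin/solr zk upconfig, and existing collections then need a reload to pick it up.

```python
from typing import List, Tuple

Rule = Tuple[List[str], List[str]]  # (left-hand terms, right-hand terms)


def parse_synonyms(text: str) -> List[Rule]:
    """Parse a simplified synonyms.txt; an equivalence line maps all terms to all terms."""
    rules: List[Rule] = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if "=>" in line:
            lhs, _, rhs = line.partition("=>")
            left = [t.strip() for t in lhs.split(",") if t.strip()]
            right = [t.strip() for t in rhs.split(",") if t.strip()]
        else:
            left = right = [t.strip() for t in line.split(",") if t.strip()]
        if not left or not right:
            raise ValueError(f"line {lineno}: empty side in {raw!r}")
        rules.append((left, right))
    return rules


sample = """# TV terms
tv, television, telly
LCD => liquid crystal display
"""
for left, right in parse_synonyms(sample):
    print(left, "->", right)
# ['tv', 'television', 'telly'] -> ['tv', 'television', 'telly']
# ['LCD'] -> ['liquid crystal display']
```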

Regards,
Edwin

On Wed, 19 Sep 2018 at 11:58, Rathor, Piyush (US - Philadelphia) <
prat...@deloitte.com> wrote:

> Hi All,
>
> How can we add a synonyms text file to Solr Cloud? I have a text file with
> comma-separated synonyms.

synonyms for Solr Cloud -

2018-09-18 Thread Rathor, Piyush (US - Philadelphia)

Hi All,

How can we add a synonyms text file to Solr Cloud? I have a text file with
comma-separated synonyms.


Thanks & Regards
Piyush Rathor
Consultant
Deloitte Digital (Salesforce.com / Force.com)
Deloitte Consulting Pvt. Ltd.
Office: +1 (615) 209 4980
Mobile : +1 (302) 397 1491
prat...@deloitte.com | www.deloitte.com
Please consider the environment before printing.


This message (including any attachments) contains confidential information 
intended for a specific individual and purpose, and is protected by law. If you 
are not the intended recipient, you should delete this message and any 
disclosure, copying, or distribution of this message, or the taking of any 
action based on it, by you is strictly prohibited.

v.E.1

Re: Multi-word Synonyms - how does sow parameter work?

2018-08-16 Thread Roy Lim

Thanks Andrea for the tip.  I wasn't aware of the autoGeneratePhraseQueries
option for text fields, will definitely keep it in mind.

But I question whether this is related to the fix in the query parser, which
essentially introduces the sow parameter: if it is false (which looks like
the default in Solr 7), multiple words should be sent as a 'single input' (see
https://issues.apache.org/jira/browse/LUCENE-2605). That issue doesn't
mention autoGeneratePhraseQueries.

I think this is where my confusion lies: as a non-developer, unfortunately,
I'm not clear what 'multiple words will be sent as a single input' means.
Should it mean that they are treated as a phrase query? Combined with AND? So
far, as mentioned, I only observe OR clauses, which is no different from
before the fix.
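To make the two shapes being discussed concrete, here is a toy model (not Lucene's query-building code) of how the expansion of LCD => liquid crystal display can be rendered: as independent term clauses, which is what the Admin UI showed, or as a single phrase, which is what Andrea's reply says autoGeneratePhraseQueries="true" on the field type produced for him. The function name and the auto_phrase flag are hypothetical.

```python
def synonym_query(field: str, expansion: str, auto_phrase: bool) -> str:
    """Toy model: render one multi-word synonym expansion as query syntax."""
    terms = expansion.split()
    if auto_phrase and len(terms) > 1:
        # All words, adjacent and in order.
        return f'{field}:"{expansion}"'
    # Independent SHOULD clauses: with q.op=OR any single word is enough.
    return " ".join(f"{field}:{t}" for t in terms)


print(synonym_query("myfield", "liquid crystal display", auto_phrase=False))
# myfield:liquid myfield:crystal myfield:display
print(synonym_query("myfield", "liquid crystal display", auto_phrase=True))
# myfield:"liquid crystal display"
```

In this reading, sow=false only decides whether the whole input reaches the analyzer in one piece (so the multi-word synonym can be recognized at all); how the resulting multi-word expansion is turned into clauses is a separate setting.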

Thanks again!



On Thu, Aug 16, 2018 at 12:39 AM, Andrea Gazzarini 
wrote:

> Hi Roy, I think you are missing autoGeneratePhraseQueries=true in the field
> type definition.
> I was on a slightly different use case when I hit the same issue (I was
> using synonym expansion at query time) and honestly I didn't understand
> why this is not the default and implicit behavior. In other words, like
> you, I can't imagine a scenario where I would want a multi-term synonym to
> be broken up into multiple OR clauses.
>
> Best,
> Andrea
>
>
> On 16/08/18 02:07, Roy Lim wrote:
>
>> I am not using edismax (eventually I would like to get there) but I'm just
>> testing with standard query right now.  Original posting:
>>
>> I'm trying to figure out why the multi-word synonym expansion is not
>> working correctly (or, at least what I'm misunderstanding).  Specifically,
>> when I test a standard query with Solr Admin it appears to still split on
>> whitespace.
>>
>> Here is my setup:
>> - Solr 7.2.1
>> - synonym example: LCD => liquid crystal display
>> - q=myfield:LCD
>> - added parameter: sow=false
>> - myfield schema looks like (analyzer both applicable to index and query
>> time):
>> 
>> > positionIncrementGap="100">
>>
>>  
>>  > synonyms="synonyms.txt"/>
>>  ...
>> 
>>
>> When debugging the query, Solr Admin shows the parsed query as:
>> 
>> myfield:liquid myfield:crystal myfield:display
>> 
>>
>> (default operator being OR), as you can see it would incorrectly match on
>> any of those words, but not all, which is what I would expect...
>>
>> Should it not do a phrase query search for the exact translated synonym,
>> "liquid crystal display"?
>>
>

Re: Multi-word Synonyms - how does sow parameter work?

2018-08-16 Thread Andrea Gazzarini

Hi Roy, I think you're missing autoGeneratePhraseQueries="true" in the field 
type definition.
I was working on a slightly different use case when I ran into the same issue 
(I was using synonym expansion at query time), and honestly I didn't understand 
why this is not the default, implicit behavior. In other words, like you, I 
can't imagine a scenario where I would want a multi-term synonym to be broken 
up into multiple OR clauses.
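Andrea's point can be sketched with a toy Python model (an illustration of the resulting query shape, not Solr's query-building code). When one query word analyzes into several tokens, as LCD does with the rule LCD => liquid crystal display, the flag decides whether those tokens become separate clauses joined by the default operator or a single phrase. The function name `field_clause` is hypothetical; in a real schema the setting is the autoGeneratePhraseQueries attribute on the fieldType.

```python
# Toy model only: it mimics the shape of the parsed query Solr Admin shows,
# not Solr's actual query-building code.

def field_clause(field, tokens, auto_generate_phrase_queries):
    """Build a query string from the tokens that one query word analyzed into."""
    if len(tokens) > 1 and auto_generate_phrase_queries:
        # multi-token output becomes one phrase: all words, in order
        return '%s:"%s"' % (field, " ".join(tokens))
    # otherwise each token is its own clause, joined by the default
    # operator (OR here), so matching any single word is enough
    return " ".join("%s:%s" % (field, t) for t in tokens)

expanded = ["liquid", "crystal", "display"]  # LCD => liquid crystal display
print(field_clause("myfield", expanded, False))
# myfield:liquid myfield:crystal myfield:display
print(field_clause("myfield", expanded, True))
# myfield:"liquid crystal display"
```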


Best,
Andrea


Re: Multi-word Synonyms - how does sow parameter work?

2018-08-15 Thread Roy Lim

I am not using edismax yet (eventually I would like to get there); I'm just
testing with the standard query parser right now.  Original posting:

I'm trying to figure out why the multi-word synonym expansion is not
working correctly (or, at least what I'm misunderstanding).  Specifically,
when I test a standard query with Solr Admin it appears to still split on
whitespace.

Here is my setup:
- Solr 7.2.1
- synonym example: LCD => liquid crystal display
- q=myfield:LCD
- added parameter: sow=false
- myfield schema (the same analyzer applies at both index and query time):

...

When debugging the query, Solr Admin shows the parsed query as:

myfield:liquid myfield:crystal myfield:display


(the default operator being OR). As you can see, it would incorrectly match
documents containing any one of those words rather than requiring all of
them, which is what I would expect...

Shouldn't it run a phrase query for the exact expanded synonym,
"liquid crystal display"?




Re: Multi-word Synonyms - how does sow parameter work?

2018-08-15 Thread Doug Turnbull

Also, please share the fieldType settings for myfield from your schema.
-- 
CTO, OpenSource Connections
Author, Relevant Search
http://o19s.com/doug

Re: Multi-word Synonyms - how does sow parameter work?

2018-08-15 Thread Doug Turnbull

Aside from the screenshot issue, one thing to check: are you searching
with defType=edismax?

As in
q=lcd=myfield=false=edismax

?

Also, sow=false should be the default on Solr 7 and above.
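The example query string above lost its `&name` separators in the archive. A plausible reading, built with Python's standard library, is below. Assumptions: `qf` as the first stripped parameter name (it could also have been `df`), the added `debugQuery` parameter, and the core name `mycore`.

```python
from urllib.parse import urlencode

# Assumption: "qf" is a guess for the first parameter name the archive dropped.
params = {
    "q": "lcd",
    "qf": "myfield",       # field(s) edismax searches
    "sow": "false",        # don't split on whitespace before analysis
    "defType": "edismax",  # use the extended dismax query parser
    "debugQuery": "true",  # include the parsed query in the response
}
query_string = urlencode(params)
url = "http://localhost:8983/solr/mycore/select?" + query_string  # hypothetical core
print(query_string)
# q=lcd&qf=myfield&sow=false&defType=edismax&debugQuery=true
```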

Doug

-- 
CTO, OpenSource Connections
Author, Relevant Search
http://o19s.com/doug

Re: Multi-word Synonyms - how does sow parameter work?

2018-08-15 Thread Steve Rowe

Yes please.  That way we’ll see the whole thing.

--
Steve
www.lucidworks.com


Re: Multi-word Synonyms - how does sow parameter work?

2018-08-15 Thread Roy Lim

I've subscribed. Shall I re-post it via email, then?



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Multi-word Synonyms - how does sow parameter work?

2018-08-15 Thread Steve Rowe

Roy,

Not sure of the point of Nabble when it strips content before passing messages 
on to the mailing list.  I’ve emailed them about this problem in the past but 
they have done nothing about it.

Updates to a post on Nabble never make it to the mailing list.  If you want 
us to be able to read your post in full, you should subscribe to the mailing 
list instead of using Nabble.  Instructions here: 
http://lucene.apache.org/solr/community.html#solr-user-list-solr-userluceneapacheorg

--
Steve
www.lucidworks.com


Re: Multi-word Synonyms - how does sow parameter work?

2018-08-15 Thread Roy Lim

Thanks, I've updated the original post.  It had removed everything I surrounded
with the raw-text markup, so I've added it back without markup.  Not sure what
the point of raw text is if it's always removed.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Multi-word Synonyms - how does sow parameter work?

2018-08-15 Thread Erick Erickson

The mail server strips pretty much all screenshots and attachments, so
I think some of the data you're trying to provide is missing from the
e-mail.

Best,
Erick


Multi-word Synonyms - how does sow parameter work?

2018-08-15 Thread Roy Lim

I'm trying to figure out why the multi-word synonym expansion is not working
correctly.  Specifically, when I test a standard query with Solr Admin it is
still splitting on whitespace.

Here is my setup:
- Solr 7.2.1
- synonym LCD => liquid crystal display
- q=myfield:LCD
- added: sow=false
- myfield looks like:


Solr Admin shows the parsed query as:

myfield:liquid myfield:crystal myfield:display

(the default operator being OR), which would incorrectly match documents
containing any one of those words rather than requiring all of them, which is
what I would expect...





--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Unable To Delete Managed Synonyms Containing a "/" In Solr 7.2

2018-08-02 Thread Kyle Hipke

I THINK this might be a bug? I've had trouble with how the Solr Managed
Synonym endpoint handles URL encoding of synonyms. It seems to be
impossible to delete a synonym that has a forward slash in it.

I have a synonym with a key of "Hot/Cold Pack" (that's the key that shows
up when I GET the managed synonyms, as it appears in the JSON response).

I've tried DELETE on several URLs, none of which work. Here are the sorts of
URLs I've tried:

1. /synonyms/english/Hot%2FCold%20Pack - returns "Illegal character in path at index 84: http://10.74.222.14:8983/solr/ca_gm_search/schema/analysis/synonyms/english/Hot/Cold Pack"
2. /synonyms/english/Hot%252FCold%20Pack - returns "Illegal character in path at index 86: http://10.74.222.14:8983/solr/ca_gm_search/schema/analysis/synonyms/english/Hot%2FCold Pack"
3. /synonyms/english/Hot%252FCold%2520Pack - returns "No REST managed resource registered for path /schema/analysis/synonyms/english/Hot/Cold Pack"

My blind guess is that the Solr Managed Synonym endpoint is not properly
decoding the request path. Either it stops decoding at %2F and complains
because no synonym matches "Hot%2FCold Pack", or it decodes the term to
"Hot/Cold Pack" and fails because it interprets "Hot" as a separate request
path node.

Should this be filed in the issue tracker, or am I missing something? There
doesn't appear to be a workaround. Once you insert a synonym with a forward
slash, it's stuck for good (you can't delete the endpoint and re-create it,
because that isn't allowed while it is in use, and there is no bulk delete
method).
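For reference, the encodings tried above can be reproduced with Python's standard library. This only shows what each path segment encodes and decodes to; it is not a workaround, and how Solr's REST layer decodes the path is exactly what is in question.

```python
from urllib.parse import quote, unquote

key = "Hot/Cold Pack"  # the managed synonym key from the GET response

once = quote(key, safe="")    # percent-encode "/" as well as the space
twice = quote(once, safe="")  # encode the already-encoded form again

print(once)   # Hot%2FCold%20Pack      (attempt 1)
print(twice)  # Hot%252FCold%2520Pack  (attempt 3)

# Decoding attempt 1 once yields the raw key, "/" and space included,
# which is consistent with the decoded paths shown in the error messages.
print(unquote(once))   # Hot/Cold Pack
print(unquote(twice))  # Hot%2FCold%20Pack
```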
--
*Kyle Hipke*
Software Engineer, Search and CMS Practices

*CIRRUS**10*
C: 206 316 9118
Cal: https://goo.gl/HwHA7K

Website
<http://www.cirrus10.com/?utm_source=signature_medium=email_content=website_link_campaign=Cirrus10_sig>
| LinkedIn <http://www.linkedin.com/company/cirrus10> | Email

Re: synonyms question

2018-07-18 Thread ennio

Vincenzo,

Thank you for the tip. I restarted Solr and it worked.

-Ennio



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: synonyms question

2018-07-17 Thread Vincenzo D'Amore

Have you reloaded the core (or restarted Solr) after the change in the synonyms 
file?
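Query-time synonym changes are picked up on reload; index-time synonyms additionally require reindexing. A small Python sketch that builds the reload request for either mode, using the CoreAdmin API (standalone) or the Collections API (SolrCloud); the base URL and the names `mycore` and `products` are hypothetical.

```python
from urllib.parse import urlencode

def reload_url(base, name, cloud=False):
    """Build the admin URL that reloads a core (standalone Solr) or a
    collection (SolrCloud) so an edited synonyms file is re-read."""
    if cloud:
        # Collections API; in SolrCloud, upload the changed config to ZooKeeper first
        return "%s/admin/collections?%s" % (base, urlencode({"action": "RELOAD", "name": name}))
    # CoreAdmin API for a standalone core
    return "%s/admin/cores?%s" % (base, urlencode({"action": "RELOAD", "core": name}))

print(reload_url("http://localhost:8983/solr", "mycore"))
# http://localhost:8983/solr/admin/cores?action=RELOAD&core=mycore
```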

Ciao,
Vincenzo

--
mobile: 3498513251
skype: free.dev


Re: synonyms question

2018-07-17 Thread ennio

No, not using SolrCloud.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: synonyms question

2018-07-17 Thread Vincenzo D'Amore

Ennio, do you know whether you are using SolrCloud?

On Tue, Jul 17, 2018 at 7:19 PM ennio  wrote:

> Erick,
>
> I'm invoking the synonym at query time.
>
> Here is my fieldType definition.
>
>  positionIncrementGap="100">
>   
> 
>
>  ignoreCase="true"
> words="lang/stopwords_en.txt"
> />
> 
> 
>  protected="protwords.txt"/>
> 
>   
>   
> 
>  synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>  ignoreCase="true"
> words="lang/stopwords_en.txt"
> />
> 
> 
>  protected="protwords.txt"/>
> 
>   
> 
>
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


-- 
Vincenzo D'Amore

Re: synonyms question

2018-07-17 Thread ennio

Erick,

The synonym filter is invoked at query time.

Here is my fieldType definition.

--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: synonyms question

2018-07-17 Thread Andrea Gazzarini


Hi Ennio,
could you please share:

 * your configuration (specifically the field type declaration in your
   schema)
 * the query (please add debug=true) and the corresponding query response

Best,
Andrea


Re: synonyms question

2018-07-17 Thread Erick Erickson

You have to look at the analysis chain for text_en. Is the synonym filter
factory being invoked? If so, at indexing time or query time? If at
indexing time, was the synonym defined when you originally indexed the
data? If in cloud mode, did you push the configs to ZooKeeper and reload
the collection before indexing and/or querying?

The admin UI>>collection>>analysis page is very helpful.
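A sketch of the same check over HTTP, assuming the field analysis handler is available at `/analysis/field` (it is normally registered implicitly, but check your configuration). The `analysis.*` parameters are those of Solr's field analysis request handler; the core name `mycore` is hypothetical.

```python
from urllib.parse import urlencode

def field_analysis_url(base, core, field_type, index_text, query_text):
    """Build a request to the field analysis handler, which returns the
    same token-by-token breakdown as the Admin UI analysis page."""
    params = {
        "analysis.fieldtype": field_type,   # analyze with this field type
        "analysis.fieldvalue": index_text,  # run through the index-time analyzer
        "analysis.query": query_text,       # run through the query-time analyzer
        "analysis.showmatch": "true",       # flag index tokens matched by the query
        "wt": "json",
    }
    return "%s/%s/analysis/field?%s" % (base, core, urlencode(params))

print(field_analysis_url("http://localhost:8983/solr", "mycore", "text_en", "fibre", "fiber"))
```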

Best,
Erick


synonyms question

2018-07-17 Thread Ennio Bozzetti

I'm trying to get my synonyms to work, but I cannot get them working for this 
one keyword.

I added the following to my synonyms file.

fiber,fibre

But when I search for fiber or fibre, the synonym does not seem to be applied.

Fiber is the American English spelling and fibre is the British English 
spelling.

My field type is set to text_en; could that be why?
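The rule `fiber,fibre` is an equivalence line in Solr's synonyms file format. A simplified Python reading of that format (a sketch, not Solr's own parser; escaping and other details are ignored, and the function name is hypothetical) shows what the synonym filter should produce: with expand="true" each spelling maps to both, with expand="false" each maps to the first one listed, and `=>` lines are one-way mappings such as `LCD => liquid crystal display`.

```python
def parse_synonym_line(line, expand=True):
    """Map each term on one synonyms.txt line to the terms it should produce.
    Simplified: no escaping, no per-line options, at most one '=>'."""
    line = line.strip()
    if not line or line.startswith("#"):  # blank lines and comments
        return {}
    if "=>" in line:
        # explicit one-way mapping: left-hand terms are replaced by the
        # right-hand terms, regardless of expand
        lhs, rhs = line.split("=>", 1)
        targets = [t.strip() for t in rhs.split(",") if t.strip()]
        return {t.strip(): targets for t in lhs.split(",") if t.strip()}
    # equivalence list: comma-separated terms that mean the same thing
    terms = [t.strip() for t in line.split(",") if t.strip()]
    if expand:
        return {t: list(terms) for t in terms}  # each term -> every term
    return {t: [terms[0]] for t in terms}       # each term -> the first term

print(parse_synonym_line("fiber,fibre"))
# {'fiber': ['fiber', 'fibre'], 'fibre': ['fiber', 'fibre']}
```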

Thanks,

Ennio Bozzetti
Senior Web Programmer
THORLABS
(973) 300-2561
www.thorlabs.com<https://www.thorlabs.com/>

Re: Problem with synonyms containing whitespace

2018-05-09 Thread srujan.kommoju

Thanks for the solution; it's working fine for me.
I had the same configuration but had missed
tokenizerFactory="solr.KeywordTokenizerFactory" in the filter tag. That's
great.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Highlighter throwing InvalidTokenOffsetsException for field with large number of synonyms

2018-04-26 Thread David Smiley

Yay!  I'm glad the UnifiedHighlighter is serving you well.  I was about to
suggest it.  If you think the fragmentation/snippeting could be improved in
a general way, then post a JIRA issue for consideration.  Note: producing
identical results to the original Highlighter is a non-goal.

-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com

RE: Highlighter throwing InvalidTokenOffsetsException for field with large number of synonyms

2018-04-23 Thread howed

Finally got back to looking at this, and found that the solution was to
switch to the unified highlighter
<https://lucene.apache.org/solr/guide/7_2/highlighting.html#choosing-a-highlighter>,
which doesn't seem to have the same problem with my complex synonyms.  This
required some tweaking of the highlighting parameters and my code, as it
doesn't highlight in exactly the same way as the default highlighter, but
all is working now.
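Selecting the unified highlighter is a request parameter, `hl.method=unified`, described in the Solr highlighting guide linked in the message. A minimal Python sketch of such a request's parameters; the query and the field name `locality` are hypothetical.

```python
from urllib.parse import urlencode

params = {
    "q": "locality:balmoral",  # hypothetical query
    "hl": "true",              # turn highlighting on
    "hl.method": "unified",    # choose the UnifiedHighlighter
    "hl.fl": "locality",       # hypothetical field to highlight
}
print(urlencode(params))
# q=locality%3Abalmoral&hl=true&hl.method=unified&hl.fl=locality
```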

Thanks again for the assistance.

David



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: PF, PF2, PF3 clauses missing in solr7 with query-time synonyms?

2018-04-19 Thread Elizabeth Haubert

An update on this:

The problem occurs with phrase queries under edismax when a term in the
nested query has a multi-word synonym.
In the example above, dog has the multi-term synonym "canis familiaris", and
aspirin has "acetylsalicylic acid".

I'm creating a JIRA ticket.

Thank you,
Elizabeth


On Wed, Apr 18, 2018 at 12:38 PM, Elizabeth Haubert <
ehaub...@opensourceconnections.com> wrote:

> I'm seeing pf and pf3 clauses fail to generate in long queries containing
> synonyms.  Wondering if anyone else has run into this, or if it needs to be
> submitted as a bug in Jira.   It is a showstopper problem for the current
> project, as the pf and pf3 were pretty heavily tuned.
>
> Using Solr 7.1; all fields are using the following type:
>
> With query-time synonyms:
>  positionIncrementGap="100" autoGeneratePhraseQueries="true">
> 
>  pattern="(?i)\b(anti|hypo|hyper|non)[-\\/ ](\w+)\b" replacement="$1$2"/>
> 
>  generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"
> stemEnglishPossessive="1"  protected="protwords_wdff.txt"/>
>  words="stopwords.txt" />
> 
> 
> 
> 
>  protected="protwords_nostem.txt"/>
> 
> 
> 
>   
> 
>  pattern="(?i)\b(anti|hypo|hyper|non)[-\\/ ](\w+)\b" replacement="$1$2"/>
> 
>  generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"
> stemEnglishPossessive="1"  protected="protwords_wdff.txt"/>
>  words="stopwords.txt" />
> 
> 
> 
> 
>   managed="synonyms_all" />
>  protected="protwords_nostem.txt"/>
> 
> 
> 
> 
>
> Without query-time synonyms:
>  positionIncrementGap="100" autoGeneratePhraseQueries="true">
> 
>  pattern="(?i)\b(anti|hypo|hyper|non)[-\\/ ](\w+)\b" replacement="$1$2"/>
> 
>  generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"
> stemEnglishPossessive="1"  protected="protwords_wdff.txt"/>
>  words="stopwords.txt" />
> 
> 
> 
> 
>   managed="synonyms_all" />
>  protected="protwords_nostem.txt"/>
> 
> 
> 
>   
> 
>  pattern="(?i)\b(anti|hypo|hyper|non)[-\\/ ](\w+)\b" replacement="$1$2"/>
> 
>  generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"
> stemEnglishPossessive="1"  protected="protwords_wdff.txt"/>
>  words="stopwords.txt" />
> 
> 
> 
> 
>  protected="protwords_nostem.txt"/>
> 
> 
> 
> 
>
> Synonyms file is pretty long, so I'll just include the relevent bits for
> an example:
>
> allergic, hypersensitive
> aspirin, acetylsalicylic acid
> dog, canine, canis familiris, k 9
> rat, rattus
>
>
> The problem seems to occur when part of the query has a synonym, but the
> whole phrase is not.  Whitespace added to piece out what is going on;
> believe any parentheses errors are due to my tinkering around.  Beyond that
> though, this is as from Solr.  Slop has been tinkered with to identify
> PF/PF2/PF3 clauses where PF fields have a slop ending in 0, pf2 ending in
> 1, pf3 ending in 2 eg ~10, ~11, ~12, etc.
>
> =
> Example 1:  "aspirin dose in rats"
> ==
>
> With query-time synonyms:
> ===
> /// Q terms generate as expected ///
> +kw1:\"acetylsalicylic acid\" kw1:aspirin)^100.0 |
> (species:\"acetylsalicylic acid\" species:aspirin) |
> (keywords_bm25_no_norms:\"acetylsalicylic acid\" 
> keywords_bm25_no_norms:aspirin)^50.0
> | (description:\"acetylsalicylic acid\" description:aspirin) |
> (kw1ranked:\"acetylsalicylic acid\" kw1ranked:aspirin)^100.0 |
> (text:\"acetylsalicylic acid\" text:aspirin) | (title:\"acetylsalicylic
> acid\" title:aspirin)^100.0 | (keywordsranked_bm25_no_norms:\"acetylsalicylic
> acid\" keywordsranked_bm25_no_norms:aspirin)^50.0 |
> (authors:\"acetylsalicylic acid\" authors:aspirin))~0.4
> ((Synonym(kw1:dosage kw1:dose kw1:dose kw1:dose))^100.0 |
> Synonym(species:d

PF, PF2, PF3 clauses missing in solr7 with query-time synonyms?

2018-04-18 Thread Elizabeth Haubert

I'm seeing pf and pf3 clauses fail to generate in long queries containing
synonyms.  Wondering if anyone else has run into this, or if it needs to be
submitted as a bug in Jira.   It is a showstopper problem for the current
project, as the pf and pf3 were pretty heavily tuned.

Using Solr 7.1; all fields are using the following type:

With query-time synonyms:

Without query-time synonyms:

Synonyms file is pretty long, so I'll just include the relevant bits for an
example:

allergic, hypersensitive
aspirin, acetylsalicylic acid
dog, canine, canis familiris, k 9
rat, rattus


The problem seems to occur when part of the query has a synonym but the
whole phrase does not.  I added whitespace to piece out what is going on; I
believe any parenthesis errors are due to my tinkering.  Beyond that, this
is as output by Solr.  I adjusted the slop to identify PF/PF2/PF3 clauses:
PF fields have a slop ending in 0, PF2 ending in 1, and PF3 ending in 2,
e.g. ~10, ~11, ~12.
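The slop convention makes it possible to tell which edismax parameter produced each phrase clause. A small Python helper (hypothetical, working on the debug string as pasted, with its backslash-escaped quotes) applies that convention; empty `()` groups simply produce no entries.

```python
import re

# Phrase clauses in the pasted debug output look like:  title:\"aspirin dose ? rats\"~20
PHRASE = re.compile(r'(\w+):\\"(.*?)\\"~(\d+)')
KIND = {"0": "pf", "1": "pf2", "2": "pf3"}  # last slop digit -> source parameter

def classify_phrase_clauses(parsed_query):
    """Return (param, field, phrase, slop) for every phrase clause, using the
    convention that pf/pf2/pf3 slops end in 0/1/2 respectively."""
    result = []
    for field, phrase, slop in PHRASE.findall(parsed_query):
        result.append((KIND.get(slop[-1], "?"), field, phrase, int(slop)))
    return result

sample = r'((title:\"(dosage dose dose dose) (rattu rat)\"~22)^1000.0 | (kw1:\"aspirin dose ? rats\"~10)^500.0)'
for clause in classify_phrase_clauses(sample):
    print(clause)
# ('pf3', 'title', '(dosage dose dose dose) (rattu rat)', 22)
# ('pf', 'kw1', 'aspirin dose ? rats', 10)
```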

=
Example 1:  "aspirin dose in rats"
==

With query-time synonyms:
===
/// Q terms generate as expected ///
+kw1:\"acetylsalicylic acid\" kw1:aspirin)^100.0 |
(species:\"acetylsalicylic acid\" species:aspirin) |
(keywords_bm25_no_norms:\"acetylsalicylic acid\"
keywords_bm25_no_norms:aspirin)^50.0 | (description:\"acetylsalicylic
acid\" description:aspirin) | (kw1ranked:\"acetylsalicylic acid\"
kw1ranked:aspirin)^100.0 | (text:\"acetylsalicylic acid\" text:aspirin) |
(title:\"acetylsalicylic acid\" title:aspirin)^100.0 |
(keywordsranked_bm25_no_norms:\"acetylsalicylic acid\"
keywordsranked_bm25_no_norms:aspirin)^50.0 | (authors:\"acetylsalicylic
acid\" authors:aspirin))~0.4 ((Synonym(kw1:dosage kw1:dose kw1:dose
kw1:dose))^100.0 | Synonym(species:dosage species:dose species:dose
species:dose) | (Synonym(keywords_bm25_no_norms:dosage
keywords_bm25_no_norms:dose keywords_bm25_no_norms:dose
keywords_bm25_no_norms:dose))^50.0 | Synonym(description:dosage
description:dose description:dose description:dose) |
(Synonym(kw1ranked:dosage kw1ranked:dose kw1ranked:dose
kw1ranked:dose))^100.0 | Synonym(text:dosage text:dose text:dose text:dose)
| (Synonym(title:dosage title:dose title:dose title:dose))^100.0 |
(Synonym(keywordsranked_bm25_no_norms:dosage
keywordsranked_bm25_no_norms:dose keywordsranked_bm25_no_norms:dose
keywordsranked_bm25_no_norms:dose))^50.0 | Synonym(authors:dosage
authors:dose authors:dose authors:dose))~0.4 ((Synonym(kw1:rat
kw1:rattu))^100.0 | Synonym(species:rat species:rattu) |
(Synonym(keywords_bm25_no_norms:rat keywords_bm25_no_norms:rattu))^50.0 |
Synonym(description:rat description:rattu) | (Synonym(kw1ranked:rat
kw1ranked:rattu))^100.0 | Synonym(text:rat text:rattu) | (Synonym(title:rat
title:rattu))^100.0 | (Synonym(keywordsranked_bm25_no_norms:rat
keywordsranked_bm25_no_norms:rattu))^50.0 | Synonym(authors:rat
authors:rattu))~0.4)~3)

/// PF and PF2 are missing. ///
 () () () () ()

/// This is actually PF3 with a missing ? where the stopword 'in' belonged. ///
 ((title:\"(dosage dose dose dose) (rattu rat)\"~22)^1000.0 |
(keywordsranked_bm25_no_norms:\"(dosage dose dose dose) (rattu
rat)\"~22)^1000.0 | (text:\"(dosage dose dose dose) (rattu
rat)\"~22)^100.0)~0.4 ((keywords_bm25_no_norms:\"(dosage dose dose dose)
(rattu rat)\"~12)^500.0 | (kw1ranked:\"(dosage dose dose dose) (rattu
rat)\"~12)^100.0 | (kw1:\"(dosage dose dose dose) (rattu
rat)\"~12)^100.0)~0.4,product(max(10.0/(3.16E-11*float(ms(const(14560),date(dateint)))+6.0),int(documentdatefix)),scale(map(int(rank),-1.0,-1.0,const(0.5),null),0.5,2.0)))",

With index-time synonyms:
===

/// Q ///
 "boost(+kw1:aspirin)^100.0 | species:aspirin |
(keywords_bm25_no_norms:aspirin)^50.0 | description:aspirin |
(kw1ranked:aspirin)^100.0 | text:aspirin | (title:aspirin)^100.0 |
(keywordsranked_bm25_no_norms:aspirin)^50.0 | authors:aspirin)~0.4
((kw1:dose)^100.0 | species:dose | (keywords_bm25_no_norms:dose)^50.0 |
description:dose | (kw1ranked:dose)^100.0 | text:dose | (title:dose)^100.0
| (keywordsranked_bm25_no_norms:dose)^50.0 | authors:dose)~0.4
((kw1:rats)^100.0 | species:rats | (keywords_bm25_no_norms:rats)^50.0 |
description:rats | (kw1ranked:rats)^100.0 | text:rats | (title:rats)^100.0
| (keywordsranked_bm25_no_norms:rats)^50.0 | authors:rats)~0.4)~3)
/// PF  ///
  ((title:\"aspirin dose ? rats\"~20)^5000.0 |
(keywordsranked_bm25_no_norms:\"aspirin dose ? rats\"~20)^5000.0 |
(keywords_bm25_no_norms:\"aspirin dose ? rats\"~20)^1500.0 |
(text:\"aspirin dose ? rats\"~20)^1000.0)~0.4 ((kw1ranked:\"aspirin dose ?
rats\"~10)^5000.0 | (kw1:\"aspirin dose ? rats\"~10)^500.0)~0.4
((authors:\"aspirin dose

RE: Highlighter throwing InvalidTokenOffsetsException for field with large number of synonyms

2018-03-08 Thread Rick Leir

David
Yes, highlighting is tricky, especially with synonyms. Sorry, I would need to 
see a bit more of your config before saying more about it.
Thanks -- Rick
-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com

RE: Highlighter throwing InvalidTokenOffsetsException for field with large number of synonyms

2018-03-08 Thread Howe, David


Hi Rick,

Thanks for your response.  The reason we do it like this is that the 
localities are also part of another indexed field that contains the entire 
address.  We actually search over that field, and we only use highlighting on 
the problematic field so that we can tell which parts of the address we 
matched.  We never search for wildcards like "*cannum*".

As an example, we might index the address "19 some st cannum vic 3456". When 
we index it, we actually index the text "19 some st 
lcx__balmoral__cannum__clear_lake__lower_norton vic 3456" into a Solr field 
that has our custom synonym filter. This causes the synonyms for the locality 
"cannum" to be generated, so if we search for "19 some st balmoral" we still 
get a match on the locality component of the address. With this method, 
address searching works fine.
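To make the indexing trick concrete, here is a minimal Python sketch of the kind of expansion a custom synonym filter like this might perform. The `lcx` prefix, the `__` separator and the `_`-for-space convention are inferred from the single example token above, and `encode_locality`, `expand` and `matches` are hypothetical helpers, not part of Solr or of David's filter.

```python
PREFIX = "lcx"  # marks a composite locality token (inferred from the example)
SEP = "__"      # separates the packed names (inferred from the example)


def encode_locality(names):
    """Pack a locality and its alternative names into one composite token."""
    return SEP.join([PREFIX] + [n.lower().replace(" ", "_") for n in names])


def expand(token):
    """Terms a custom synonym filter could emit at this token's position."""
    parts = token.split(SEP)
    if len(parts) > 1 and parts[0] == PREFIX:
        return {p.replace("_", " ") for p in parts[1:]}
    return {token}


def matches(query, indexed_text):
    """True if every query token is among the expansions of the indexed tokens."""
    expanded = set().union(*(expand(t) for t in indexed_text.split()))
    return all(q in expanded for q in query.lower().split())


token = encode_locality(["balmoral", "cannum", "clear lake", "lower norton"])
indexed = f"19 some st {token} vic 3456"
print(token)
print(matches("19 some st balmoral", indexed))
```

The sketch ignores token positions and treats "clear lake" as a single term, so it only illustrates the idea of stacking alternative locality names at one position.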

Once we have a match, we need to know which part of the address we matched, 
which is where the highlighting comes in. By loading just the locality part of 
the address into a separate field and applying the same synonym filter, we can 
use highlighting to see whether we got a hit on the locality. We do the same 
with the other components of the address (the number, the street name, the 
street type, the post code, etc.) so that we can tell the caller which parts 
of their input matched the address we are returning.
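One way to get per-component hits while still searching the full-address field is to point Solr's highlighter at the component fields and give it its own query through `hl.q`. This hedged Python sketch only builds the request parameters; the field names are hypothetical, and whether this layout avoids the InvalidTokenOffsetsException with large synonym sets is untested.

```python
from urllib.parse import urlencode

# Hypothetical field names: "address" holds the whole synonym-expanded address,
# each component field holds one part with the same synonym filter applied.
COMPONENT_FIELDS = ["number", "street_name", "street_type", "locality", "postcode"]


def component_highlight_params(user_text, fields=COMPONENT_FIELDS):
    """Search the full-address field, but highlight the per-component fields."""
    return {
        "q": f"address:({user_text})",
        "hl": "true",
        "hl.fl": ",".join(fields),
        # hl.q overrides the query used for highlighting
        "hl.q": " OR ".join(f"{f}:({user_text})" for f in fields),
    }


print(urlencode(component_highlight_params("19 some st balmoral")))
```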

I could load them into a multi-valued field just for the highlighting, but 
that means extracting them in a different format from the one I use for the 
whole address, which I would like to avoid if possible. We load these 
addresses from a database table using the Data Import Handler.

Regards,

David

David Howe
Java Domain Architect
Postal Systems
Level 16, 111 Bourke Street Melbourne VIC 3000

T  0391067904

M  0424036591

E  david.h...@auspost.com.au

W  auspost.com.au
W  startrack.com.au

Re: Highlighter throwing InvalidTokenOffsetsException for field with large number of synonyms

2018-03-08 Thread Rick Leir

David
When you have "lcx__balmoral__cannum__clear_lake__lower_norton" in a field, 
would you search for *cannum*? That might not perform well. 
Why not use a multivalued field for this information? 
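A minimal sketch of the multivalued alternative, building a Solr JSON update document in Python. The field names `address` and `locality_names` are hypothetical; if `locality_names` were declared as a multivalued string field, an exact filter such as `fq=locality_names:cannum` could take the place of a `*cannum*` wildcard.

```python
import json


def address_doc(doc_id, address, locality_names):
    """Solr JSON update document with the localities in a multivalued field."""
    # "address" and "locality_names" are hypothetical field names
    return {"id": doc_id, "address": address, "locality_names": list(locality_names)}


doc = address_doc("1", "19 some st cannum vic 3456",
                  ["cannum", "balmoral", "clear lake", "lower norton"])
payload = json.dumps([doc])  # /update accepts a JSON array of documents
print(payload)
```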

It could be that you have a good reason for this, and I just do not understand.
Cheers -- Rick

Using dynamic synonyms file

2018-02-14 Thread Roopa Rao

Hi,

Is it possible to specify the synonyms file as a variable, i.e. set a default
synonyms file and pass the file name in the request? If so, is there an
example of this?

Such as,



Thanks,
Roopa

Re: Using Synonyms as a feature with LTR

2018-02-14 Thread Roopa Rao

I see, okay. Thank you.


Re: Using Synonyms as a feature with LTR

2018-02-14 Thread Alessandro Benedetti

I see.
As far as I know, it is not possible to run different query-time analysis
for the same field.

I'm not sure if anyone is working on that.

Regards



-
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Using Synonyms as a feature with LTR

2018-02-14 Thread Roopa Rao

So I would end up with ~6 copy fields and ~8 synonym files, which would be
about 48 field/synonym combinations. Would that be significant in terms of
index size? What would be the best way to measure this?
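The ~48 figure is simply the cross product of source fields and synonym files, with each destination field needing its own field type wired to one synonyms file. A sketch that enumerates the combinations and matching `copyField` lines; the `<field>_syn_<set>` naming is hypothetical. With query-time-only synonyms, every copy indexes the same terms, so index growth should be driven mainly by the number of copies rather than by the thesaurus size; indexing a representative sample with and without the extra fields and comparing the on-disk index size is one practical way to measure it.

```python
from itertools import product


def synonym_field_names(source_fields, synonym_sets):
    """One destination field per (source field, synonym file) pair."""
    return [f"{field}_syn_{syn}" for field, syn in product(source_fields, synonym_sets)]


def copy_field_lines(source_fields, synonym_sets):
    """schema.xml copyField directives feeding each destination field."""
    return [
        f'<copyField source="{field}" dest="{field}_syn_{syn}"/>'
        for field, syn in product(source_fields, synonym_sets)
    ]


fields = [f"field{i}" for i in range(1, 7)]          # hypothetical source fields
synonym_sets = [f"set{i}" for i in range(1, 9)]      # hypothetical synonym files
print(len(synonym_field_names(fields, synonym_sets)))  # 48 (6 fields x 8 files)
```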

Custom parser:
1. Take the synonyms file name and the field to run the analysis on. This
   field need not be a copy field that holds data, since it is only used to
   get the analysis.
2. Get the synonyms for the user query as tokens.
3. Create an edismax query from the query tokens.
4. Return the score.

This custom parser would be called in LTR as a scalar feature.

I am at the stage where I can get the synonyms from the analysis chain;
however, the tokens are individual tokens, not phrases. So I am stuck on how
to construct a correct query from the synonym tokens and their positions.
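Lucene's query builders face the same problem when `sow=false`: as far as I know, they walk the token graph using each token's position and position length (the approach behind Lucene's `GraphTokenStreamFiniteStrings`) and build one clause per path. Below is a simplified Python sketch of that idea, not Lucene's implementation; the `Token` class and helper names are hypothetical, and holes left by removed stop words are not handled.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    term: str
    position: int             # start node in the token graph
    position_length: int = 1  # number of positions the token spans


def graph_paths(tokens):
    """Enumerate every term sequence from the first to the last node of the graph."""
    outgoing = defaultdict(list)
    for tok in tokens:
        outgoing[tok.position].append(tok)
    start = min(t.position for t in tokens)
    end = max(t.position + t.position_length for t in tokens)
    paths = []

    def walk(node, terms):
        if node == end:
            paths.append(terms)
            return
        for tok in outgoing.get(node, []):
            walk(tok.position + tok.position_length, terms + [tok.term])

    walk(start, [])
    return paths


def paths_to_query(paths, field):
    """OR together one clause per path: a phrase for multi-term paths."""
    clauses = []
    for terms in paths:
        text = " ".join(terms)
        clauses.append(f'{field}:"{text}"' if len(terms) > 1 else f"{field}:{text}")
    return " OR ".join(clauses)


# "olympique de marseille maillot" with the synonym "om" spanning three positions
tokens = [
    Token("om", 0, 3), Token("olympique", 0), Token("de", 1),
    Token("marseille", 2), Token("maillot", 3),
]
print(paths_to_query(graph_paths(tokens), "title"))
```

The resulting string could then be handed to a query parser in step 3 instead of a flat list of tokens.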

Thank you,
Roopa


Re: Using Synonyms as a feature with LTR

2018-02-14 Thread Roopa Rao

So I would end up with ~6 copy fields and ~8 synonym files, which would be
about 48 field/synonym combinations. Would that be significant in terms of
index size? I guess that depends on the thesaurus size; what would be the
best way to measure this?

Custom parser:
1. Take the synonyms file name and the field to run the analysis on. This
   field need not be a copy field that holds data, since it is only used to
   get the analysis.
2. Get the synonyms for the user query as tokens.
3. Create an edismax query from the query tokens.
4. Return the score.

This custom parser would be called in LTR as a scalar feature.

I am at the stage where I can get the synonyms from the analysis chain;
however, the tokens are individual tokens, not phrases. So I am stuck on how
to construct a correct query from the synonym tokens and their positions.

Thank you,
Roopa


Re: Using Synonyms as a feature with LTR

2018-02-14 Thread Alessandro Benedetti

"I can go with the "title" field and have that include the synonyms in 
analysis. Only problem is that the number of fields and number of synonyms 
files are quite a lot (~ 8 synonyms files) due to different weightage and 
type of expansion (exact vs partial) based on these. Hence going with this 
approach would mean creating more fields for all these synonyms 
(synonyms.txt) 

So, I am looking to build a custom parser for which I could supply the file 
and the field and that would expand the synonyms and return a score. "

Whether you have a binary or a scalar feature is completely up to you and the
way you configure the Solr feature.
If you have 8 (copy?) fields with the same content but different expansion,
that is still OK.
You can have 8 features, one per type of expansion.
LTR will take care of the weight to be assigned to those features.

"So, I am looking to build a custom parser for which I could supply the file 
and the field and that would expand the synonyms and return a score."
I don't get this; can you elaborate?

Regards




Re: Using Synonyms as a feature with LTR

2018-02-13 Thread Roopa Rao

Thank you, Alessandro,

I was trying these options before replying.

Yes, I am looking to generate a score for a query with synonym expansion
(not a binary feature).

I can go with the "title" field and have its analysis include the synonyms.
The only problem is that the number of fields and synonym files is quite
large (~8 synonym files) because of the different weightings and types of
expansion (exact vs. partial). Going with this approach would therefore mean
creating more fields for all these synonym files (synonyms.txt).

So I am looking to build a custom parser to which I could supply the file
and the field, and which would expand the synonyms and return a score.

Thanks,
Roopa


Re: Using Synonyms as a feature with LTR

2018-02-12 Thread Alessandro Benedetti

In the end, a feature will just be a numerical value.
How do you plan to use synonyms in a field to generate a numerical feature?

Are you planning to define a binary feature for a field, set when there is a
match on the synonyms?
Or a feature that contains the score for a query (with synonym expansion)?

I would start from the SolrFeature. Let's assume the "title" field has a
field type that includes synonyms (at query time):

{
  "store" : "featureStore",
  "name" : "hasTitleMatch",
  "class" : "org.apache.solr.ltr.feature.SolrFeature",
  "params" : {
    "fq": [ "{!field f=title}${query}" ]
  }
}

Query-time analysis will be applied and the synonyms expanded.
So the feature will have a value, which is the score returned for the query
and the document (under scoring).
You can play with that and design the feature that best fits your idea.
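As far as I know, a SolrFeature defined only with `fq` produces a binary value (1 for matching documents), while defining it with `q` makes the feature value the query's score, which is closer to the scalar feature Roopa describes. A minimal Python sketch that builds both kinds of definition for a list of fields (for example one field per synonym expansion); the `_match`/`_score` names are hypothetical, and `${query}` assumes the user text is passed as `efi.query` in the `{!ltr ...}` rerank query.

```python
import json

LTR_SOLR_FEATURE = "org.apache.solr.ltr.feature.SolrFeature"


def synonym_features(fields, efi_name="query", store="featureStore"):
    """For each field, build a binary match feature (fq) and a scored feature (q)."""
    features = []
    for field in fields:
        # e.g. "{!field f=title}${query}"; ${query} is resolved from efi.query
        query = "{!field f=%s}${%s}" % (field, efi_name)
        features.append({
            "store": store,
            "name": f"{field}_match",
            "class": LTR_SOLR_FEATURE,
            "params": {"fq": [query]},  # filter only: 1 when the document matches
        })
        features.append({
            "store": store,
            "name": f"{field}_score",
            "class": LTR_SOLR_FEATURE,
            "params": {"q": query},  # the query's score for the document
        })
    return features


print(json.dumps(synonym_features(["title"]), indent=2))
```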

Regards









Re: Multi words query time synonyms

2018-02-11 Thread Dominique Bejean

Steve,

Following your comment, I ran this test:

1/ put the SynonymGraphFilterFactory after the StopFilterFactory in the
query-time analysis chain


  
  
  
  
  
  
  


2/ removed the stop word from the synonyms file

om, olympique marseille
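Step 2 amounts to running each synonym entry through the same stop list the query analyzer uses, so that the multi-word side matches the stop-filtered token stream. A small Python sketch of that preprocessing, assuming the simple comma-separated `synonyms.txt` format shown here (it does not handle `=>` mappings); `strip_stopwords` is a hypothetical helper.

```python
def strip_stopwords(synonym_line, stopwords):
    """Drop stop words from each comma-separated entry of a synonyms.txt line."""
    entries = []
    for entry in synonym_line.split(","):
        kept = [w for w in entry.split() if w.lower() not in stopwords]
        if kept:  # skip entries that consisted only of stop words
            entries.append(" ".join(kept))
    return ", ".join(entries)


print(strip_stopwords("om, olympique de marseille", {"de"}))  # om, olympique marseille
```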


The parsed query strings are:

for "om maillot"
"parsedquery_toString":"+(+name_text_gp:olympiqu +name_text_gp:marseil)
name_text_gp:om)) (name_text_gp:maillot))~1)",

for "olympique de marseille maillot"
"parsedquery_toString":"+name_text_gp:om (+name_text_gp:olympiqu
+name_text_gp:marseil))) (name_text_gp:maillot))~1)",

for "maillot om"
parsedquery_toString":"+(((name_text_gp:maillot) (((+name_text_gp:olympiqu
+name_text_gp:marseil) name_text_gp:om)))~1)",

for "maillot olympique de marseille"
 "parsedquery_toString":"+(((name_text_gp:maillot) ((name_text_gp:om
(+name_text_gp:olympiqu +name_text_gp:marseil~1)",


The query results are the same for all queries.

It looks like this could be an acceptable workaround.

Thank you

Dominique



-- 
Dominique Béjean
06 08 46 12 43

Re: Multi words query time synonyms

2018-02-11 Thread Dominique Bejean

Hi Steve,

Thank you for your response.
I created the JIRA: SOLR-11968

I'll let you add your comments.

Regards.

Dominique



Re: Multi words query time synonyms

2018-02-10 Thread Steve Rowe

Hi Dominique,

Looks like it’s a bug, not sure where exactly though.  Can you please create a 
JIRA?

I can see the same behavior on master too, not just on the 
releases/lucene-solr/6.6.2 tag.

One interesting thing I found is that if I remove the stop filter from the 
query analyzer, I get the following for qq=“maillot om”:

+((name_text_gp:maillot) (((+name_text_gp:olympiqu +name_text_gp:de 
+name_text_gp:marseil) name_text_gp:om)))

(btw my stop list only has “de” on it)

Thanks,

--
Steve
www.lucidworks.com


Re: Multi words query time synonyms

2018-02-10 Thread Dominique Bejean

Hi,

More info.

When I test the analysis for the field type, the synonyms are correctly
expanded for both expressions:

om maillot
maillot om
olympique de marseille maillot
maillot olympique de marseille

The resulting outputs always include the following terms (obviously not always
in the same order):

olympiqu om marseil maillot


So I suspect an issue with the edismax query parser.

Regards.

Dominique


On Fri, Feb 9, 2018 at 18:25, Dominique Bejean <dominique.bej...@eolya.fr>
wrote:

> Hi,
>
> I am trying multi-word query-time synonyms with Solr 6.6.2 and the
> SynonymGraphFilterFactory filter, as explained in this article
>
> https://lucidworks.com/2017/04/18/multi-word-synonyms-solr-adds-query-time-support/
>
> My field type is :
>
>  positionIncrementGap="100">
> 
>   
>articles="lang/contractions_fr.txt"/>
>   
>   
>ignoreCase="true"/>
>   
> 
> 
>   
>articles="lang/contractions_fr.txt"/>
>   
>synonyms="synonyms.txt"
> ignoreCase="true" expand="true"/>
>   
>ignoreCase="true"/>
>   
> 
>   
>
>
> synonyms.txt contains the line
>
> om, olympique de marseille
>
>
> The order of words in my query affects the query generated by
> edismax
>
> q={!edismax qf='name_text_gp' v=$qq}
> =false
> =...
>
> with "qq=om maillot" or "qq=olympique de marseille maillot", I can see the
> synonym expansion. It works as expected.
>
> "parsedquery_toString":"+(((+name_text_gp:olympiqu +name_text_gp:marseil
> +name_text_gp:maillot) name_text_gp:om))",
> "parsedquery_toString":"+((name_text_gp:om (+name_text_gp:olympiqu
> +name_text_gp:marseil +name_text_gp:maillot)))",
>
>
> with "qq=maillot om" or "qq=maillot olympique de marseille", I can see the
> same generated query
>
> "parsedquery_toString":"+((name_text_gp:maillot) (name_text_gp:om))",
> "parsedquery_toString":"+((name_text_gp:maillot) (name_text_gp:om))",
>
> I don't understand these generated queries. The first one looks like the
> synonym expansion is ignored, but the second one shows it is not ignored
> and only the synonym term is used.
>
>
> What is wrong with the way I am doing this?
>
> Regards
>
> Dominique
