Re: Multi-word Synonyms not working properly with Edismax

2020-09-08 Thread Manish Bafna
Yes, we tried that and it worked. We removed it only from the query analyzer,
and it is working properly now.
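
One possible explanation for the symptom, sketched as a toy model: RemoveDuplicatesTokenFilter drops a token whose term text and position match an earlier token. When two multi-word synonyms start with the same word, the second "xyz" looks like a duplicate even though it begins a different synonym. The Token tuple and the helper functions below are hypothetical names for illustration only. This is not Lucene's implementation, and it ignores stemming and the rest of the analysis chain.

```python
from collections import namedtuple

# Toy token: term text, position inside the synonym, and the synonym (path)
# it belongs to. Hypothetical structure, not a Lucene class.
Token = namedtuple("Token", ["term", "position", "path"])


def expand(synonyms):
    """Flatten each synonym phrase into tokens, one path per phrase."""
    return [
        Token(term, position, path)
        for path, phrase in enumerate(synonyms)
        for position, term in enumerate(phrase.split())
    ]


def remove_duplicates(tokens):
    """Drop any token whose (term, position) pair was already seen."""
    seen = set()
    kept = []
    for token in tokens:
        key = (token.term, token.position)
        if key not in seen:
            seen.add(key)
            kept.append(token)
    return kept


def surviving_phrases(tokens, synonyms):
    """Return the synonym phrases whose tokens all survived filtering."""
    survivors = []
    for path, phrase in enumerate(synonyms):
        terms = [t.term for t in tokens if t.path == path]
        if terms == phrase.split():
            survivors.append(phrase)
    return survivors


synonyms = ["abc implement", "bike", "xyz traders", "xyz transport"]
print(surviving_phrases(remove_duplicates(expand(synonyms)), synonyms))
```

In this model "xyz transport" loses its first token and drops out, while renaming it to "xy transport" keeps all four phrases. That matches the behaviour reported in the original message.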


On Wed, Sep 9, 2020 at 2:24 AM Dominique Bejean 
wrote:

> Hi,
>
> Can you try removing the RemoveDuplicatesTokenFilter?
>
> Dominique
>
> On Tue, Sep 8, 2020 at 13:52, Manish Bafna wrote:
>
> > Hi,
> >
> > We are using the following configuration:
> >
> >
> >
> > --
> >
> > *Schema: *
> >
> >  >
> > positionIncrementGap="100"  autoGeneratePhraseQueries="true"
> >
> > omitNorms="true">
> >
> >  
> >
> > 
> >
> > 
> >
> > 
> >
> > 
> >
> >  >
> > dictionary="../hunspell_dictionary/en_US.dic"
> >
> > affix="../hunspell_dictionary/en_US.aff" ignoreCase="true" />
> >
> >  >
> > 
> >
> >  
> >
> > 
> >
> > 
> >
> > 
> >
> > 
> >
> > 
> >
> >  >
> > dictionary="../hunspell_dictionary/en_US.dic"
> >
> > affix="../hunspell_dictionary/en_US.aff" ignoreCase="true" />
> >
> > 
> >
> > 
> >
> > 
> >
> > 
> >
> > *Managed Synonyms:* "abc implement",  "bike", "xyz traders", "xyz
> > transport"
> >
> > -
> >
> > *Query*: bike
> >
> > *parser Type:* edismax
> >
> > -
> >
> > *Parsed query (from debug)* : +DisjunctionMaxQueryfield1:"abc
> >
> > implement" field1:bike field1:"xyz traders" field1:"xyz trade"))
> >
> > -
> >
> >
> >
> > If you notice, there are 2 multi-word keywords starting with xyz, but
> only
> >
> > 1 of them is getting added to the query. If we change xyz transport to xy
> >
> > transport, then it works properly. The issue is only when the 2
> multi-word
> >
> > keywords start with the same word. Though we are using graph synonyms, it
> >
> > is not working properly.
> >
> >
> >
> > Are we doing anything wrong here?
> >
> >
> >
> > Thanks,
> >
> > Manish.
> >
> >
>


Re: Multi-word Synonyms not working properly with Edismax

2020-09-08 Thread Dominique Bejean
Hi,

Can you try removing the RemoveDuplicatesTokenFilter?

Dominique

On Tue, Sep 8, 2020 at 13:52, Manish Bafna wrote:

> Hi,
>
> We are using the following configuration:
>
>
>
> --
>
> *Schema: *
>
> 
> positionIncrementGap="100"  autoGeneratePhraseQueries="true"
>
> omitNorms="true">
>
>  
>
> 
>
> 
>
> 
>
> 
>
> 
> dictionary="../hunspell_dictionary/en_US.dic"
>
> affix="../hunspell_dictionary/en_US.aff" ignoreCase="true" />
>
> 
> 
>
>  
>
> 
>
> 
>
> 
>
> 
>
> 
>
> 
> dictionary="../hunspell_dictionary/en_US.dic"
>
> affix="../hunspell_dictionary/en_US.aff" ignoreCase="true" />
>
> 
>
> 
>
> 
>
> 
>
> *Managed Synonyms:* "abc implement",  "bike", "xyz traders", "xyz
> transport"
>
> -
>
> *Query*: bike
>
> *parser Type:* edismax
>
> -
>
> *Parsed query (from debug)* : +DisjunctionMaxQueryfield1:"abc
>
> implement" field1:bike field1:"xyz traders" field1:"xyz trade"))
>
> -
>
>
>
> If you notice, there are 2 multi-word keywords starting with xyz, but only
>
> 1 of them is getting added to the query. If we change xyz transport to xy
>
> transport, then it works properly. The issue is only when the 2 multi-word
>
> keywords start with the same word. Though we are using graph synonyms, it
>
> is not working properly.
>
>
>
> Are we doing anything wrong here?
>
>
>
> Thanks,
>
> Manish.
>
>


Multi-word Synonyms not working properly with Edismax

2020-09-08 Thread Manish Bafna
Hi,
We are using the following configuration:

--
*Schema: *

 






 










*Managed Synonyms:* "abc implement",  "bike", "xyz traders", "xyz transport"
-
*Query*: bike
*parser Type:* edismax
-
*Parsed query (from debug)*: +DisjunctionMaxQuery((field1:"abc implement" field1:bike field1:"xyz traders" field1:"xyz trade"))
-

Note that there are two multi-word synonyms starting with xyz, but only
one of them is added to the query. If we change "xyz transport" to "xy
transport", it works properly. The issue occurs only when two multi-word
synonyms start with the same word. Even though we are using graph synonyms,
this case does not work properly.

Are we doing anything wrong here?

Thanks,
Manish.


Re: Re: Solr edismax parser with multi-word synonyms

2019-07-18 Thread Sunil Srinivasan
Hi Erick,
Is there any way I can get it to match documents containing at least one of the
words of the original query, i.e. 'frozen' or 'dinner' or both (but not
partial matches of the synonyms)?
Thanks,
Sunil


-Original Message-
From: Erick Erickson 
To: solr-user 
Sent: Thu, Jul 18, 2019 04:42 AM
Subject: Re: Solr edismax parser with multi-word synonyms


This is not a phrase query; rather, it requires one pair of words or the other
to appear in the title.

You’ve told it that “frozen dinner” and “microwave food” are synonyms.
So it’s looking for both the words “microwave” and “food” in the title field,
or both “frozen” and “dinner” in the title field.

You’d see the same thing with single-word synonyms, albeit a little less
confusingly.


Best,
Erick


> On Jul 18, 2019, at 1:01 AM, kshitij tyagi  
> wrote:
> 
> Hi sunil,
> 
> 1. as you have added "microwave food" in synonym as a multiword synonym to
> "frozen dinner", edismax parsers finds your synonym in the file and is
> considering your query as a Phrase query.
> 
> This is the reason you are seeing parsed query as  +(((+title:microwave
> +title:food) (+title:frozen +title:dinner))), frozen dinner is considered
> as a phrase here.
> 
> If you want partial match on your query then you can add frozen dinner,
> microwave food, microwave, food to your synonym file and you will see the
> parsed query as:
> "+(((+title:microwave +title:food) title:microwave title:food
> (+title:frozen +title:dinner)))"
> Another option is to write your own custom query parser and use it as a
> plugin.
> 
> Hope this helps!!
> 
> kshitij
> 
> 
> On Thu, Jul 18, 2019 at 9:14 AM Sunil Srinivasan  wrote:
> 
>> 
>> I have enabled the SynonymGraphFilter in my field configuration in order
>> to support multi-word synonyms (I am using Solr 7.6). Here is my field
>> configuration:
>> 
>>    
>>      
>>    
>> 
>>    
>>      
>>      > synonyms="synonyms.txt"/>
>>    
>> 
>> 
>> 
>> 
>> And this is my synonyms.txt file:
>> frozen dinner,microwave food
>> 
>> Scenario 1: blue shirt (query with no synonyms)
>> 
>> Here is my first Solr query:
>> 
>> http://localhost:8983/solr/base/search?q=blue+shirt=title=edismax=on
>> 
>> And this is the parsed query I see in the debug output:
>> +((title:blue) (title:shirt))
>> 
>> Scenario 2: frozen dinner (query with synonyms)
>> 
>> Now, here is my second Solr query:
>> 
>> http://localhost:8983/solr/base/search?q=frozen+dinner=title=edismax=on
>> 
>> And this is the parsed query I see in the debug output:
>> +(((+title:microwave +title:food) (+title:frozen +title:dinner)))
>> 
>> I am wondering why the first query looks for documents containing at least
>> one of the two query tokens, whereas the second query looks for documents
>> with both of the query tokens? I would understand if it looked for both the
>> tokens of the synonyms (i.e. both microwave and food) to avoid the
>> sausagization problem. But I would like to get partial matches on the
>> original query at least (i.e. it should also match documents containing
>> just the token 'dinner').
>> 
>> Would any one know why the behavior is different across queries with and
>> without synonyms? And how could I work around this if I wanted partial
>> matches on queries that also have synonyms?
>> 
>> Ideally, I would like the parsed query in the second case to be:
>> +(((+title:microwave +title:food) (title:frozen title:dinner)))
>> 
>> I'd appreciate any help with this. Thanks!
>> 


Re: Solr edismax parser with multi-word synonyms

2019-07-18 Thread Erick Erickson
This is not a phrase query; rather, it requires one pair of words or the other
to appear in the title.

You’ve told it that “frozen dinner” and “microwave food” are synonyms.
So it’s looking for both the words “microwave” and “food” in the title field,
or both “frozen” and “dinner” in the title field.

You’d see the same thing with single-word synonyms, albeit a little less
confusingly.


Best,
Erick


> On Jul 18, 2019, at 1:01 AM, kshitij tyagi  
> wrote:
> 
> Hi sunil,
> 
> 1. as you have added "microwave food" in synonym as a multiword synonym to
> "frozen dinner", edismax parsers finds your synonym in the file and is
> considering your query as a Phrase query.
> 
> This is the reason you are seeing parsed query as  +(((+title:microwave
> +title:food) (+title:frozen +title:dinner))), frozen dinner is considered
> as a phrase here.
> 
> If you want partial match on your query then you can add frozen dinner,
> microwave food, microwave, food to your synonym file and you will see the
> parsed query as:
> "+(((+title:microwave +title:food) title:microwave title:food
> (+title:frozen +title:dinner)))"
> Another option is to write your own custom query parser and use it as a
> plugin.
> 
> Hope this helps!!
> 
> kshitij
> 
> 
> On Thu, Jul 18, 2019 at 9:14 AM Sunil Srinivasan  wrote:
> 
>> 
>> I have enabled the SynonymGraphFilter in my field configuration in order
>> to support multi-word synonyms (I am using Solr 7.6). Here is my field
>> configuration:
>> 
>>
>>  
>>
>> 
>>
>>  
>>  > synonyms="synonyms.txt"/>
>>
>> 
>> 
>> 
>> 
>> And this is my synonyms.txt file:
>> frozen dinner,microwave food
>> 
>> Scenario 1: blue shirt (query with no synonyms)
>> 
>> Here is my first Solr query:
>> 
>> http://localhost:8983/solr/base/search?q=blue+shirt=title=edismax=on
>> 
>> And this is the parsed query I see in the debug output:
>> +((title:blue) (title:shirt))
>> 
>> Scenario 2: frozen dinner (query with synonyms)
>> 
>> Now, here is my second Solr query:
>> 
>> http://localhost:8983/solr/base/search?q=frozen+dinner=title=edismax=on
>> 
>> And this is the parsed query I see in the debug output:
>> +(((+title:microwave +title:food) (+title:frozen +title:dinner)))
>> 
>> I am wondering why the first query looks for documents containing at least
>> one of the two query tokens, whereas the second query looks for documents
>> with both of the query tokens? I would understand if it looked for both the
>> tokens of the synonyms (i.e. both microwave and food) to avoid the
>> sausagization problem. But I would like to get partial matches on the
>> original query at least (i.e. it should also match documents containing
>> just the token 'dinner').
>> 
>> Would any one know why the behavior is different across queries with and
>> without synonyms? And how could I work around this if I wanted partial
>> matches on queries that also have synonyms?
>> 
>> Ideally, I would like the parsed query in the second case to be:
>> +(((+title:microwave +title:food) (title:frozen title:dinner)))
>> 
>> I'd appreciate any help with this. Thanks!
>> 



Re: Solr edismax parser with multi-word synonyms

2019-07-18 Thread kshitij tyagi
Hi Sunil,

Because you have added "microwave food" to the synonym file as a multi-word
synonym of "frozen dinner", the edismax parser finds your synonym in the file
and treats your query as a phrase query.

This is why you are seeing the parsed query +(((+title:microwave
+title:food) (+title:frozen +title:dinner))); "frozen dinner" is treated
as a phrase here.

If you want partial matches on your query, you can add frozen dinner,
microwave food, microwave, food to your synonym file, and you will see the
parsed query as:
"+(((+title:microwave +title:food) title:microwave title:food
(+title:frozen +title:dinner)))"
Another option is to write your own custom query parser and use it as a
plugin.

Hope this helps!!

kshitij


On Thu, Jul 18, 2019 at 9:14 AM Sunil Srinivasan  wrote:

>
> I have enabled the SynonymGraphFilter in my field configuration in order
> to support multi-word synonyms (I am using Solr 7.6). Here is my field
> configuration:
> 
> 
>   
> 
>
> 
>   
>synonyms="synonyms.txt"/>
> 
> 
>
> 
>
> And this is my synonyms.txt file:
> frozen dinner,microwave food
>
> Scenario 1: blue shirt (query with no synonyms)
>
> Here is my first Solr query:
>
> http://localhost:8983/solr/base/search?q=blue+shirt=title=edismax=on
>
> And this is the parsed query I see in the debug output:
> +((title:blue) (title:shirt))
>
> Scenario 2: frozen dinner (query with synonyms)
>
> Now, here is my second Solr query:
>
> http://localhost:8983/solr/base/search?q=frozen+dinner=title=edismax=on
>
> And this is the parsed query I see in the debug output:
> +(((+title:microwave +title:food) (+title:frozen +title:dinner)))
>
> I am wondering why the first query looks for documents containing at least
> one of the two query tokens, whereas the second query looks for documents
> with both of the query tokens? I would understand if it looked for both the
> tokens of the synonyms (i.e. both microwave and food) to avoid the
> sausagization problem. But I would like to get partial matches on the
> original query at least (i.e. it should also match documents containing
> just the token 'dinner').
>
> Would any one know why the behavior is different across queries with and
> without synonyms? And how could I work around this if I wanted partial
> matches on queries that also have synonyms?
>
> Ideally, I would like the parsed query in the second case to be:
> +(((+title:microwave +title:food) (title:frozen title:dinner)))
>
> I'd appreciate any help with this. Thanks!
>


Solr edismax parser with multi-word synonyms

2019-07-17 Thread Sunil Srinivasan

I have enabled the SynonymGraphFilter in my field configuration in order to 
support multi-word synonyms (I am using Solr 7.6). Here is my field 
configuration:


  



  
  





And this is my synonyms.txt file:
frozen dinner,microwave food

Scenario 1: blue shirt (query with no synonyms)

Here is my first Solr query:
http://localhost:8983/solr/base/search?q=blue+shirt=title=edismax=on

And this is the parsed query I see in the debug output:
+((title:blue) (title:shirt))

Scenario 2: frozen dinner (query with synonyms)

Now, here is my second Solr query:
http://localhost:8983/solr/base/search?q=frozen+dinner=title=edismax=on

And this is the parsed query I see in the debug output:
+(((+title:microwave +title:food) (+title:frozen +title:dinner)))
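
The parameter names in the two request URLs above appear to have been lost in the archive. The sketch below shows how such a request could be assembled. The names qf, defType and debugQuery are standard Solr request parameters, but mapping them onto the stripped URL is an assumption. The helper build_query_url is a hypothetical name.

```python
from urllib.parse import urlencode

# Request handler path taken from the URLs quoted in the message.
BASE_URL = "http://localhost:8983/solr/base/search"


def build_query_url(user_query, field="title"):
    """Build an edismax request with debug output enabled.

    The parameter names are assumptions: the archived URL lost them.
    """
    params = {
        "q": user_query,
        "qf": field,           # assumed: the field edismax should search
        "defType": "edismax",  # assumed: selects the edismax parser
        "debugQuery": "on",    # assumed: adds the parsed query to the response
    }
    return BASE_URL + "?" + urlencode(params)


print(build_query_url("blue shirt"))
print(build_query_url("frozen dinner"))
```

urlencode encodes the space in the query as "+", which matches the q=blue+shirt form in the original URLs.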

I am wondering why the first query looks for documents containing at least one 
of the two query tokens, whereas the second query looks for documents with both 
of the query tokens? I would understand if it looked for both the tokens of the 
synonyms (i.e. both microwave and food) to avoid the sausagization problem. But 
I would like to get partial matches on the original query at least (i.e. it 
should also match documents containing just the token 'dinner').

Would anyone know why the behavior differs between queries with and
without synonyms? And how could I work around this if I wanted partial matches
on queries that also have synonyms?

Ideally, I would like the parsed query in the second case to be:
+(((+title:microwave +title:food) (title:frozen title:dinner)))
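
To make the difference between the two parsed queries concrete, here is a minimal sketch that treats a title as a set of tokens and evaluates each query as a plain boolean predicate. The function names are hypothetical, and the model ignores scoring, analysis and minimum-should-match settings.

```python
def matches_actual(title_tokens):
    """+(((+title:microwave +title:food) (+title:frozen +title:dinner)))

    Either both synonym words or both original words must be present.
    """
    tokens = set(title_tokens)
    return {"microwave", "food"} <= tokens or {"frozen", "dinner"} <= tokens


def matches_desired(title_tokens):
    """+(((+title:microwave +title:food) (title:frozen title:dinner)))

    Both synonym words, or at least one of the original words.
    """
    tokens = set(title_tokens)
    return {"microwave", "food"} <= tokens or bool({"frozen", "dinner"} & tokens)


for title in (["chicken", "dinner"], ["microwave", "oven"], ["frozen", "dinner"]):
    print(title, matches_actual(title), matches_desired(title))
```

A title containing only "dinner" is rejected by the actual query but accepted by the desired one. A title containing only "microwave" is rejected by both, which is the "no partial matches of the synonyms" behaviour described above.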

I'd appreciate any help with this. Thanks!


Re: Multi-word Synonyms - how does sow parameter work?

2018-08-16 Thread Roy Lim
Thanks Andrea for the tip.  I wasn't aware of the autoGeneratePhraseQueries
option for text fields, will definitely keep it in mind.

But I wonder whether this is related to the query parser fix that
introduced the sow parameter: when it is false (which looks like the
default in Solr 7), multi-word input should be sent as a 'single input' (see
https://issues.apache.org/jira/browse/LUCENE-2605).  That issue doesn't
mention autoGeneratePhraseQueries.

I think this is where my confusion lies: as a non-developer, unfortunately
I'm not clear what 'multiwords will be sent as a single input' means.
Does it mean that the input is treated as a phrase query?  That it uses AND?
As mentioned, so far I only observe OR clauses, which is no different
than before the fix.

Thanks again!



On Thu, Aug 16, 2018 at 12:39 AM, Andrea Gazzarini 
wrote:

> Hi Roy, I think you miss the autoGeneratePhraseQueries=true in the field
> type definition.
> I was on a slightly different use case when I met your same issue (I was
> using synonyms expansion at query time) and honestly I didn't understand
> why this is not the default and implicit behavior. In other words, like
> you, I can't imagine a scenario where I would a multi-terms synonym be
> destructured in multiple OR clauses.
>
> Best,
> Andrea
>
>
> On 16/08/18 02:07, Roy Lim wrote:
>
>> I am not using edismax (eventually I would like to get there) but I'm just
>> testing with standard query right now.  Original posting:
>>
>> I'm trying to figure out why the multi-word synonym expansion is not
>> working correctly (or, at least what I'm misunderstanding).  Specifically,
>> when I test a standard query with Solr Admin it appears to still split on
>> whitespace.
>>
>> Here is my setup:
>> - Solr 7.2.1
>> - synonym example: LCD => liquid crystal display
>> - q=myfield:LCD
>> - added parameter: sow=false
>> - myfield schema looks like (analyzer both applicable to index and query
>> time):
>> 
>> > positionIncrementGap="100">
>>
>>  
>>  > synonyms="synonyms.txt"/>
>>  ...
>> 
>>
>> When debugging the query, Solr Admin shows the parsed query as:
>> 
>> myfield:liquid myfield:crystal myfield:display
>> 
>>
>> (default operator being OR), as you can see it would incorrectly match on
>> any of those words, but not all, which is what I would expect...
>>
>> Should it not do a phrase query search for the exact translated synonym,
>> "liquid crystal display"?
>>
>>
>>


Re: Multi-word Synonyms - how does sow parameter work?

2018-08-16 Thread Andrea Gazzarini
Hi Roy, I think you are missing autoGeneratePhraseQueries="true" in the field
type definition.
My use case was slightly different when I ran into the same issue (I was
using synonym expansion at query time), and honestly I didn't understand
why this is not the default, implicit behavior. In other words, like
you, I can't imagine a scenario where I would want a multi-term synonym to be
broken up into multiple OR clauses.
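
The difference between separate OR clauses and a phrase query can be illustrated with a small sketch. It treats a document as a list of tokens; the helper names are hypothetical, and the model ignores analysis, slop and scoring.

```python
def matches_any(doc_tokens, terms):
    """OR semantics: one matching term is enough."""
    return any(term in doc_tokens for term in terms)


def matches_phrase(doc_tokens, terms):
    """Phrase semantics: all terms must appear adjacent and in order."""
    n = len(terms)
    return any(
        doc_tokens[i:i + n] == terms
        for i in range(len(doc_tokens) - n + 1)
    )


synonym = ["liquid", "crystal", "display"]
unrelated = "crystal clear water".split()
relevant = "new liquid crystal display panel".split()

print(matches_any(unrelated, synonym), matches_phrase(unrelated, synonym))
print(matches_any(relevant, synonym), matches_phrase(relevant, synonym))
```

The unrelated document matches under OR semantics only because it contains "crystal", while the phrase check accepts only the document that contains the full expansion.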


Best,
Andrea

On 16/08/18 02:07, Roy Lim wrote:

I am not using edismax (eventually I would like to get there) but I'm just
testing with standard query right now.  Original posting:

I'm trying to figure out why the multi-word synonym expansion is not
working correctly (or, at least what I'm misunderstanding).  Specifically,
when I test a standard query with Solr Admin it appears to still split on
whitespace.

Here is my setup:
- Solr 7.2.1
- synonym example: LCD => liquid crystal display
- q=myfield:LCD
- added parameter: sow=false
- myfield schema looks like (analyzer both applicable to index and query
time):


   
 
 
 ...


When debugging the query, Solr Admin shows the parsed query as:

myfield:liquid myfield:crystal myfield:display


(default operator being OR), as you can see it would incorrectly match on
any of those words, but not all, which is what I would expect...

Should it not do a phrase query search for the exact translated synonym,
"liquid crystal display"?



On Wed, Aug 15, 2018 at 5:01 PM, Doug Turnbull <
dturnb...@opensourceconnections.com> wrote:


Also share your fieldType settings for myfield as well from your schema
On Wed, Aug 15, 2018 at 8:00 PM Doug Turnbull <
dturnb...@opensourceconnections.com> wrote:


Aside from the screenshot issue, one  thing to check: are you searching
with defType=edismax ?

As in
q=lcd=myfield=false=edismax

?

Also, sow=false should be the default on Solr 7 and above

Doug

On Wed, Aug 15, 2018 at 6:27 PM Roy Lim  wrote:


I'm trying to figure out why the multi-word synonym expansion is not
working
correctly.  Specifically, when I test a standard query with Solr Admin

it

is
still splitting on whitespace.

Here is my setup:
- Solr 7.2.1
- synonym LCD => liquid crystal display
- q=myfield:LCD
- added: sow=false
- myfield looks like:


Solr Admin shows the parsed query looks like:

myfield:liquid myfield:crystal myfield:display

(default operator being OR), which would incorrectly match documents

with

any of those words, but not all, which is what I would expect...





--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


--
CTO, OpenSource Connections
Author, Relevant Search
http://o19s.com/doug


--
CTO, OpenSource Connections
Author, Relevant Search
http://o19s.com/doug





Re: Multi-word Synonyms - how does sow parameter work?

2018-08-15 Thread Roy Lim
I am not using edismax (eventually I would like to get there) but I'm just
testing with standard query right now.  Original posting:

I'm trying to figure out why the multi-word synonym expansion is not
working correctly (or, at least what I'm misunderstanding).  Specifically,
when I test a standard query with Solr Admin it appears to still split on
whitespace.

Here is my setup:
- Solr 7.2.1
- synonym example: LCD => liquid crystal display
- q=myfield:LCD
- added parameter: sow=false
- myfield schema looks like (analyzer both applicable to index and query
time):


  


...


When debugging the query, Solr Admin shows the parsed query as:

myfield:liquid myfield:crystal myfield:display


(default operator being OR); as you can see, it would incorrectly match on
any one of those words rather than requiring all of them, which is what I would expect...

Should it not do a phrase query search for the exact translated synonym,
"liquid crystal display"?



On Wed, Aug 15, 2018 at 5:01 PM, Doug Turnbull <
dturnb...@opensourceconnections.com> wrote:

> Also share your fieldType settings for myfield as well from your schema
> On Wed, Aug 15, 2018 at 8:00 PM Doug Turnbull <
> dturnb...@opensourceconnections.com> wrote:
>
> > Aside from the screenshot issue, one  thing to check: are you searching
> > with defType=edismax ?
> >
> > As in
> > q=lcd=myfield=false=edismax
> >
> > ?
> >
> > Also, sow=false should be the default on Solr 7 and above
> >
> > Doug
> >
> > On Wed, Aug 15, 2018 at 6:27 PM Roy Lim  wrote:
> >
> >> I'm trying to figure out why the multi-word synonym expansion is not
> >> working
> >> correctly.  Specifically, when I test a standard query with Solr Admin
> it
> >> is
> >> still splitting on whitespace.
> >>
> >> Here is my setup:
> >> - Solr 7.2.1
> >> - synonym LCD => liquid crystal display
> >> - q=myfield:LCD
> >> - added: sow=false
> >> - myfield looks like:
> >>
> >>
> >> Solr Admin shows the parsed query looks like:
> >>
> >> myfield:liquid myfield:crystal myfield:display
> >>
> >> (default operator being OR), which would incorrectly match documents
> with
> >> any of those words, but not all, which is what I would expect...
> >>
> >>
> >>
> >>
> >>
> >> --
> >> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
> >>
> > --
> > CTO, OpenSource Connections
> > Author, Relevant Search
> > http://o19s.com/doug
> >
> --
> CTO, OpenSource Connections
> Author, Relevant Search
> http://o19s.com/doug
>


Re: Multi-word Synonyms - how does sow parameter work?

2018-08-15 Thread Doug Turnbull
Please also share the fieldType settings for myfield from your schema.
On Wed, Aug 15, 2018 at 8:00 PM Doug Turnbull <
dturnb...@opensourceconnections.com> wrote:

> Aside from the screenshot issue, one  thing to check: are you searching
> with defType=edismax ?
>
> As in
> q=lcd=myfield=false=edismax
>
> ?
>
> Also, sow=false should be the default on Solr 7 and above
>
> Doug
>
> On Wed, Aug 15, 2018 at 6:27 PM Roy Lim  wrote:
>
>> I'm trying to figure out why the multi-word synonym expansion is not
>> working
>> correctly.  Specifically, when I test a standard query with Solr Admin it
>> is
>> still splitting on whitespace.
>>
>> Here is my setup:
>> - Solr 7.2.1
>> - synonym LCD => liquid crystal display
>> - q=myfield:LCD
>> - added: sow=false
>> - myfield looks like:
>>
>>
>> Solr Admin shows the parsed query looks like:
>>
>> myfield:liquid myfield:crystal myfield:display
>>
>> (default operator being OR), which would incorrectly match documents with
>> any of those words, but not all, which is what I would expect...
>>
>>
>>
>>
>>
>> --
>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>>
> --
> CTO, OpenSource Connections
> Author, Relevant Search
> http://o19s.com/doug
>
-- 
CTO, OpenSource Connections
Author, Relevant Search
http://o19s.com/doug


Re: Multi-word Synonyms - how does sow parameter work?

2018-08-15 Thread Doug Turnbull
Aside from the screenshot issue, one thing to check: are you searching
with defType=edismax?

As in
q=lcd=myfield=false=edismax

?

Also, sow=false should be the default on Solr 7 and above

Doug

On Wed, Aug 15, 2018 at 6:27 PM Roy Lim  wrote:

> I'm trying to figure out why the multi-word synonym expansion is not
> working
> correctly.  Specifically, when I test a standard query with Solr Admin it
> is
> still splitting on whitespace.
>
> Here is my setup:
> - Solr 7.2.1
> - synonym LCD => liquid crystal display
> - q=myfield:LCD
> - added: sow=false
> - myfield looks like:
>
>
> Solr Admin shows the parsed query looks like:
>
> myfield:liquid myfield:crystal myfield:display
>
> (default operator being OR), which would incorrectly match documents with
> any of those words, but not all, which is what I would expect...
>
>
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>
-- 
CTO, OpenSource Connections
Author, Relevant Search
http://o19s.com/doug


Re: Multi-word Synonyms - how does sow parameter work?

2018-08-15 Thread Steve Rowe
Yes please.  That way we’ll see the whole thing.

--
Steve
www.lucidworks.com

> On Aug 15, 2018, at 7:20 PM, Roy Lim  wrote:
> 
> I've subscribed, shall I re-post it then via email?
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Re: Multi-word Synonyms - how does sow parameter work?

2018-08-15 Thread Roy Lim
I've subscribed, shall I re-post it then via email?



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Multi-word Synonyms - how does sow parameter work?

2018-08-15 Thread Steve Rowe
Roy,

Not sure of the point of Nabble when it strips content before passing messages 
on to the mailing list.  I’ve emailed them about this problem in the past but 
they have done nothing about it.

Updating a post on Nabble will never make it to the mailing list.  If you want 
us to be able to read your post in full, you should subscribe to the mailing 
list instead of using Nabble.  Instructions here: 
http://lucene.apache.org/solr/community.html#solr-user-list-solr-userluceneapacheorg

--
Steve
www.lucidworks.com

> On Aug 15, 2018, at 7:00 PM, Roy Lim  wrote:
> 
> Thanks, updated original post.  It just removed what I surrounded with the
> raw text markup, I've added it back without markup.  Not sure of the point
> of raw text if it's always removed 
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Re: Multi-word Synonyms - how does sow parameter work?

2018-08-15 Thread Roy Lim
Thanks, updated original post.  It just removed what I surrounded with the
raw text markup, I've added it back without markup.  Not sure of the point
of raw text if it's always removed 



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Multi-word Synonyms - how does sow parameter work?

2018-08-15 Thread Erick Erickson
The mail server strips pretty much all screenshots and attachments, so
I think some of the data you're trying to provide is missing from the
e-mail.

Best,
Erick

On Wed, Aug 15, 2018 at 3:27 PM, Roy Lim  wrote:
> I'm trying to figure out why the multi-word synonym expansion is not working
> correctly.  Specifically, when I test a standard query with Solr Admin it is
> still splitting on whitespace.
>
> Here is my setup:
> - Solr 7.2.1
> - synonym LCD => liquid crystal display
> - q=myfield:LCD
> - added: sow=false
> - myfield looks like:
>
>
> Solr Admin shows the parsed query looks like:
>
> myfield:liquid myfield:crystal myfield:display
>
> (default operator being OR), which would incorrectly match documents with
> any of those words, but not all, which is what I would expect...
>
>
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Multi-word Synonyms - how does sow parameter work?

2018-08-15 Thread Roy Lim
I'm trying to figure out why the multi-word synonym expansion is not working
correctly.  Specifically, when I test a standard query with Solr Admin it is
still splitting on whitespace.

Here is my setup:
- Solr 7.2.1
- synonym LCD => liquid crystal display
- q=myfield:LCD
- added: sow=false
- myfield looks like:


Solr Admin shows the parsed query looks like:

myfield:liquid myfield:crystal myfield:display

(default operator being OR), which would incorrectly match documents containing
any one of those words rather than requiring all of them, which is what I would expect...





--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Multi word synonyms

2017-03-27 Thread Doug Turnbull
Fantastic!
On Mon, Mar 27, 2017 at 9:56 AM alessandro.benedetti <a.benede...@sease.io>
wrote:

> In addition to what Doug has already pointed out, I would like to highlight
> this contribution in Solr 6.5.0.
> It may seem like a small, innocent patch, but it actually opens up a new world
> for one of the most controversial aspects of Solr query parsing:
>
> http://issues.apache.org/jira/browse/SOLR-9185
>
> Cheers
>
>
>
> -
> ---
> Alessandro Benedetti
> Search Consultant, R&D Software Engineer, Director
> Sease Ltd. - www.sease.io
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Multi-word-synonyms-tp4326863p4326998.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Multi word synonyms

2017-03-27 Thread alessandro.benedetti
In addition to what Doug has already pointed out, I would like to highlight
this contribution in Solr 6.5.0.
It may seem like a small, innocent patch, but it actually opens up a new world
for one of the most controversial aspects of Solr query parsing:

http://issues.apache.org/jira/browse/SOLR-9185

Cheers



-
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multi-word-synonyms-tp4326863p4326998.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Multi word synonyms

2017-03-26 Thread Doug Turnbull
You might have stumbled on all these articles already, but you can read
our org's progression with this problem as a play in three acts:

Act I Introducing the characters

http://opensourceconnections.com/blog/2013/10/27/why-is-multi-term-synonyms-so-hard-in-solr/

Act II Heroes Meet Despair
http://opensourceconnections.com/blog/2016/06/23/solr-multi-word-synonym-solutions-2016/

Act III Triumph
We use a combination of these techniques
http://opensourceconnections.com/blog/2016/12/02/solr-elasticsearch-synonyms-better-patterns-keyphrases/
http://opensourceconnections.com/blog/2016/12/23/elasticsearch-synonyms-patterns-taxonomies/

Made possible in Solr with our Match Query Parser, which IMO is the most
satisfactory solution. I'm of course biased given we created it

http://opensourceconnections.com/blog/2017/01/23/our-solution-to-solr-multiterm-synonyms/


All of these articles also point towards other solutions, like auto
phrasing query parser/token filter and hon-lucene-synonyms.
On Sun, Mar 26, 2017 at 7:05 PM John Blythe <j...@curvolabs.com> wrote:

> Sure thing. Post back w what you find!
>
> Good luck-
>
> On Sun, Mar 26, 2017 at 3:36 PM Sanjana Sridhar <sanjana.srid...@flipp.com
> >
> wrote:
>
> > Hi John,
> >
> > Thanks for letting me know what works for you. I'm going to try that out.
> > Sounds like a suitable solution to my problem.
> >
> > Best,
> > Sanjana
> >
> >
> >
> > On Sun, Mar 26, 2017 at 12:30 PM, John Blythe <j...@curvolabs.com>
> wrote:
> >
> > > I use the keyword tokenizer and then pattern replace to transform multi
> > > words into underscore connected tokens. For instance, "Burger Joint"
> > > transforms to "burger_joint" which then looks in my synonym filter for
> > > underscored synonyms. When it matches I then replace underscores with
> > > spaces or just toss over to the word delimiter filter factory before
> > > further processing
> > >
> > >
> > > On Sun, Mar 26, 2017 at 11:53 AM Sanjana Sridhar <
> > > sanjana.srid...@wishabi.com> wrote:
> > >
> > > > Hello,
> > > >
> > > > Does anyone have a good solution for working with multi word
> synonyms?
> > > I've
> > > > been reading a lot about this online and haven't really found a great
> > > > solution to it. I use the SynonymFilterFactory at index time, but
> words
> > > > don't really get matched to the appropriate multi word synonyms, even
> > > > though using the Analysis tool shows that it should be matched.
> > > >
> > > > Examples:
> > > >
> > > > coke, coca cola
> > > >
> > > >
> > > >
> > > > This is the configuration I have on text fields:
> > > >
> > > >  > > > positionIncrementGap="100" multiValued="true">
> > > > 
> > > > 
> > > > 
> > > >  > > expand=
> > > > "true" synonyms="synonyms.txt" />
> > > > 
> > > >  > > > generateWordParts="0" generateNumberParts = "0"
> > > >   splitOnCaseChange = "0" preserveOriginal="1"
> > > catenateWords="1"/>
> > > > 
> > > > 
> > > > 
> > > >  > > > pattern="(.*[\*].*)"  replacement=""/>
> > > > 
> > > > 
> > > > 
> > > >
> > > >   
> > > >   
> > > > 
> > > >  
> > > >  
> > > >   > > > generateWordParts="0" generateNumberParts = "0"
> > > >   splitOnCaseChange = "0" preserveOriginal="1"
> > > catenateWords="1"/>
> > > > 
> > > > 
> > > > 
> > > > 
> > > >   
> > > >   
> > > > 0.0
> > > >   
> > > > 
> > > >
> > > >
> > > > Greatly appreciate any help ya'll can offer.
> > > >
> > > > Thanks,
> > > > Sanjana
> > > >
> > > > --
> > > > IMPORTANT NOTICE:  This message, including any attachments
> (hereinafter
> > > > collectively referred to as "Communication"), is intend

Re: Multi word synonyms

2017-03-26 Thread John Blythe
Sure thing. Post back w what you find!

Good luck-

On Sun, Mar 26, 2017 at 3:36 PM Sanjana Sridhar <sanjana.srid...@flipp.com>
wrote:

> Hi John,
>
> Thanks for letting me know what works for you. I'm going to try that out.
> Sounds like a suitable solution to my problem.
>
> Best,
> Sanjana
>
>
>
> On Sun, Mar 26, 2017 at 12:30 PM, John Blythe <j...@curvolabs.com> wrote:
>
> > I use the keyword tokenizer and then pattern replace to transform multi
> > words into underscore connected tokens. For instance, "Burger Joint"
> > transforms to "burger_joint" which then looks in my synonym filter for
> > underscored synonyms. When it matches I then replace underscores with
> > spaces or just toss over to the word delimiter filter factory before
> > further processing
> >
> >
> > On Sun, Mar 26, 2017 at 11:53 AM Sanjana Sridhar <
> > sanjana.srid...@wishabi.com> wrote:
> >
> > > Hello,
> > >
> > > Does anyone have a good solution for working with multi word synonyms?
> > I've
> > > been reading a lot about this online and haven't really found a great
> > > solution to it. I use the SynonymFilterFactory at index time, but words
> > > don't really get matched to the appropriate multi word synonyms, even
> > > though using the Analysis tool shows that it should be matched.
> > >
> > > Examples:
> > >
> > > coke, coca cola
> > >
> > >
> > >
> > > This is the configuration I have on text fields:
> > >
> > >  > > positionIncrementGap="100" multiValued="true">
> > > 
> > > 
> > > 
> > >  > expand=
> > > "true" synonyms="synonyms.txt" />
> > > 
> > >  > > generateWordParts="0" generateNumberParts = "0"
> > >   splitOnCaseChange = "0" preserveOriginal="1"
> > catenateWords="1"/>
> > > 
> > > 
> > > 
> > >  > > pattern="(.*[\*].*)"  replacement=""/>
> > > 
> > > 
> > > 
> > >
> > >   
> > >   
> > > 
> > >  
> > >  
> > >   > > generateWordParts="0" generateNumberParts = "0"
> > >   splitOnCaseChange = "0" preserveOriginal="1"
> > catenateWords="1"/>
> > > 
> > > 
> > > 
> > > 
> > >   
> > >   
> > > 0.0
> > >   
> > > 
> > >
> > >
> > > Greatly appreciate any help ya'll can offer.
> > >
> > > Thanks,
> > > Sanjana
> > >
> > >
> > --
> > --
> > *John Blythe*
> > Product Manager & Lead Developer
> >
> > 251.605.3071 | j...@curvolabs.com
> > www.curvolabs.com
> >
> > 58 Adams Ave
> > Evansville, IN 47713
> >
>
>
>
> --
>
> <http://corp.flipp.com/> <http://corp.flipp.com/>
>
> Sanjana Sridhar
> Flipp Corporation
>
> p: 647-217-3599
> e: sanjana.srid...@flipp.com
>
>
-- 
-- 
*John Blythe*
Product Manager & Lead Developer

251.605.3071 | j...@curvolabs.com
www.curvolabs.com

58 Adams Ave
Evansville, IN 47713


Re: Multi word synonyms

2017-03-26 Thread Sanjana Sridhar
Hi John,

Thanks for letting me know what works for you. I'm going to try that out.
Sounds like a suitable solution to my problem.

Best,
Sanjana



On Sun, Mar 26, 2017 at 12:30 PM, John Blythe <j...@curvolabs.com> wrote:

> I use the keyword tokenizer and then pattern replace to transform multi
> words into underscore connected tokens. For instance, "Burger Joint"
> transforms to "burger_joint" which then looks in my synonym filter for
> underscored synonyms. When it matches I then replace underscores with
> spaces or just toss over to the word delimiter filter factory before
> further processing
>
>
> On Sun, Mar 26, 2017 at 11:53 AM Sanjana Sridhar <
> sanjana.srid...@wishabi.com> wrote:
>
> > Hello,
> >
> > Does anyone have a good solution for working with multi word synonyms?
> I've
> > been reading a lot about this online and haven't really found a great
> > solution to it. I use the SynonymFilterFactory at index time, but words
> > don't really get matched to the appropriate multi word synonyms, even
> > though using the Analysis tool shows that it should be matched.
> >
> > Examples:
> >
> > coke, coca cola
> >
> >
> >
> > This is the configuration I have on text fields:
> >
> >  > positionIncrementGap="100" multiValued="true">
> > 
> > 
> > 
> >  expand=
> > "true" synonyms="synonyms.txt" />
> > 
> >  > generateWordParts="0" generateNumberParts = "0"
> >   splitOnCaseChange = "0" preserveOriginal="1"
> catenateWords="1"/>
> > 
> > 
> > 
> >  > pattern="(.*[\*].*)"  replacement=""/>
> > 
> > 
> > 
> >
> >   
> >   
> > 
> >  
> >  
> >   > generateWordParts="0" generateNumberParts = "0"
> >   splitOnCaseChange = "0" preserveOriginal="1"
> catenateWords="1"/>
> > 
> > 
> > 
> > 
> >   
> >   
> > 0.0
> >   
> > 
> >
> >
> > Greatly appreciate any help ya'll can offer.
> >
> > Thanks,
> > Sanjana
> >
> >
> --
> --
> *John Blythe*
> Product Manager & Lead Developer
>
> 251.605.3071 | j...@curvolabs.com
> www.curvolabs.com
>
> 58 Adams Ave
> Evansville, IN 47713
>



-- 

<http://corp.flipp.com/> <http://corp.flipp.com/>

Sanjana Sridhar
Flipp Corporation

p: 647-217-3599
e: sanjana.srid...@flipp.com

-- 
IMPORTANT NOTICE:  This message, including any attachments (hereinafter 
collectively referred to as "Communication"), is intended only for the 
addressee(s) 
named above.  This Communication may include information that is 
privileged, confidential and exempt from disclosure under applicable law. 
 If the recipient of this Communication is not the intended recipient, or 
the employee or agent responsible for delivering this Communication to the 
intended recipient, you are notified that any dissemination, distribution 
or copying of this Communication is strictly prohibited.  If you have 
received this Communication in error, please notify the sender immediately 
by phone or email and permanently delete this Communication from your 
computer without making a copy. Thank you.


Re: Multi word synonyms

2017-03-26 Thread John Blythe
I use the keyword tokenizer and then a pattern replace to transform multi-word
terms into underscore-connected tokens. For instance, "Burger Joint"
transforms to "burger_joint", which is then looked up in my synonym filter for
underscored synonyms. When it matches, I then replace underscores with
spaces or just toss it over to the word delimiter filter factory before
further processing.
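
A rough Python sketch of the flow described above, for illustration only. It
is not Solr configuration. The synonym table, the lowercasing step, and the
function names are assumptions made up for the example.

```python
import re

# Hypothetical synonym table keyed by underscore-joined tokens.
SYNONYMS = {
    "burger_joint": ["burger_joint", "hamburger_restaurant"],
    "coca_cola": ["coca_cola", "coke"],
    "coke": ["coke", "coca_cola"],
}


def to_token(text):
    """Treat the whole input as one token (keyword-tokenizer style),
    lowercase it, and replace runs of whitespace with underscores."""
    return re.sub(r"\s+", "_", text.strip().lower())


def expand(text):
    """Look the underscored token up in the synonym table, then turn
    underscores back into spaces for further processing."""
    token = to_token(text)
    matches = SYNONYMS.get(token, [token])
    return [m.replace("_", " ") for m in matches]


print(to_token("Burger Joint"))
print(expand("Coca Cola"))
```

Because the whole input becomes a single token before the synonym lookup, a
multi-word entry can only match when the entire field value or query is that
phrase.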


On Sun, Mar 26, 2017 at 11:53 AM Sanjana Sridhar <
sanjana.srid...@wishabi.com> wrote:

> Hello,
>
> Does anyone have a good solution for working with multi word synonyms? I've
> been reading a lot about this online and haven't really found a great
> solution to it. I use the SynonymFilterFactory at index time, but words
> don't really get matched to the appropriate multi word synonyms, even
> though using the Analysis tool shows that it should be matched.
>
> Examples:
>
> coke, coca cola
>
>
>
> This is the configuration I have on text fields:
>
>  positionIncrementGap="100" multiValued="true">
> 
> 
> 
>  "true" synonyms="synonyms.txt" />
> 
>  generateWordParts="0" generateNumberParts = "0"
>   splitOnCaseChange = "0" preserveOriginal="1" catenateWords="1"/>
> 
> 
> 
>  pattern="(.*[\*].*)"  replacement=""/>
> 
> 
> 
>
>   
>   
> 
>  
>  
>   generateWordParts="0" generateNumberParts = "0"
>   splitOnCaseChange = "0" preserveOriginal="1" catenateWords="1"/>
> 
> 
> 
> 
>   
>   
> 0.0
>   
> 
>
>
> Greatly appreciate any help ya'll can offer.
>
> Thanks,
> Sanjana
>
>
-- 
-- 
*John Blythe*
Product Manager & Lead Developer

251.605.3071 | j...@curvolabs.com
www.curvolabs.com

58 Adams Ave
Evansville, IN 47713


Multi word synonyms

2017-03-26 Thread Sanjana Sridhar
Hello,

Does anyone have a good solution for working with multi word synonyms? I've
been reading a lot about this online and haven't really found a great
solution to it. I use the SynonymFilterFactory at index time, but words
don't really get matched to the appropriate multi word synonyms, even
though using the Analysis tool shows that it should be matched.

Examples:

coke, coca cola



This is the configuration I have on text fields:
















  
  

 
 
 




  
  
0.0
  



Greatly appreciate any help ya'll can offer.

Thanks,
Sanjana



Re: Solr 6.4 new SynonymGraphFilter help for multi-word synonyms

2017-02-03 Thread David Smiley
Solr _does_ have a query parser that doesn't suffer from this problem:
SimpleQParser, selected with the string "simple".
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-SimpleQueryParser
In this case, see the "WHITESPACE" operator feature, which can be toggled.
Configure it to _not_ be an operator so that whitespace is processed by the
underlying Analyzer to get proper multi-word handling.  This is a very fine
query parser, IMO; much simpler than any other that has its feature set.
Though you still might need dismax/edismax.
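
A hedged sketch of what such a request might look like, built in Python. The
base URL, core name, and field name are placeholders. The parameter names
(defType, qf, q.operators) and the operator list are taken from my reading of
the reference guide page linked above, so check them against your Solr
version before relying on this.

```python
from urllib.parse import urlencode

# Operators left enabled; WHITESPACE is deliberately absent so that
# whitespace is handed to the field's analyzer instead of the parser.
OPERATORS = ["AND", "OR", "NOT", "PREFIX", "PHRASE",
             "PRECEDENCE", "ESCAPE", "FUZZY", "NEAR"]


def simple_query_url(base_url, user_query, field):
    """Build a /select URL that uses the simple query parser."""
    params = {
        "defType": "simple",
        "q": user_query,
        "qf": field,
        "q.operators": ",".join(OPERATORS),
    }
    return base_url + "/select?" + urlencode(params)


# Placeholder host, core, and field names.
print(simple_query_url("http://localhost:8983/solr/mycore",
                       "united states", "text"))
```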

On Thu, Feb 2, 2017 at 1:17 PM Cliff Dickinson 
wrote:

> Steve and Shawn, thanks for your replies/explanations!
>
> I eagerly await the completion of the Solr JIRA ticket referenced above in
> a future release.  Many thanks for addressing this challenge that has had
> me banging my head against my desk off and on for the last couple years!
>
> Cliff
>
> On Thu, Feb 2, 2017 at 1:01 PM, Steve Rowe  wrote:
>
> > Hi Cliff,
> >
> > The Solr query parsers (standard/“Lucene” and e/dismax anyway) have a
> > problem that prevents SynonymGraphFilter from working: the text fed to
> your
> > query analyzer is first split on whitespace.  So e.g. a query containing
> > “United States” will never match multi-word synonym “United
> States”->”US”,
> > since the analyzer will first see “United” and then, separately, “States”.
> >
> > I fixed the whitespace splitting problem in the classic Lucene query
> > parser in .  (Note
> > that this is *not* the same as Solr’s standard/“Lucene” query parser,
> which
> > is actually a fork of Lucene’s query parser with added functionality.)
> >
> > There is a Solr JIRA I’m working on to fix the whitespace splitting
> > problem: .  I hope to
> > get it committed in time for inclusion in Solr 6.5.
> >
> > --
> > Steve
> > www.lucidworks.com
> >
> > > On Feb 2, 2017, at 9:50 AM, Shawn Heisey  wrote:
> > >
> > > On 2/2/2017 7:36 AM, Cliff Dickinson wrote:
> > >> The SynonymGraphFilter API documentation contains the following
> > statement
> > >> at the end:
> > >>
> > >> "To get fully correct positional queries when your synonym
> replacements
> > are
> > >> multiple tokens, you should instead apply synonyms using this
> > TokenFilter
> > >> at query time and translate the resulting graph to a
> TermAutomatonQuery
> > >> e.g. using TokenStreamToTermAutomatonQuery."
> > >
> > > Lucene is a programming API for search.  That documentation is intended
> > > for people who are writing Lucene programs.  Those users would be
> > > constructing query objects in their own code, so they would most likely
> > > know exactly which object needs to be changed to TermAutomatonQuery.
> > >
> > > Solr is a Lucene program ... and an immensely complicated one.  Many
> > > Lucene improvements require changes in the end program for full
> > > support.  I suspect that Solr's capability has not been updated to use
> > > this new feature in Lucene.  I cannot say for sure, I hope someone who
> > > is familiar with this Lucene change and Solr internals can comment.
> > >
> > > Thanks,
> > > Shawn
> > >
> >
> >
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: Solr 6.4 new SynonymGraphFilter help for multi-word synonyms

2017-02-02 Thread Cliff Dickinson
Steve and Shawn, thanks for your replies/explanations!

I eagerly await the completion of the Solr JIRA ticket referenced above in
a future release.  Many thanks for addressing this challenge that has had
me banging my head against my desk off and on for the last couple years!

Cliff

On Thu, Feb 2, 2017 at 1:01 PM, Steve Rowe  wrote:

> Hi Cliff,
>
> The Solr query parsers (standard/“Lucene” and e/dismax anyway) have a
> problem that prevents SynonymGraphFilter from working: the text fed to your
> query analyzer is first split on whitespace.  So e.g. a query containing
> “United States” will never match multi-word synonym “United States”->”US”,
> since the analyzer will first see “United” and then, separately, “States”.
>
> I fixed the whitespace splitting problem in the classic Lucene query
> parser in .  (Note
> that this is *not* the same as Solr’s standard/“Lucene” query parser, which
> is actually a fork of Lucene’s query parser with added functionality.)
>
> There is a Solr JIRA I’m working on to fix the whitespace splitting
> problem: .  I hope to
> get it committed in time for inclusion in Solr 6.5.
>
> --
> Steve
> www.lucidworks.com
>
> > On Feb 2, 2017, at 9:50 AM, Shawn Heisey  wrote:
> >
> > On 2/2/2017 7:36 AM, Cliff Dickinson wrote:
> >> The SynonymGraphFilter API documentation contains the following
> statement
> >> at the end:
> >>
> >> "To get fully correct positional queries when your synonym replacements
> are
> >> multiple tokens, you should instead apply synonyms using this
> TokenFilter
> >> at query time and translate the resulting graph to a TermAutomatonQuery
> >> e.g. using TokenStreamToTermAutomatonQuery."
> >
> > Lucene is a programming API for search.  That documentation is intended
> > for people who are writing Lucene programs.  Those users would be
> > constructing query objects in their own code, so they would most likely
> > know exactly which object needs to be changed to TermAutomatonQuery.
> >
> > Solr is a Lucene program ... and an immensely complicated one.  Many
> > Lucene improvements require changes in the end program for full
> > support.  I suspect that Solr's capability has not been updated to use
> > this new feature in Lucene.  I cannot say for sure, I hope someone who
> > is familiar with this Lucene change and Solr internals can comment.
> >
> > Thanks,
> > Shawn
> >
>
>


Re: Solr 6.4 new SynonymGraphFilter help for multi-word synonyms

2017-02-02 Thread Steve Rowe
Hi Cliff,

The Solr query parsers (standard/“Lucene” and e/dismax anyway) have a problem 
that prevents SynonymGraphFilter from working: the text fed to your query 
analyzer is first split on whitespace.  So e.g. a query containing “United 
States” will never match multi-word synonym “United States”->”US”, since the 
analyzer will first see “United” and then, separately, “States”.
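
To make the effect concrete, here is a toy Python model of the problem. It is
not Solr or Lucene code: the analyzer, the rule table, and the function names
are invented for the illustration.

```python
# One multi-word rule, mirroring the example: "united states" -> "us".
SYNONYMS = {("united", "states"): "us"}


def analyze(text):
    """Toy analyzer: lowercase, split into tokens, then apply
    longest-match multi-word synonym replacement."""
    tokens = text.lower().split()
    max_len = max(len(key) for key in SYNONYMS)
    out = []
    i = 0
    while i < len(tokens):
        for n in range(max_len, 0, -1):
            key = tuple(tokens[i:i + n])
            if len(key) == n and key in SYNONYMS:
                out.append(SYNONYMS[key])
                i += n
                break
        else:
            out.append(tokens[i])
            i += 1
    return out


def parse_split_first(query):
    """Parser splits on whitespace, then analyzes each chunk alone."""
    result = []
    for chunk in query.split():
        result.extend(analyze(chunk))
    return result


def parse_whole(query):
    """Parser hands the whole query text to the analyzer at once."""
    return analyze(query)


print(parse_split_first("United States"))
print(parse_whole("United States"))
```

In the split-first path the analyzer never sees the two words together, so
the multi-word rule cannot fire no matter how the synonym filter is
configured.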

I fixed the whitespace splitting problem in the classic Lucene query parser in 
.  (Note that this is *not* 
the same as Solr’s standard/“Lucene” query parser, which is actually a fork of 
Lucene’s query parser with added functionality.)

There is a Solr JIRA I’m working on to fix the whitespace splitting problem: 
.  I hope to get it committed 
in time for inclusion in Solr 6.5.

--
Steve
www.lucidworks.com

> On Feb 2, 2017, at 9:50 AM, Shawn Heisey  wrote:
> 
> On 2/2/2017 7:36 AM, Cliff Dickinson wrote:
>> The SynonymGraphFilter API documentation contains the following statement
>> at the end:
>> 
>> "To get fully correct positional queries when your synonym replacements are
>> multiple tokens, you should instead apply synonyms using this TokenFilter
>> at query time and translate the resulting graph to a TermAutomatonQuery
>> e.g. using TokenStreamToTermAutomatonQuery."
> 
> Lucene is a programming API for search.  That documentation is intended
> for people who are writing Lucene programs.  Those users would be
> constructing query objects in their own code, so they would most likely
> know exactly which object needs to be changed to TermAutomatonQuery.
> 
> Solr is a Lucene program ... and an immensely complicated one.  Many
> Lucene improvements require changes in the end program for full
> support.  I suspect that Solr's capability has not been updated to use
> this new feature in Lucene.  I cannot say for sure, I hope someone who
> is familiar with this Lucene change and Solr internals can comment.
> 
> Thanks,
> Shawn
> 



Re: Solr 6.4 new SynonymGraphFilter help for multi-word synonyms

2017-02-02 Thread Shawn Heisey
On 2/2/2017 7:36 AM, Cliff Dickinson wrote:
> The SynonymGraphFilter API documentation contains the following statement
> at the end:
>
> "To get fully correct positional queries when your synonym replacements are
> multiple tokens, you should instead apply synonyms using this TokenFilter
> at query time and translate the resulting graph to a TermAutomatonQuery
> e.g. using TokenStreamToTermAutomatonQuery."

Lucene is a programming API for search.  That documentation is intended
for people who are writing Lucene programs.  Those users would be
constructing query objects in their own code, so they would most likely
know exactly which object needs to be changed to TermAutomatonQuery.

Solr is a Lucene program ... and an immensely complicated one.  Many
Lucene improvements require changes in the end program for full
support.  I suspect that Solr's capability has not been updated to use
this new feature in Lucene.  I cannot say for sure, I hope someone who
is familiar with this Lucene change and Solr internals can comment.

Thanks,
Shawn



Solr 6.4 new SynonymGraphFilter help for multi-word synonyms

2017-02-02 Thread Cliff Dickinson
I've been eagerly awaiting the release of the new SynonymGraphFilter in
Solr 6.4.  We have the need to support multi-word synonyms, which were
always problematic with the old SynonymFilterFactory.  I've upgraded to
Solr 6.4 and replaced the old filter with the new one, but am not seeing
the results that I had hoped for yet.  I suspect my configuration is
lacking something important.

I'm starting with the simple example in the SynonymGraphFilterFactory API
documentation:








An example entry in the synonyms.txt file is:

booster, representative of athletics interest

My problem with the old filter has always been that if I run a query for
"booster", I get results containing any of the following words: booster,
representative, athletics, interest.  This is way more results than I
want.  A document that only contains athletics, but none of the other words
in the synonym is returned.  What I really want are documents that contain
"booster" or the full synonym phrase of "representative of athletics
interest".  How could I accomplish this?
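
The difference between the two behaviours can be sketched in a few lines of
Python. The documents are made up, and treating "of" as a stopword is an
assumption; this only models the matching semantics, not how Solr builds the
query.

```python
# Toy documents; ids and texts are invented for the illustration.
DOCS = {
    1: "the booster club met today",
    2: "a representative of athletics interest attended",
    3: "athletics schedule for spring",
}

# The two sides of the synonyms.txt entry from the message.
SYNONYMS = ["booster", "representative of athletics interest"]
STOPWORDS = {"of"}  # assumption: "of" is removed as a stopword


def contains_phrase(doc_tokens, phrase_tokens):
    """True if phrase_tokens occurs as a contiguous run in doc_tokens."""
    n = len(phrase_tokens)
    return any(doc_tokens[i:i + n] == phrase_tokens
               for i in range(len(doc_tokens) - n + 1))


def flattened_match(text):
    """Behaviour complained about: any single word of any synonym matches."""
    words = {w for s in SYNONYMS for w in s.split()} - STOPWORDS
    return any(tok in words for tok in text.split())


def term_or_phrase_match(text):
    """Behaviour wanted: the single term OR the full phrase."""
    tokens = text.split()
    return any(contains_phrase(tokens, s.split()) for s in SYNONYMS)


flattened = sorted(d for d, t in DOCS.items() if flattened_match(t))
wanted = sorted(d for d, t in DOCS.items() if term_or_phrase_match(t))
print(flattened, wanted)
```

Document 3 mentions only "athletics", so it shows up under the flattened
behaviour but not under the term-or-phrase behaviour.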

The SynonymGraphFilter API documentation contains the following statement
at the end:

"To get fully correct positional queries when your synonym replacements are
multiple tokens, you should instead apply synonyms using this TokenFilter
at query time and translate the resulting graph to a TermAutomatonQuery
e.g. using TokenStreamToTermAutomatonQuery."

How do I use TokenStreamToTermAutomatonQuery, or can this not be configured
in Solr, but only by writing code against Lucene?  Would this even address
my issue?

I've found synonyms to be very frustrating in Solr and am hoping this new
filter will be a big improvement.  Thanks in advance for the help!


RE: Multi word synonyms

2016-11-15 Thread Davis, Daniel (NIH/NLM) [C]
Midas,

Apparently I didn't read carefully enough: Ted Sullivan's 
AutoPhrasingTokenFilter is configured with a file, "autophrases.txt", and it 
only recognizes phrases that are in that file.   Because of this, it doesn't 
seem directly applicable to your problem of multi-word synonym matching at 
query time, because it won't know which terms to clump.   Here's Ted 
Sullivan's earlier post on the token filter: 
https://lucidworks.com/blog/2014/07/02/automatic-phrase-tokenization-improving-lucene-search-precision-by-more-precise-linguistic-analysis/

I would therefore ask your users or their representative about the priority of 
this feature/requirement.

Going on, I think what you could do is to use an NLP toolkit such as OpenNLP, 
StanfordNLP (both Java) or Python NLTK to identify noun phrases in your 
text/corpus, and then use those to build autophrases.txt.   You wouldn't need 
to use all of your corpus to get somewhat good accuracy because new noun 
phrases will be rare at some point.   You may need to play with which phrases 
to include, e.g. the size of autophrases.txt depending on how 
AutoPhrasingTokenFilter is implemented and the rate of indexing you need to 
maintain. Depending on your experience, you can do this even if you are new to 
Solr, as you've mentioned.
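
As a very rough illustration of building such a phrase list, the Python sketch
below counts frequent two-word sequences. This is a crude stand-in for real
noun-phrase detection with an NLP toolkit, not a replacement for it. The
stopword list, the threshold, and the one-phrase-per-line output format are
all assumptions.

```python
from collections import Counter

# Small hand-picked stopword list, purely illustrative.
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "for", "on"}


def candidate_phrases(texts, min_count=2):
    """Return two-word sequences that occur at least min_count times
    and contain no stopword, sorted alphabetically."""
    counts = Counter()
    for text in texts:
        tokens = text.lower().split()
        for first, second in zip(tokens, tokens[1:]):
            if first not in STOPWORDS and second not in STOPWORDS:
                counts[first + " " + second] += 1
    return sorted(p for p, c in counts.items() if c >= min_count)


corpus = [
    "heart attack risk in adults",
    "treating a heart attack",
    "risk of stroke",
]

# Assumed file format: one phrase per line.
print("\n".join(candidate_phrases(corpus)))
```

Raising min_count is one simple way to keep the generated file small if its
size turns out to affect indexing speed.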

-Original Message-
From: Davis, Daniel (NIH/NLM) [C] 
Sent: Tuesday, November 15, 2016 10:22 AM
To: solr-user@lucene.apache.org
Subject: RE: Multi word synonyms

I'm not as expert as some on this list, but reading the article suggested, 
https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/,
 what you do is this:

- Have one field that takes text as normal
- Copy that field to another field, whose field type uses the 
AutoPhrasingTokenFilter
- Configure your result handler to query against both fields

You don't know the list of synonyms at query time, but now you have another 
field that contains phrases, not words, and so you can indeed use synonym 
matching at query time against this secondary field.   You can even use the 
multi-word phrases in the copied field to suggest to admin users a list of 
candidate synonyms.

-Original Message-
From: Midas A [mailto:test.mi...@gmail.com]
Sent: Tuesday, November 15, 2016 7:38 AM
To: solr-user@lucene.apache.org
Subject: Re: Multi word synonyms

I am new to Solr. How should I solve this problem?

Can we do something at query time?

On Tue, Nov 15, 2016 at 5:35 PM, Vincenzo D'Amore <v.dam...@gmail.com>
wrote:

> Hi Michael,
>
> an update, reading the article I double checked if at least one of the 
> issues were fixed.
> The good news is that
> https://issues.apache.org/jira/browse/LUCENE-2605
> has
> been closed and is available in 6.2.
>
> On Tue, Nov 15, 2016 at 12:32 PM, Michael Kuhlmann <k...@solr.info> wrote:
>
> > This is a nice reading though, but that solution depends on the 
> > precondition that you'll already know your synonyms at index time.
> >
> > While having synonyms in the index is mostly the better solution 
> > anyway, it's sometimes not feasible.
> >
> > -Michael
> >
> > Am 15.11.2016 um 12:14 schrieb Vincenzo D'Amore:
> > > Hi Midas,
> > >
> > > I suggest this interesting reading:
> > >
> > > https://lucidworks.com/blog/2014/07/12/solution-for-multi-
> > term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
> > >
> > >
> > >
> > > On Tue, Nov 15, 2016 at 11:00 AM, Michael Kuhlmann 
> > > <k...@solr.info>
> > wrote:
> > >
> > >> It's not working out of the box, sorry.
> > >>
> > >> We're using this plugin:
> > >> https://github.com/healthonnet/hon-lucene-synonyms#getting-starte
> > >> d
> > >>
> > >> It's working nicely, but can lead to OOME when you add many 
> > >> synonyms with multiple terms. And I'm not sure whether it's still 
> > >> working with Solr 6.0.
> > >>
> > >> -Michael
> > >>
> > >> Am 15.11.2016 um 10:29 schrieb Midas A:
> > >>> - i have to  use multi word synonyms at query time .
> > >>>
> > >>> Please suggest how can i do it .
> > >>> and let me know it whether it would be visible in debug query or 
> > >>> not
> .
> > >>>
> > >>
> > >
> >
> >
>
>
> --
> Vincenzo D'Amore
> email: v.dam...@gmail.com
> skype: free.dev
> mobile: +39 349 8513251
>


RE: Multi word synonyms

2016-11-15 Thread Davis, Daniel (NIH/NLM) [C]
I'm not as expert as some on this list, but reading the article suggested, 
https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/,
 what you do is this:

- Have one field that takes text as normal
- Copy that field to another field, whose field type uses the 
AutoPhrasingTokenFilter
- Configure your result handler to query against both fields

You don't know the list of synonyms at query time, but now you have another 
field that contains phrases, not words, and so you can indeed use synonym 
matching at query time against this secondary field.   You can even use the 
multi-word phrases in the copied field to suggest to admin users a list of 
candidate synonyms.
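
A minimal Python sketch of the auto-phrasing idea, as I understand it from
the article: known phrases are clumped into single tokens so that a later
synonym step can treat them as one unit. The underscore joining and the
function name are assumptions for illustration; the real filter's output may
differ.

```python
def auto_phrase(text, phrases):
    """Clump any known phrase found in text into one underscore-joined
    token, preferring longer phrases; other tokens pass through."""
    phrase_tokens = sorted(
        (p.lower().split() for p in phrases if p.strip()),
        key=len,
        reverse=True,
    )
    tokens = text.lower().split()
    out = []
    i = 0
    while i < len(tokens):
        for pt in phrase_tokens:
            if tokens[i:i + len(pt)] == pt:
                out.append("_".join(pt))
                i += len(pt)
                break
        else:
            out.append(tokens[i])
            i += 1
    return out


# Hypothetical phrase list standing in for the filter's phrase file.
print(auto_phrase("Seat cushions for coca cola trucks",
                  ["coca cola", "seat cushions"]))
```

The original field would keep the plain word tokens, while the copied field
would hold the clumped tokens, and the query would run against both.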

-Original Message-
From: Midas A [mailto:test.mi...@gmail.com] 
Sent: Tuesday, November 15, 2016 7:38 AM
To: solr-user@lucene.apache.org
Subject: Re: Multi word synonyms

I am new to Solr. How should I solve this problem?

Can we do something at query time?

On Tue, Nov 15, 2016 at 5:35 PM, Vincenzo D'Amore <v.dam...@gmail.com>
wrote:

> Hi Michael,
>
> an update, reading the article I double checked if at least one of the 
> issues were fixed.
> The good news is that 
> https://issues.apache.org/jira/browse/LUCENE-2605
> has
> been closed and is available in 6.2.
>
> On Tue, Nov 15, 2016 at 12:32 PM, Michael Kuhlmann <k...@solr.info> wrote:
>
> > This is a nice reading though, but that solution depends on the 
> > precondition that you'll already know your synonyms at index time.
> >
> > While having synonyms in the index is mostly the better solution 
> > anyway, it's sometimes not feasible.
> >
> > -Michael
> >
> > Am 15.11.2016 um 12:14 schrieb Vincenzo D'Amore:
> > > Hi Midas,
> > >
> > > I suggest this interesting reading:
> > >
> > > https://lucidworks.com/blog/2014/07/12/solution-for-multi-
> > term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
> > >
> > >
> > >
> > > On Tue, Nov 15, 2016 at 11:00 AM, Michael Kuhlmann 
> > > <k...@solr.info>
> > wrote:
> > >
> > >> It's not working out of the box, sorry.
> > >>
> > >> We're using this plugin:
> > >> https://github.com/healthonnet/hon-lucene-synonyms#getting-starte
> > >> d
> > >>
> > >> It's working nicely, but can lead to OOME when you add many 
> > >> synonyms with multiple terms. And I'm not sure whether it's still 
> > >> working with Solr 6.0.
> > >>
> > >> -Michael
> > >>
> > >> Am 15.11.2016 um 10:29 schrieb Midas A:
> > >>> - i have to  use multi word synonyms at query time .
> > >>>
> > >>> Please suggest how can i do it .
> > >>> and let me know it whether it would be visible in debug query or 
> > >>> not
> .
> > >>>
> > >>
> > >
> >
> >
>
>
> --
> Vincenzo D'Amore
> email: v.dam...@gmail.com
> skype: free.dev
> mobile: +39 349 8513251
>


Re: Multi word synonyms

2016-11-15 Thread Michael Kuhlmann
Wow, that's great news! I didn't notice that.

Am 15.11.2016 um 13:05 schrieb Vincenzo D'Amore:
> Hi Michael,
>
> an update, reading the article I double checked if at least one of the
> issues were fixed.
> The good news is that https://issues.apache.org/jira/browse/LUCENE-2605 has
> been closed and is available in 6.2.
>
> On Tue, Nov 15, 2016 at 12:32 PM, Michael Kuhlmann <k...@solr.info> wrote:
>
>> This is a nice reading though, but that solution depends on the
>> precondition that you'll already know your synonyms at index time.
>>
>> While having synonyms in the index is mostly the better solution anyway,
>> it's sometimes not feasible.
>>
>> -Michael
>>
>> Am 15.11.2016 um 12:14 schrieb Vincenzo D'Amore:
>>> Hi Midas,
>>>
>>> I suggest this interesting reading:
>>>
>>> https://lucidworks.com/blog/2014/07/12/solution-for-multi-
>> term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
>>>
>>>
>>> On Tue, Nov 15, 2016 at 11:00 AM, Michael Kuhlmann <k...@solr.info>
>> wrote:
>>>> It's not working out of the box, sorry.
>>>>
>>>> We're using this plugin:
>>>> https://github.com/healthonnet/hon-lucene-synonyms#getting-started
>>>>
>>>> It's working nicely, but can lead to OOME when you add many synonyms
>>>> with multiple terms. And I'm not sure whether it's still working with
>>>> Solr 6.0.
>>>>
>>>> -Michael
>>>>
>>>> Am 15.11.2016 um 10:29 schrieb Midas A:
>>>>> - i have to  use multi word synonyms at query time .
>>>>>
>>>>> Please suggest how can i do it .
>>>>> and let me know it whether it would be visible in debug query or not .
>>>>>
>>
>



Re: Multi word synonyms

2016-11-15 Thread Midas A
I am new to Solr. How should I solve this problem?

Can we do something at query time?

On Tue, Nov 15, 2016 at 5:35 PM, Vincenzo D'Amore <v.dam...@gmail.com>
wrote:

> Hi Michael,
>
> an update, reading the article I double checked if at least one of the
> issues were fixed.
> The good news is that https://issues.apache.org/jira/browse/LUCENE-2605
> has
> been closed and is available in 6.2.
>
> On Tue, Nov 15, 2016 at 12:32 PM, Michael Kuhlmann <k...@solr.info> wrote:
>
> > This is a nice reading though, but that solution depends on the
> > precondition that you'll already know your synonyms at index time.
> >
> > While having synonyms in the index is mostly the better solution anyway,
> > it's sometimes not feasible.
> >
> > -Michael
> >
> > Am 15.11.2016 um 12:14 schrieb Vincenzo D'Amore:
> > > Hi Midas,
> > >
> > > I suggest this interesting reading:
> > >
> > > https://lucidworks.com/blog/2014/07/12/solution-for-multi-
> > term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
> > >
> > >
> > >
> > > On Tue, Nov 15, 2016 at 11:00 AM, Michael Kuhlmann <k...@solr.info>
> > wrote:
> > >
> > >> It's not working out of the box, sorry.
> > >>
> > >> We're using this plugin:
> > >> https://github.com/healthonnet/hon-lucene-synonyms#getting-started
> > >>
> > >> It's working nicely, but can lead to OOME when you add many synonyms
> > >> with multiple terms. And I'm not sure whether it's still working with
> > >> Solr 6.0.
> > >>
> > >> -Michael
> > >>
> > >> Am 15.11.2016 um 10:29 schrieb Midas A:
> > >>> - i have to  use multi word synonyms at query time .
> > >>>
> > >>> Please suggest how can i do it .
> > >>> and let me know it whether it would be visible in debug query or not
> .
> > >>>
> > >>
> > >
> >
> >
>
>
> --
> Vincenzo D'Amore
> email: v.dam...@gmail.com
> skype: free.dev
> mobile: +39 349 8513251
>


Re: Multi word synonyms

2016-11-15 Thread Vincenzo D'Amore
Hi Michael,

An update: after reading the article, I double-checked whether at least one of
the issues had been fixed.
The good news is that https://issues.apache.org/jira/browse/LUCENE-2605 has
been closed and the fix is available in 6.2.

On Tue, Nov 15, 2016 at 12:32 PM, Michael Kuhlmann <k...@solr.info> wrote:

> This is a nice reading though, but that solution depends on the
> precondition that you'll already know your synonyms at index time.
>
> While having synonyms in the index is mostly the better solution anyway,
> it's sometimes not feasible.
>
> -Michael
>
> Am 15.11.2016 um 12:14 schrieb Vincenzo D'Amore:
> > Hi Midas,
> >
> > I suggest this interesting reading:
> >
> > https://lucidworks.com/blog/2014/07/12/solution-for-multi-
> term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
> >
> >
> >
> > On Tue, Nov 15, 2016 at 11:00 AM, Michael Kuhlmann <k...@solr.info>
> wrote:
> >
> >> It's not working out of the box, sorry.
> >>
> >> We're using this plugin:
> >> https://github.com/healthonnet/hon-lucene-synonyms#getting-started
> >>
> >> It's working nicely, but can lead to OOME when you add many synonyms
> >> with multiple terms. And I'm not sure whether it's still working with
> >> Solr 6.0.
> >>
> >> -Michael
> >>
> >> Am 15.11.2016 um 10:29 schrieb Midas A:
> >>> - i have to  use multi word synonyms at query time .
> >>>
> >>> Please suggest how can i do it .
> >>> and let me know it whether it would be visible in debug query or not .
> >>>
> >>
> >
>
>


-- 
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251


Re: Multi word synonyms

2016-11-15 Thread Michael Kuhlmann
It's a nice read, but that solution depends on the
precondition that you already know your synonyms at index time.

While having synonyms in the index is usually the better solution anyway,
it's sometimes not feasible.

-Michael

Am 15.11.2016 um 12:14 schrieb Vincenzo D'Amore:
> Hi Midas,
>
> I suggest this interesting reading:
>
> https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
>
>
>
> On Tue, Nov 15, 2016 at 11:00 AM, Michael Kuhlmann <k...@solr.info> wrote:
>
>> It's not working out of the box, sorry.
>>
>> We're using this plugin:
>> https://github.com/healthonnet/hon-lucene-synonyms#getting-started
>>
>> It's working nicely, but can lead to OOME when you add many synonyms
>> with multiple terms. And I'm not sure whether it's still working with
>> Solr 6.0.
>>
>> -Michael
>>
>> Am 15.11.2016 um 10:29 schrieb Midas A:
>>> - i have to  use multi word synonyms at query time .
>>>
>>> Please suggest how can i do it .
>>> and let me know it whether it would be visible in debug query or not .
>>>
>>
>



Re: Multi word synonyms

2016-11-15 Thread Vincenzo D'Amore
Hi Midas,

I suggest this interesting article:

https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/



On Tue, Nov 15, 2016 at 11:00 AM, Michael Kuhlmann <k...@solr.info> wrote:

> It's not working out of the box, sorry.
>
> We're using this plugin:
> https://github.com/healthonnet/hon-lucene-synonyms#getting-started
>
> It's working nicely, but can lead to OOME when you add many synonyms
> with multiple terms. And I'm not sure whether it's still working with
> Solr 6.0.
>
> -Michael
>
> Am 15.11.2016 um 10:29 schrieb Midas A:
> > - i have to  use multi word synonyms at query time .
> >
> > Please suggest how can i do it .
> > and let me know it whether it would be visible in debug query or not .
> >
>
>


-- 
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251


Re: Multi word synonyms

2016-11-15 Thread Michael Kuhlmann
It's not working out of the box, sorry.

We're using this plugin:
https://github.com/healthonnet/hon-lucene-synonyms#getting-started

It works nicely, but can lead to an OutOfMemoryError (OOME) when you add many
synonyms with multiple terms. And I'm not sure whether it still works with
Solr 6.0.

-Michael

Am 15.11.2016 um 10:29 schrieb Midas A:
> - i have to  use multi word synonyms at query time .
>
> Please suggest how can i do it .
> and let me know it whether it would be visible in debug query or not .
>



Multi word synonyms

2016-11-15 Thread Midas A
I have to use multi-word synonyms at query time.

Please suggest how I can do this,
and let me know whether the result would be visible in the debug query output.


Re: Solutions for Multi-word Synonyms

2016-06-24 Thread Joe Lawson
I rounded up some of the discussion here:
http://opensourceconnections.com/blog/2016/06/23/solr-multi-word-synonym-solutions-2016/

Also my colleague pointed me to another project, Querqy,
https://github.com/renekrie/querqy which "is a framework for query
preprocessing in Java-based search engines. It comes with a powerful,
rule-based preprocessor named 'Common Rules Preprocessor', which provides
query-time synonyms, query-dependent boosting and down-ranking, and
query-dependent filters. While the Common Rules
Preprocessor is not specific to any search engine, Querqy provides a plugin
to run it within the Solr search engine."

On Fri, Jun 10, 2016 at 2:25 AM, Bernd Fehling <
bernd.fehl...@uni-bielefeld.de> wrote:

> As Doug said,
> you should really try to build your own solution for Multi-word Synonyms
> because every need is different and you can customize it for your special
> use case, like adding a Thesaurus.
>
>
> http://www.ub.uni-bielefeld.de/~befehl/base/solr/InsideBase_eurovocThesaurus.html
>
> Regards
> Bernd
>
> Am 09.06.2016 um 17:06 schrieb Doug Turnbull:
> > Mary Jo,
> >
> > Honestly half the time I run into this problem, I end up creating a
> > QParserPlugin because I need to do something specific. With a
> QParserPlugin
> > I can run whatever analysis, slicing and dicing of the query string to
> > manually construct whatever I need to
> >
> >
> http://www.supermind.org/blog/1134/custom-solr-queryparsers-for-fun-and-profit
> >
> > One thing I often do is repeat the functionality of Elasticsearch's match
> > query. Elasticsearch's match query does the following:
> >
> > - Analyze the query string using the field's query-time analyzer
> > - Create an OR query with the tokens that come out of the analysis
> >
> > You can look at the field query parser as something of a starting point
> for
> > this.
> >
> > I usually do this in the context of a boost query, not as the main
> edismax
> > query.
> >
> > If I have time, this is something I've been meaning to open source.
> >
> > Best
> > -Doug
> >
> > On Tue, Jun 7, 2016 at 2:51 PM Joe Lawson <
> jlaw...@opensourceconnections.com>
> > wrote:
> >
> >> I'm sorry I wasn't more specific, I meant we were hijacking the thread
> with
> >> the question, "Anyone used a different method of
> >> handling multi-term synonyms that isn't as global?" as the original
> thread
> >> was about getting synonym_edismax running.
> >>
> >> On Tue, Jun 7, 2016 at 2:24 PM, MaryJo Sminkey <mjsmin...@gmail.com>
> >> wrote:
> >>
> >>>> MaryJo you might want to start a new thread, I think we kinda hijacked
> >>> this
> >>>> one. Also if you are interested in tuning queries check out
> >>>> http://splainer.io/ and https://www.quepid.com which are interactive
> >>> tools
> >>>> (both of which my company makes) to tune for search relevancy.
> >>>>
> >>>
> >>>
> >>> Okay I changed the subject. But I don't need a tuning tool, I already
> >> know
> >>> WHY I'm not getting the results I need, the problem is how to fix it or
> >> get
> >>> around what the plugin is doing. Which is why I was inquiring if people
> >>> have had success with something other than this particularly plugin for
> >>> more advanced queries that it messes around with. It seems to do a good
> >> job
> >>> if you aren't doing anything particularly complicated with your search
> >>> logic, but I don't see a good way to solve the issue I'm having, and a
> >>> tuning tool isn't really going to help with that. We were pretty happy
> >> with
> >>> our search relevancy for the most part *other* than the problem with
> the
> >>> multi-term synonyms not working reliably but I definitely can't lose
> >>> relevancy that we had just to get those working.
> >>>
> >>> In reviewing your tools previously, the problem as I recall is that
> they
> >>> rely on querying Solr directly, while our searches go through multiple
> >>> levels of an application which includes a lot of additional logic in
> >> terms
> >>> of what the data that gets sent to Solr are, so they just aren't going
> to
> >>> be much use for us. It was easier for me to just write my own tool that
> >>> essentially does the same kind of thing, but with my application logic
> >>> built in.
> >>>
> >>> Mary Jo
> >>>
> >>
> >
>
> --
> *
> Bernd Fehling                Bielefeld University Library
> Dipl.-Inform. (FH)           LibTec - Library Technology
> Universitätsstr. 25          and Knowledge Management
> 33615 Bielefeld
> Tel. +49 521 106-4060        bernd.fehling(at)uni-bielefeld.de
>
> BASE - Bielefeld Academic Search Engine - www.base-search.net
> *
>


Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-23 Thread Joe Lawson
FYI everyone, I've updated the README.md to be fully up to date for Solr
6.0 and the latest plugin release.
https://github.com/healthonnet/hon-lucene-synonyms/blob/master/README.md

On Fri, Jun 17, 2016 at 2:34 PM, MaryJo Sminkey  wrote:

> > OK - Slapping forehead now... D'oh!
> >
> > 1.2 >
> > Float, not int!
> >
>
>
> LOL, we've all been there. I'm surprised I didn't notice that myself.
>
> MJ
>


Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-17 Thread MaryJo Sminkey
> OK - Slapping forehead now... D'oh!
>
> 1.2
> Float, not int!
>


LOL, we've all been there. I'm surprised I didn't notice that myself.

MJ
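
The mix-up quoted above comes down to declaring a value with the wrong type.
As a loose analogy only (this is plain Python, not Solr's actual parsing code,
and the helper name is hypothetical), an integer parser rejects 1.2 while a
float parser accepts it:

```python
# Parsers keyed by the declared type of a config value (illustrative only).
PARSERS = {"int": int, "float": float, "str": str}


def parse_value(declared_type, raw):
    """Parse raw text according to its declared type; raise ValueError on mismatch."""
    return PARSERS[declared_type](raw)


boost = parse_value("float", "1.2")   # fine: "1.2" is a valid float

try:
    parse_value("int", "1.2")         # "1.2" is not a valid integer literal
    int_ok = True
except ValueError:
    int_ok = False

print(boost, int_ok)
```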


Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-17 Thread John Bickerstaff
OK - Slapping forehead now... D'oh!

1.2
Float, not int!

John Bickerstaff wrote:

> Hi all -
>
> I've successfully run the hon-lucene-synonyms plugin from the Admin
> console by adding the following to the Raw Query Parameters field...
>
>
> =text=synonym_edismax=true=1.2=1.1
>
> I got those from the Read Me on the github account.
>
> Now I'm trying to make this work via a requestHandler in solrconfig.xml.
>
> I think the following should work, but it just hangs if I add the last
> line referencing synonyms.originalBoost
>
> 
> 
>  
>explicit
>10
>synonym_edismax
>text
>true
>1.2 --> If I add this
> line, the admin console just hangs when I hit /test1
>  
>  
>
> If I do NOT add the last line and only have the line that sets
> synonyms=true, it appears to work fine.
>
> I see the dot notation all over the sample entries in solrconfig.xml...
> Am I missing something here?
>
> Essentially, how do I get these variables set correctly from inside a
> requestHandler configured in the solrconfig.xml file?
>
> On Tue, Jun 7, 2016 at 11:47 AM, Joe Lawson <
> jlaw...@opensourceconnections.com> wrote:
>
>> MaryJo you might want to start a new thread, I think we kinda hijacked
>> this
>> one. Also if you are interested in tuning queries check out
>> http://splainer.io/ and https://www.quepid.com which are interactive
>> tools
>> (both of which my company makes) to tune for search relevancy.
>>
>> On Tue, Jun 7, 2016 at 1:45 PM, MaryJo Sminkey <mjsmin...@gmail.com>
>> wrote:
>>
>> > I'm really thinking this just might not be the right tool for us, what
>> we
>> > really need is a solution that works like the normal synonym filter
>> does,
>> > just with proper multi-term support, so I can apply the synonyms only on
>> > certain fields (copied fields) that have their own, lower boost
>> settings.
>> > The way this plugin works across the entire query just seems too
>> > problematic when you need to do complex queries with lots of different
>> > boost settings to get good relevancy. Anyone used a different method of
>> > handling multi-term synonyms that isn't as global?
>> >
>> > Mary Jo
>> >
>> >
>> >
>> > On Tue, Jun 7, 2016 at 1:31 PM, MaryJo Sminkey <mjsmin...@gmail.com>
>> > wrote:
>> >
>> > > Here's the issue I am still having with getting the right search
>> > relevancy
>> > > with the synonym plugin in place. We typically have users searching on
>> > > multiple terms, and we want matches across multiple terms,
>> particularly
>> > > those that appears as phrases, to appear higher than matches for the
>> same
>> > > term multiple times. The synonym filter makes this complicated since
>> we
>> > may
>> > > have cases where the term the user enters, like "sbc", maps to a
>> > multi-term
>> > > synonym like "small block", and we always want the matches for the
>> > original
>> > > term to pop up first, so I'm trying to make sure the original boost is
>> > high
>> > > enough to override a phrase boost that the multi-term synonym would
>> give.
>> > > Unfortunately this then means matches on the same term multiple times
>> get
>> > > pushed up over my phrase matches...those aren't going to be the most
>> > > relevant matches. Not sure there's a way to solve this successfully,
>> > > without a completely different approach to the synonyms... or not
>> > counting
>> > > the number of matches on terms (I assume you can drop that ability,
>> > > although that's not ideal either...just better than what I have now).
>> > >
>> > > MJ
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > On Mon, Jun 6, 2016 at 9:39 PM, MaryJo Sminkey <mjsmin...@gmail.com>
>> > > wrote:
>> > >
>> > >>
>> > >> On Mon, Jun 6, 2016 at 7:36 PM, Joe Lawson <
>> > >> jlaw...@opensourceconnections.com> wrote:
>> > >>
>> > >>>
>> > >>> We were thinking, as you experimented with, that the 0.5 and 2.0
>> boosts
>> > >>> were no match for the product name and keyword field boosts so that
>> >

Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-17 Thread MaryJo Sminkey
On Fri, Jun 17, 2016 at 2:15 PM, John Bickerstaff 
wrote:

> If I do NOT add the last line and only have the line that sets
> synonyms=true, it appears to work fine.
>
> I see the dot notation all over the sample entries in solrconfig.xml...  Am
> I missing something here?
>
> Essentially, how do I get these variables set correctly from inside a
> requestHandler configured in the solrconfig.xml file?
>


I know I didn't have any issues using those boosts but I was sending them
on the query string (or otherwise as part of my query request), rather than
setting them in the config. You might try that to see if it makes a
difference.
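
For comparison, passing the plugin settings with each request could look like
the sketch below. `synonym_edismax`, `synonyms`, and `synonyms.originalBoost`
appear in this thread; `synonyms.synonymBoost`, the `qf=text` pairing, the
host, and the core name `mycore` are assumptions for illustration and should
be checked against the plugin README.

```python
from urllib.parse import urlencode


def synonym_query_url(user_query,
                      base="http://localhost:8983/solr/mycore/select"):
    """Return a select URL that sets the synonym parser options per request."""
    params = [
        ("q", user_query),
        ("defType", "synonym_edismax"),
        ("qf", "text"),                      # assumed field list
        ("synonyms", "true"),
        ("synonyms.originalBoost", "1.2"),
        ("synonyms.synonymBoost", "1.1"),    # assumed parameter name
    ]
    return base + "?" + urlencode(params)


url = synonym_query_url("sbc")
print(url)
```

If the request behaves correctly this way but not from solrconfig.xml, the
difference most likely lies in how the defaults are declared there.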

Mary Jo


Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-17 Thread John Bickerstaff
Hi all -

I've successfully run the hon-lucene-synonyms plugin from the Admin console
by adding the following to the Raw Query Parameters field...

=text=synonym_edismax=true=1.2=1.1

I got those from the README in the GitHub repository.

Now I'm trying to make this work via a requestHandler in solrconfig.xml.

I think the following should work, but it just hangs if I add the last line
referencing synonyms.originalBoost



 
   explicit
   10
   synonym_edismax
   text
   true
   1.2 --> If I add this line,
the admin console just hangs when I hit /test1
 
 

If I do NOT add the last line and only have the line that sets
synonyms=true, it appears to work fine.

I see the dot notation all over the sample entries in solrconfig.xml...  Am
I missing something here?

Essentially, how do I get these variables set correctly from inside a
requestHandler configured in the solrconfig.xml file?

On Tue, Jun 7, 2016 at 11:47 AM, Joe Lawson <
jlaw...@opensourceconnections.com> wrote:

> MaryJo you might want to start a new thread, I think we kinda hijacked this
> one. Also if you are interested in tuning queries check out
> http://splainer.io/ and https://www.quepid.com which are interactive tools
> (both of which my company makes) to tune for search relevancy.
>
> On Tue, Jun 7, 2016 at 1:45 PM, MaryJo Sminkey <mjsmin...@gmail.com>
> wrote:
>
> > I'm really thinking this just might not be the right tool for us, what we
> > really need is a solution that works like the normal synonym filter does,
> > just with proper multi-term support, so I can apply the synonyms only on
> > certain fields (copied fields) that have their own, lower boost settings.
> > The way this plugin works across the entire query just seems too
> > problematic when you need to do complex queries with lots of different
> > boost settings to get good relevancy. Anyone used a different method of
> > handling multi-term synonyms that isn't as global?
> >
> > Mary Jo
> >
> >
> >
> > On Tue, Jun 7, 2016 at 1:31 PM, MaryJo Sminkey <mjsmin...@gmail.com>
> > wrote:
> >
> > > Here's the issue I am still having with getting the right search
> > relevancy
> > > with the synonym plugin in place. We typically have users searching on
> > > multiple terms, and we want matches across multiple terms, particularly
> > > those that appears as phrases, to appear higher than matches for the
> same
> > > term multiple times. The synonym filter makes this complicated since we
> > may
> > > have cases where the term the user enters, like "sbc", maps to a
> > multi-term
> > > synonym like "small block", and we always want the matches for the
> > original
> > > term to pop up first, so I'm trying to make sure the original boost is
> > high
> > > enough to override a phrase boost that the multi-term synonym would
> give.
> > > Unfortunately this then means matches on the same term multiple times
> get
> > > pushed up over my phrase matches...those aren't going to be the most
> > > relevant matches. Not sure there's a way to solve this successfully,
> > > without a completely different approach to the synonyms... or not
> > counting
> > > the number of matches on terms (I assume you can drop that ability,
> > > although that's not ideal either...just better than what I have now).
> > >
> > > MJ
> > >
> > >
> > >
> > >
> > >
> > > On Mon, Jun 6, 2016 at 9:39 PM, MaryJo Sminkey <mjsmin...@gmail.com>
> > > wrote:
> > >
> > >>
> > >> On Mon, Jun 6, 2016 at 7:36 PM, Joe Lawson <
> > >> jlaw...@opensourceconnections.com> wrote:
> > >>
> > >>>
> > >>> We were thinking, as you experimented with, that the 0.5 and 2.0
> boosts
> > >>> were no match for the product name and keyword field boosts so that
> > would
> > >>> influence your search as well.
> > >>
> > >>
> > >>
> > >> Yeah I definitely will have to play with the values a bit as we want
> the
> > >> product name matches to always appear highest, whether original or
> > >> synonyms, but I'll have to figure out how to get that result without
> one
> > >> word terms that have multi word synonyms getting overly boosted for a
> > >> phrase match while still sufficiently boosting the normal phrase
> > match
> > >> stuff too. With the normal synonym filter I was able to just copy
> fields
> > >> that could have synonyms to a new field (which would be the only one
> > with
> > >> the synonym filter), and use a different, lower boost on those fields,
> > but
> > >> that won't work with this plugin which applies across everything in
> the
> > >> query. Makes it a bit more complicated to get everything just right.
> > >>
> > >> MJ
> > >>
> > >>
> > >
> > >>
> > >
> > >
> >
>


Re: Solutions for Multi-word Synonyms

2016-06-10 Thread Bernd Fehling
As Doug said,
you should really try to build your own solution for Multi-word Synonyms
because every need is different and you can customize it for your special
use case, like adding a Thesaurus.

http://www.ub.uni-bielefeld.de/~befehl/base/solr/InsideBase_eurovocThesaurus.html

Regards
Bernd

Am 09.06.2016 um 17:06 schrieb Doug Turnbull:
> Mary Jo,
> 
> Honestly half the time I run into this problem, I end up creating a
> QParserPlugin because I need to do something specific. With a QParserPlugin
> I can run whatever analysis, slicing and dicing of the query string to
> manually construct whatever I need to
> 
> http://www.supermind.org/blog/1134/custom-solr-queryparsers-for-fun-and-profit
> 
> One thing I often do is repeat the functionality of Elasticsearch's match
> query. Elasticsearch's match query does the following:
> 
> - Analyze the query string using the field's query-time analyzer
> - Create an OR query with the tokens that come out of the analysis
> 
> You can look at the field query parser as something of a starting point for
> this.
> 
> I usually do this in the context of a boost query, not as the main edismax
> query.
> 
> If I have time, this is something I've been meaning to open source.
> 
> Best
> -Doug
> 
> On Tue, Jun 7, 2016 at 2:51 PM Joe Lawson <jlaw...@opensourceconnections.com>
> wrote:
> 
>> I'm sorry I wasn't more specific, I meant we were hijacking the thread with
>> the question, "Anyone used a different method of
>> handling multi-term synonyms that isn't as global?" as the original thread
>> was about getting synonym_edismax running.
>>
>> On Tue, Jun 7, 2016 at 2:24 PM, MaryJo Sminkey <mjsmin...@gmail.com>
>> wrote:
>>
>>>> MaryJo you might want to start a new thread, I think we kinda hijacked
>>> this
>>>> one. Also if you are interested in tuning queries check out
>>>> http://splainer.io/ and https://www.quepid.com which are interactive
>>> tools
>>>> (both of which my company makes) to tune for search relevancy.
>>>>
>>>
>>>
>>> Okay I changed the subject. But I don't need a tuning tool, I already
>> know
>>> WHY I'm not getting the results I need, the problem is how to fix it or
>> get
>>> around what the plugin is doing. Which is why I was inquiring if people
>>> have had success with something other than this particularly plugin for
>>> more advanced queries that it messes around with. It seems to do a good
>> job
>>> if you aren't doing anything particularly complicated with your search
>>> logic, but I don't see a good way to solve the issue I'm having, and a
>>> tuning tool isn't really going to help with that. We were pretty happy
>> with
>>> our search relevancy for the most part *other* than the problem with the
>>> multi-term synonyms not working reliably but I definitely can't lose
>>> relevancy that we had just to get those working.
>>>
>>> In reviewing your tools previously, the problem as I recall is that they
>>> rely on querying Solr directly, while our searches go through multiple
>>> levels of an application which includes a lot of additional logic in
>> terms
>>> of what the data that gets sent to Solr are, so they just aren't going to
>>> be much use for us. It was easier for me to just write my own tool that
>>> essentially does the same kind of thing, but with my application logic
>>> built in.
>>>
>>> Mary Jo
>>>
>>
> 

-- 
*
Bernd Fehling                Bielefeld University Library
Dipl.-Inform. (FH)           LibTec - Library Technology
Universitätsstr. 25          and Knowledge Management
33615 Bielefeld
Tel. +49 521 106-4060        bernd.fehling(at)uni-bielefeld.de

BASE - Bielefeld Academic Search Engine - www.base-search.net
*
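
The two steps Doug lists in the quoted message (analyze the query string, then
OR the resulting tokens) can be sketched in a few lines. This is an
illustration rather than a QParserPlugin: the regex tokenizer is a stand-in
for a field's real query-time analyzer, and the function and field names are
hypothetical.

```python
import re


def analyze(text):
    """Stand-in analyzer: lowercase and split on non-alphanumeric characters."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]


def match_query(field, text):
    """Build an OR query over the analyzed tokens, scoped to one field."""
    tokens = analyze(text)
    if not tokens:
        return ""
    return f"{field}:(" + " OR ".join(tokens) + ")"


# Used as a boost query (bq) next to the main edismax query, as Doug describes.
user_query = "Small Block"
params = {
    "q": user_query,
    "defType": "edismax",
    "bq": match_query("title", user_query),
}
print(params["bq"])
```

Because the OR query goes into `bq`, it only nudges ranking and leaves the
main query's matching rules untouched.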


Re: Solutions for Multi-word Synonyms

2016-06-09 Thread MaryJo Sminkey
Thanks, added my vote (which threw an error but looks like it did get
added).

MJ



On Thu, Jun 9, 2016 at 5:41 PM, Upayavira  wrote:

> Here's a recently created ticket that covers this issue:
>
> https://issues.apache.org/jira/browse/SOLR-9185
>
> Let's hope we see some traction on it soon, as many people suffer from
> this issue.
>
> Upayavira
>
> On Thu, 9 Jun 2016, at 09:10 PM, MaryJo Sminkey wrote:
> > On Thu, Jun 9, 2016 at 1:50 PM, Joe Lawson <
> > jlaw...@opensourceconnections.com> wrote:
> >
> > > The auto-phrasing-token (APT) filter is a two pronged solution that
> > > requires index and query time processes versus hon-lucene-synonyms
> (HLS)
> > > which is strictly a query time implementation. The primary take away
> from
> > > that is, APT requires reindexing your data when you update the
> autophrases
> > > and synonyms while HLS does not.
> > >
> >
> >
> > Yup, understood about the indexing, that is not a big issue for us as we
> > rarely change the synonym list and re-index frequently.
> >
> > MJ
> >
> >
> >
>


Re: Solutions for Multi-word Synonyms

2016-06-09 Thread Upayavira
Here's a recently created ticket that covers this issue:

https://issues.apache.org/jira/browse/SOLR-9185

Let's hope we see some traction on it soon, as many people suffer from
this issue.

Upayavira

On Thu, 9 Jun 2016, at 09:10 PM, MaryJo Sminkey wrote:
> On Thu, Jun 9, 2016 at 1:50 PM, Joe Lawson <
> jlaw...@opensourceconnections.com> wrote:
> 
> > The auto-phrasing-token (APT) filter is a two pronged solution that
> > requires index and query time processes versus hon-lucene-synonyms (HLS)
> > which is strictly a query time implementation. The primary take away from
> > that is, APT requires reindexing your data when you update the autophrases
> > and synonyms while HLS does not.
> >
> 
> 
> Yup, understood about the indexing, that is not a big issue for us as we
> rarely change the synonym list and re-index frequently.
> 
> MJ
> 
> 
> 


Re: Solutions for Multi-word Synonyms

2016-06-09 Thread MaryJo Sminkey
On Thu, Jun 9, 2016 at 1:50 PM, Joe Lawson <
jlaw...@opensourceconnections.com> wrote:

> The auto-phrasing-token (APT) filter is a two pronged solution that
> requires index and query time processes versus hon-lucene-synonyms (HLS)
> which is strictly a query time implementation. The primary take away from
> that is, APT requires reindexing your data when you update the autophrases
> and synonyms while HLS does not.
>


Yup, understood about the indexing, that is not a big issue for us as we
rarely change the synonym list and re-index frequently.

MJ





Re: Solutions for Multi-word Synonyms

2016-06-09 Thread Joe Lawson
>
> I'm wondering if anyone has experience using the autophrasing solution on
> the Lucidworks blog:
>
>
> https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
>
>
The auto-phrasing-token (APT) filter is a two-pronged solution that
requires both index-time and query-time processing, whereas
hon-lucene-synonyms (HLS) is strictly a query-time implementation. The
primary takeaway is that APT requires reindexing your data whenever you
update the autophrases and synonyms, while HLS does not.

APT is more precise while HLS is more flexible.

-Joe


Re: Solutions for Multi-word Synonyms

2016-06-09 Thread MaryJo Sminkey
On Thu, Jun 9, 2016 at 11:06 AM, Doug Turnbull <
dturnb...@opensourceconnections.com> wrote:

> Honestly half the time I run into this problem, I end up creating a
> QParserPlugin because I need to do something specific. With a QParserPlugin
> I can run whatever analysis, slicing and dicing of the query string to
> manually construct whatever I need to
>
>
> http://www.supermind.org/blog/1134/custom-solr-queryparsers-for-fun-and-profit
>
> One thing I often do is repeat the functionality of Elasticsearch's match
> query. Elasticsearch's match query does the following:
>


Thanks Doug... I was surprised at the lack of response on this as it seems
like it would be a much more common issue. Looking over that page though, I
am not sure I would be able to figure out how to do that kind of custom
query parser on my own, without something fairly similar in respect to
adding synonym support to work from. I'm just a lowly self-taught web
developer after all, not a java programmer or someone with a lot of
experience writing source code, etc.

We did consider switching to ElasticSearch due to its support out of the
box for multi-term synonyms, but that would be a lot of work, and I'm not
sure it can support everything else we are doing, like all the nested
facets and grouping, etc. and it would take a fair amount of work to
convert everything we have to the point of finding that out.

I'm wondering if anyone has experience using the autophrasing solution on
the Lucidworks blog:

https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/

I know I tried this one as well some months ago and couldn't seem to get it
to work but it's probably the one I'll be trying next and hopefully can
figure it out this time. Since it works as a filter, it should work better
for us in terms of being able to apply it selectively only to certain
fields.





Re: Solutions for Multi-word Synonyms

2016-06-09 Thread Doug Turnbull
Mary Jo,

Honestly half the time I run into this problem, I end up creating a
QParserPlugin because I need to do something specific. With a QParserPlugin
I can run whatever analysis, slicing and dicing of the query string to
manually construct whatever I need to.

http://www.supermind.org/blog/1134/custom-solr-queryparsers-for-fun-and-profit

One thing I often do is repeat the functionality of Elasticsearch's match
query. Elasticsearch's match query does the following:

- Analyze the query string using the field's query-time analyzer
- Create an OR query with the tokens that come out of the analysis

You can look at the field query parser as something of a starting point for
this.
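
The two steps above can be sketched roughly as follows. This is only an illustration in Python, not a Solr QParserPlugin: the analyze function is a stand-in assumption for the field's real query-time analyzer, and match_query is a hypothetical helper name.

```python
def analyze(text):
    # Stand-in (assumption) for the field's query-time analyzer:
    # lowercase the input and split on whitespace.
    return text.lower().split()


def match_query(field, query_string):
    # OR together one field:token clause per token that comes out of analysis.
    tokens = analyze(query_string)
    return " OR ".join(f"{field}:{tok}" for tok in tokens)


print(match_query("prodname", "Small Block"))
```

A real implementation would run the field type's analysis chain (including any synonym filter) instead of the whitespace split used here.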

I usually do this in the context of a boost query, not as the main edismax
query.

If I have time, this is something I've been meaning to open source.

Best
-Doug

On Tue, Jun 7, 2016 at 2:51 PM Joe Lawson 
wrote:

> I'm sorry I wasn't more specific, I meant we were hijacking the thread with
> the question, "Anyone used a different method of
> handling multi-term synonyms that isn't as global?" as the original thread
> was about getting synonym_edismax running.
>
> On Tue, Jun 7, 2016 at 2:24 PM, MaryJo Sminkey 
> wrote:
>
> > > MaryJo you might want to start a new thread, I think we kinda hijacked
> > > this one. Also if you are interested in tuning queries check out
> > > http://splainer.io/ and https://www.quepid.com which are interactive
> > > tools (both of which my company makes) to tune for search relevancy.
> > >
> >
> >
> > Okay I changed the subject. But I don't need a tuning tool, I already
> > know WHY I'm not getting the results I need, the problem is how to fix it
> > or get around what the plugin is doing. Which is why I was inquiring if
> > people have had success with something other than this particular plugin
> > for more advanced queries that it messes around with. It seems to do a
> > good job if you aren't doing anything particularly complicated with your
> > search logic, but I don't see a good way to solve the issue I'm having,
> > and a tuning tool isn't really going to help with that. We were pretty
> > happy with our search relevancy for the most part *other* than the problem
> > with the multi-term synonyms not working reliably but I definitely can't
> > lose relevancy that we had just to get those working.
> >
> > In reviewing your tools previously, the problem as I recall is that they
> > rely on querying Solr directly, while our searches go through multiple
> > levels of an application which includes a lot of additional logic in
> > terms of what the data that gets sent to Solr are, so they just aren't
> > going to be much use for us. It was easier for me to just write my own
> > tool that essentially does the same kind of thing, but with my application
> > logic built in.
> >
> > Mary Jo
> >
>


Re: Solutions for Multi-word Synonyms

2016-06-07 Thread Joe Lawson
I'm sorry I wasn't more specific, I meant we were hijacking the thread with
the question, "Anyone used a different method of
handling multi-term synonyms that isn't as global?" as the original thread
was about getting synonym_edismax running.

On Tue, Jun 7, 2016 at 2:24 PM, MaryJo Sminkey  wrote:

> > MaryJo you might want to start a new thread, I think we kinda hijacked
> > this one. Also if you are interested in tuning queries check out
> > http://splainer.io/ and https://www.quepid.com which are interactive
> > tools (both of which my company makes) to tune for search relevancy.
> >
>
>
> Okay I changed the subject. But I don't need a tuning tool, I already know
> WHY I'm not getting the results I need, the problem is how to fix it or get
> around what the plugin is doing. Which is why I was inquiring if people
> have had success with something other than this particular plugin for
> more advanced queries that it messes around with. It seems to do a good job
> if you aren't doing anything particularly complicated with your search
> logic, but I don't see a good way to solve the issue I'm having, and a
> tuning tool isn't really going to help with that. We were pretty happy with
> our search relevancy for the most part *other* than the problem with the
> multi-term synonyms not working reliably but I definitely can't lose
> relevancy that we had just to get those working.
>
> In reviewing your tools previously, the problem as I recall is that they
> rely on querying Solr directly, while our searches go through multiple
> levels of an application which includes a lot of additional logic in terms
> of what the data that gets sent to Solr are, so they just aren't going to
> be much use for us. It was easier for me to just write my own tool that
> essentially does the same kind of thing, but with my application logic
> built in.
>
> Mary Jo
>


Solutions for Multi-word Synonyms

2016-06-07 Thread MaryJo Sminkey
> MaryJo you might want to start a new thread, I think we kinda hijacked this
> one. Also if you are interested in tuning queries check out
> http://splainer.io/ and https://www.quepid.com which are interactive tools
> (both of which my company makes) to tune for search relevancy.
>


Okay I changed the subject. But I don't need a tuning tool, I already know
WHY I'm not getting the results I need, the problem is how to fix it or get
around what the plugin is doing. Which is why I was inquiring if people
have had success with something other than this particular plugin for
more advanced queries that it messes around with. It seems to do a good job
if you aren't doing anything particularly complicated with your search
logic, but I don't see a good way to solve the issue I'm having, and a
tuning tool isn't really going to help with that. We were pretty happy with
our search relevancy for the most part *other* than the problem with the
multi-term synonyms not working reliably but I definitely can't lose
relevancy that we had just to get those working.

In reviewing your tools previously, the problem as I recall is that they
rely on querying Solr directly, while our searches go through multiple
levels of an application which includes a lot of additional logic in terms
of what the data that gets sent to Solr are, so they just aren't going to
be much use for us. It was easier for me to just write my own tool that
essentially does the same kind of thing, but with my application logic
built in.

Mary Jo


Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-07 Thread Joe Lawson
MaryJo you might want to start a new thread, I think we kinda hijacked this
one. Also if you are interested in tuning queries check out
http://splainer.io/ and https://www.quepid.com which are interactive tools
(both of which my company makes) to tune for search relevancy.

On Tue, Jun 7, 2016 at 1:45 PM, MaryJo Sminkey <mjsmin...@gmail.com> wrote:

> I'm really thinking this just might not be the right tool for us, what we
> really need is a solution that works like the normal synonym filter does,
> just with proper multi-term support, so I can apply the synonyms only on
> certain fields (copied fields) that have their own, lower boost settings.
> The way this plugin works across the entire query just seems too
> problematic when you need to do complex queries with lots of different
> boost settings to get good relevancy. Anyone used a different method of
> handling multi-term synonyms that isn't as global?
>
> Mary Jo
>
>
>
> On Tue, Jun 7, 2016 at 1:31 PM, MaryJo Sminkey <mjsmin...@gmail.com>
> wrote:
>
> > Here's the issue I am still having with getting the right search
> relevancy
> > with the synonym plugin in place. We typically have users searching on
> > multiple terms, and we want matches across multiple terms, particularly
> > those that appear as phrases, to appear higher than matches for the same
> > term multiple times. The synonym filter makes this complicated since we
> may
> > have cases where the term the user enters, like "sbc", maps to a
> multi-term
> > synonym like "small block", and we always want the matches for the
> original
> > term to pop up first, so I'm trying to make sure the original boost is
> high
> > enough to override a phrase boost that the multi-term synonym would give.
> > Unfortunately this then means matches on the same term multiple times get
> > pushed up over my phrase matches...those aren't going to be the most
> > relevant matches. Not sure there's a way to solve this successfully,
> > without a completely different approach to the synonyms... or not
> counting
> > the number of matches on terms (I assume you can drop that ability,
> > although that's not ideal either...just better than what I have now).
> >
> > MJ
> >
> >
> >
> >
> > On Mon, Jun 6, 2016 at 9:39 PM, MaryJo Sminkey <mjsmin...@gmail.com>
> > wrote:
> >
> >>
> >> On Mon, Jun 6, 2016 at 7:36 PM, Joe Lawson <
> >> jlaw...@opensourceconnections.com> wrote:
> >>
> >>>
> >>> We were thinking, as you experimented with, that the 0.5 and 2.0 boosts
> >>> were no match for the product name and keyword field boosts so that
> would
> >>> influence your search as well.
> >>
> >>
> >>
> >> Yeah I definitely will have to play with the values a bit as we want the
> >> product name matches to always appear highest, whether original or
> >> synonyms, but I'll have to figure out how to get that result without one
> >> word terms that have multi word synonyms getting overly boosted for a
> >> phrase match while still sufficiently boosting the normal phrase
> match
> >> stuff too. With the normal synonym filter I was able to just copy fields
> >> that could have synonyms to a new field (which would be the only one
> with
> >> the synonym filter), and use a different, lower boost on those fields,
> but
> >> that won't work with this plugin which applies across everything in the
> >> query. Makes it a bit more complicated to get everything just right.
> >>
> >> MJ
> >>
> >>
> >>
> >
> >
>


Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-07 Thread MaryJo Sminkey
I'm really thinking this just might not be the right tool for us. What we
really need is a solution that works like the normal synonym filter does,
just with proper multi-term support, so I can apply the synonyms only on
certain fields (copied fields) that have their own, lower boost settings.
The way this plugin works across the entire query just seems too
problematic when you need to do complex queries with lots of different
boost settings to get good relevancy. Anyone used a different method of
handling multi-term synonyms that isn't as global?

Mary Jo



On Tue, Jun 7, 2016 at 1:31 PM, MaryJo Sminkey <mjsmin...@gmail.com> wrote:

> Here's the issue I am still having with getting the right search relevancy
> with the synonym plugin in place. We typically have users searching on
> multiple terms, and we want matches across multiple terms, particularly
> those that appear as phrases, to appear higher than matches for the same
> term multiple times. The synonym filter makes this complicated since we may
> have cases where the term the user enters, like "sbc", maps to a multi-term
> synonym like "small block", and we always want the matches for the original
> term to pop up first, so I'm trying to make sure the original boost is high
> enough to override a phrase boost that the multi-term synonym would give.
> Unfortunately this then means matches on the same term multiple times get
> pushed up over my phrase matches...those aren't going to be the most
> relevant matches. Not sure there's a way to solve this successfully,
> without a completely different approach to the synonyms... or not counting
> the number of matches on terms (I assume you can drop that ability,
> although that's not ideal either...just better than what I have now).
>
> MJ
>
>
>
>
> On Mon, Jun 6, 2016 at 9:39 PM, MaryJo Sminkey <mjsmin...@gmail.com>
> wrote:
>
>>
>> On Mon, Jun 6, 2016 at 7:36 PM, Joe Lawson <
>> jlaw...@opensourceconnections.com> wrote:
>>
>>>
>>> We were thinking, as you experimented with, that the 0.5 and 2.0 boosts
>>> were no match for the product name and keyword field boosts so that would
>>> influence your search as well.
>>
>>
>>
>> Yeah I definitely will have to play with the values a bit as we want the
>> product name matches to always appear highest, whether original or
>> synonyms, but I'll have to figure out how to get that result without one
>> word terms that have multi word synonyms getting overly boosted for a
>> phrase match while still sufficiently boosting the normal phrase match
>> stuff too. With the normal synonym filter I was able to just copy fields
>> that could have synonyms to a new field (which would be the only one with
>> the synonym filter), and use a different, lower boost on those fields, but
>> that won't work with this plugin which applies across everything in the
>> query. Makes it a bit more complicated to get everything just right.
>>
>> MJ
>>
>>
>>
>
>


Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-07 Thread MaryJo Sminkey
Here's the issue I am still having with getting the right search relevancy
with the synonym plugin in place. We typically have users searching on
multiple terms, and we want matches across multiple terms, particularly
those that appear as phrases, to appear higher than matches for the same
term multiple times. The synonym filter makes this complicated since we may
have cases where the term the user enters, like "sbc", maps to a multi-term
synonym like "small block", and we always want the matches for the original
term to pop up first, so I'm trying to make sure the original boost is high
enough to override a phrase boost that the multi-term synonym would give.
Unfortunately this then means matches on the same term multiple times get
pushed up over my phrase matches...those aren't going to be the most
relevant matches. Not sure there's a way to solve this successfully,
without a completely different approach to the synonyms... or not counting
the number of matches on terms (I assume you can drop that ability,
although that's not ideal either...just better than what I have now).

MJ




On Mon, Jun 6, 2016 at 9:39 PM, MaryJo Sminkey <mjsmin...@gmail.com> wrote:

>
> On Mon, Jun 6, 2016 at 7:36 PM, Joe Lawson <
> jlaw...@opensourceconnections.com> wrote:
>
>>
>> We were thinking, as you experimented with, that the 0.5 and 2.0 boosts
>> were no match for the product name and keyword field boosts so that would
>> influence your search as well.
>
>
>
> Yeah I definitely will have to play with the values a bit as we want the
> product name matches to always appear highest, whether original or
> synonyms, but I'll have to figure out how to get that result without one
> word terms that have multi word synonyms getting overly boosted for a
> phrase match while still sufficiently boosting the normal phrase match
> stuff too. With the normal synonym filter I was able to just copy fields
> that could have synonyms to a new field (which would be the only one with
> the synonym filter), and use a different, lower boost on those fields, but
> that won't work with this plugin which applies across everything in the
> query. Makes it a bit more complicated to get everything just right.
>
> MJ
>
>
>


Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-06 Thread MaryJo Sminkey
On Mon, Jun 6, 2016 at 7:36 PM, Joe Lawson <
jlaw...@opensourceconnections.com> wrote:

>
> We were thinking, as you experimented with, that the 0.5 and 2.0 boosts
> were no match for the product name and keyword field boosts so that would
> influence your search as well.



Yeah I definitely will have to play with the values a bit as we want the
product name matches to always appear highest, whether original or
synonyms, but I'll have to figure out how to get that result without
one-word terms that have multi-word synonyms getting overly boosted for a
phrase match, while still sufficiently boosting the normal phrase match
stuff too. With the normal synonym filter I was able to just copy fields
that could have synonyms to a new field (which would be the only one with
the synonym filter), and use a different, lower boost on those fields, but
that won't work with this plugin which applies across everything in the
query. Makes it a bit more complicated to get everything just right.
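
The copy-field approach described above amounts to listing both the original field and its synonym-enabled copy in qf, with a lower boost on the copy. A minimal sketch of building such a qf value, assuming a hypothetical copy field named prodname_syn and illustrative boost values (neither appears in the thread):

```python
def build_qf(boosts):
    # Join field^boost pairs into a single qf parameter value.
    return " ".join(f"{field}^{boost}" for field, boost in boosts.items())


qf = build_qf({
    "prodname": 10.0,     # original field, no synonym filter
    "prodname_syn": 2.0,  # hypothetical copyField target that has the synonym filter
})
print(qf)
```

Because the synonym filter lives only on the copy, synonym matches can never outscore the original-field boost by more than the ratio chosen here.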

MJ




Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-06 Thread Joe Lawson
Yeah, I thought the scale of the boosts was off as well but got caught up
verifying that the plugin was working. My colleague suggested that because
"small block" is a phrase, it would get a higher score in matching: you
basically get a phrase match each time, which causes it to float to the
top. You should check out his post about Solr's latest scoring engine
(BM25). It explains the notion of TF*IDF, which drives almost all the
theory in information retrieval (aka search).

http://opensourceconnections.com/blog/2015/10/16/bm25-the-next-generation-of-lucene-relevation/
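
As a rough illustration of the TF*IDF idea, here is the textbook form in Python. This is an assumption-laden sketch, not Lucene's actual scoring: Lucene's classic similarity dampens term frequency and BM25 saturates it, so real scores grow more slowly with repeated terms than this does.

```python
import math


def tf_idf(term_freq, doc_freq, num_docs):
    # Textbook TF*IDF: the term count in the document, weighted by how
    # rare the term is across the whole collection.
    return term_freq * math.log(num_docs / doc_freq)


# A document repeating a term scores higher than one mentioning it once,
# and a rare term outweighs a common one.
print(tf_idf(3, 10, 1000) > tf_idf(1, 10, 1000))
print(tf_idf(1, 10, 1000) > tf_idf(1, 100, 1000))
```

This is why documents that repeat a matching term can rise in the ranking even without any phrase boost.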

We were thinking, as you experimented with, that the 0.5 and 2.0 boosts
were no match for the product name and keyword field boosts so that would
influence your search as well.

On Jun 6, 2016 6:03 PM, "MaryJo Sminkey" wrote:

> Oh thanks, yeah I did miss that one field which had a parent type with the
> normal synonym filter. However, that's our product SKU field so really
> doesn't even come into play. I verified that none of the other fields have
> a synonym filter set and even removed the prodnumbertext just to make
> sure it wasn't doing anything. I was still getting the same results, the
> matches with "SBC" in the name are buried under the "small block" matches.
> After thinking over the issue, I realized what the solution was, I just
> needed to set the synonyms.originalBoost high enough that it would be higher
> than the boosts provided by the phrase boosting, which is clearly what was
> letting "small block" jump ahead of "sbc". So I bumped that up to 100
> leaving the synonymBoost at 1 and now I'm getting the results I'm looking
> for.
>
> Thanks for the help!
>
> Mary Jo
>
>
> On Mon, Jun 6, 2016 at 4:57 PM, Joe Lawson <
> jlaw...@opensourceconnections.com> wrote:
>
> > Mary Jo.
> >
> > It appears to be working correctly but you have a very complex query
> going
> > on so it can be confusing. Assuming you are using the queryParser as
> > provided in examples your query would look like "+sbc" when it enters the
> > queryParser and would look like "+((sbc)^2.0 (sb)^0.5 (small block)^0.5)"
> > when it came out and then it would enter the normal pipeline and
> everything
> > would be processed as individual tokens.
> >
> > It appears that you have synonyms being processed at query time on the
> > prodnumbertext field. For example when (sbc)^2.0 enters into the normal
> > query stage then have all the qf, pf, ps and tie modifies added so the
> > first one turns into something like
> >
> > "(body:sbc^0.5 | productinfo:sbc^1.0 | keywords:sbc^2.0 |
> prodname:sbc^10.0
> > | prodnumbertext:sbc^20.0)^2.0"
> >
> > Then the query time synonym expansion on prodnumbertext combined with a
> > phrase and default mm being 100% (
> >
> >
> https://cwiki.apache.org/confluence/display/solr/The+DisMax+Query+Parser#TheDisMaxQueryParser-Themm(MinimumShouldMatch)Parameter
> > )
> > you end up with query being
> >
> > (((prodnumbertext:sbc prodnumbertext:sb prodnumbertext:small)
> > prodnumbertext:block)~2)^20.0
> >
> > The ~2 comes from mm=100% and having the phrase "small block" as a
> synonym.
> > This messes up your results as well as anything in prodnumbertext will
> have
> > to match "sbc block" "sb block" or "small block" which of course is only
> > going to match small block. Check out the section "Multi-word synonyms
> > won't work as phrase queries" in
> > https://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/ for
> > more info.
> >
> > Advice: make sure on the schema that none of the fields you are running
> > queries against do any complex query operations, especially make sure
> they
> > aren't doing additional synonym resolution against the same file.
> >
> > I think you are getting hit by the MM bug.  Try tuning it way down to
> > something like 0.01% and see how the matches go.
> >
> >
> >
> > On Fri, Jun 3, 2016 at 2:21 PM, MaryJo Sminkey 
> > wrote:
> >
> > > Okay so big thanks for the help with getting the hon_lucene_synonyms
> > plugin
> > > working. That is a big load off to finally have a solution in place for
> > all
> > > our multi-term synonyms. We did find that the information in Step 8
> about
> > > the plugin showing "SynonymExpandingExtendedDismaxQParser" for QParser
> > does
> > > not seem to be correct, we only ever get "ExtendedDismaxQParser" but
> the
> > > synonym expansion is definitely working.
> > >
> > > In implementing it though, the one thing I'm still having an issue with
> > is
> > > trying to figure out how I can get results on the original term to
> appear
> > > first in our results and matches on the synonyms lower in the results.
> > The
> > > plugin includes settings for an originalboost and synonymboost, but
> that
> > > doesn't seem to be working along with all the other edismax boosts I'm
> > > doing. We search across a number of 

Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-06 Thread MaryJo Sminkey
Oh thanks, yeah I did miss that one field which had a parent type with the
normal synonym filter. However, that's our product SKU field, so it really
doesn't even come into play. I verified that none of the other fields have
a synonym filter set and even removed prodnumbertext just to make
sure it wasn't doing anything. I was still getting the same results: the
matches with "SBC" in the name were buried under the "small block" matches.
After thinking over the issue, I realized what the solution was: I just
needed to set synonyms.originalBoost high enough that it would be higher
than the boosts provided by the phrase boosting, which is clearly what was
letting "small block" jump ahead of "sbc". So I bumped that up to 100,
leaving synonyms.synonymBoost at 1, and now I'm getting the results I'm
looking for.
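
For reference, a minimal sketch of the request parameters with the adjusted boosts, encoded as a query string. The parameter names are the ones from the params posted earlier in this thread; the q value is illustrative, and the other edismax parameters (qf, pf, and so on) are omitted for brevity.

```python
from urllib.parse import urlencode

params = {
    "defType": "synonym_edismax",
    "q": "sbc",                       # illustrative query term
    "synonyms": "true",
    "synonyms.originalBoost": "100",  # raised so original-term matches outrank phrase boosts
    "synonyms.synonymBoost": "1",
}

query_string = urlencode(params)
print(query_string)
```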

Thanks for the help!

Mary Jo



On Mon, Jun 6, 2016 at 4:57 PM, Joe Lawson <
jlaw...@opensourceconnections.com> wrote:

> Mary Jo.
>
> It appears to be working correctly but you have a very complex query going
> on so it can be confusing. Assuming you are using the queryParser as
> provided in examples your query would look like "+sbc" when it enters the
> queryParser and would look like "+((sbc)^2.0 (sb)^0.5 (small block)^0.5)"
> when it came out and then it would enter the normal pipeline and everything
> would be processed as individual tokens.
>
> It appears that you have synonyms being processed at query time on the
> prodnumbertext field. For example when (sbc)^2.0 enters into the normal
> query stage, it then has all the qf, pf, ps and tie modifiers added so the
> first one turns into something like
>
> "(body:sbc^0.5 | productinfo:sbc^1.0 | keywords:sbc^2.0 | prodname:sbc^10.0
> | prodnumbertext:sbc^20.0)^2.0"
>
> Then the query time synonym expansion on prodnumbertext combined with a
> phrase and default mm being 100% (
>
> https://cwiki.apache.org/confluence/display/solr/The+DisMax+Query+Parser#TheDisMaxQueryParser-Themm(MinimumShouldMatch)Parameter
> )
> you end up with query being
>
> (((prodnumbertext:sbc prodnumbertext:sb prodnumbertext:small)
> prodnumbertext:block)~2)^20.0
>
> The ~2 comes from mm=100% and having the phrase "small block" as a synonym.
> This messes up your results as well as anything in prodnumbertext will have
> to match "sbc block" "sb block" or "small block" which of course is only
> going to match small block. Check out the section "Multi-word synonyms
> won't work as phrase queries" in
> https://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/ for
> more info.
>
> Advice: make sure on the schema that none of the fields you are running
> queries against do any complex query operations, especially make sure they
> aren't doing additional synonym resolution against the same file.
>
> I think you are getting hit by the MM bug.  Try tuning it way down to
> something like 0.01% and see how the matches go.
>
>
>
> On Fri, Jun 3, 2016 at 2:21 PM, MaryJo Sminkey 
> wrote:
>
> > Okay so big thanks for the help with getting the hon_lucene_synonyms
> plugin
> > working. That is a big load off to finally have a solution in place for
> all
> > our multi-term synonyms. We did find that the information in Step 8 about
> > the plugin showing "SynonymExpandingExtendedDismaxQParser" for QParser
> does
> > not seem to be correct, we only ever get "ExtendedDismaxQParser" but the
> > synonym expansion is definitely working.
> >
> > In implementing it though, the one thing I'm still having an issue with
> is
> > trying to figure out how I can get results on the original term to appear
> > first in our results and matches on the synonyms lower in the results.
> The
> > plugin includes settings for an originalboost and synonymboost, but that
> > doesn't seem to be working along with all the other edismax boosts I'm
> > doing. We search across a number of fields, each with their own boost and
> > then do phrase searches with boosts as well. My params look like this:
> >
> > params["defType"] = 'synonym_edismax';
> > params["qf"] = 'body^0.5 productinfo^1.0 keywords^2.0 prodname^10.0
> > prodnumbertext^20.0';
> > params["pf"] = 'productinfo^1 body^5 keywords^10 prodname^50';
> > params["pf2"] = 'productinfo^1 body^5 keywords^10 prodname^50';
> > params["pf3"] = 'productinfo^1 body^5 keywords^10 prodname^50';
> > params["ps"] = 1;
> > params["tie"] = 0.1;
> > params["synonyms"] = true;
> > params["synonyms.originalBoost"] = 2.0;
> > params["synonyms.synonymBoost"] = 0.5;
> >
> > And here's an example of what the plugin gives me for a search on "sbc"
> > which includes synonyms for "sb" and "small block" I don't really
> know
> > enough about this to figure out what exactly it's doing but since all of
> > the results I am getting first are ones with "small block" in the name,
> and
> > the ones with "sbc" in the prodname field which should be first are
> buried
> > about 

Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-06 Thread Joe Lawson
>
> Advice: make sure on the schema that none of the fields you are running
> queries against do any complex query operations, especially make sure they
> aren't doing additional synonym resolution against the same file.
>

BTW, I'd do this first before messing with MM.


Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-06 Thread Joe Lawson
Mary Jo.

It appears to be working correctly, but you have a very complex query going
on, so it can be confusing. Assuming you are using the queryParser as
provided in the examples, your query would look like "+sbc" when it enters
the queryParser and like "+((sbc)^2.0 (sb)^0.5 (small block)^0.5)"
when it comes out. It would then enter the normal pipeline, where everything
is processed as individual tokens.
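
To make that rewrite concrete, here is a small Python sketch that reproduces the shape of the expanded query. It only mimics the string form described above; expand_with_synonyms is a hypothetical helper, not the plugin's code, and the default boosts are the 2.0 and 0.5 values from the params in this thread.

```python
def expand_with_synonyms(term, synonyms, original_boost=2.0, synonym_boost=0.5):
    # The original term gets the original boost; each synonym gets the synonym boost.
    clauses = [f"({term})^{original_boost}"]
    clauses += [f"({syn})^{synonym_boost}" for syn in synonyms]
    return "+(" + " ".join(clauses) + ")"


print(expand_with_synonyms("sbc", ["sb", "small block"]))
```

Each parenthesized clause is then handed to the regular edismax pipeline on its own, which is where the per-field boosts get applied.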

It appears that you have synonyms being processed at query time on the
prodnumbertext field. For example, when (sbc)^2.0 enters the normal
query stage, it has all the qf, pf, ps and tie modifiers added, so the
first one turns into something like

"(body:sbc^0.5 | productinfo:sbc^1.0 | keywords:sbc^2.0 | prodname:sbc^10.0
| prodnumbertext:sbc^20.0)^2.0"
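
A rough sketch of how a qf string fans a single term out into that per-field disjunction. The helper names are hypothetical and this only builds the display string; it ignores pf, ps and tie.

```python
def parse_qf(qf):
    # Split "field^boost" items; a field without a boost defaults to 1.0.
    fields = []
    for item in qf.split():
        name, _, boost = item.partition("^")
        fields.append((name, float(boost) if boost else 1.0))
    return fields


def dismax_clause(term, qf, outer_boost):
    # One field:term^boost alternative per qf field, joined as a disjunction.
    parts = [f"{name}:{term}^{boost}" for name, boost in parse_qf(qf)]
    return "(" + " | ".join(parts) + f")^{outer_boost}"


qf = "body^0.5 productinfo^1.0 keywords^2.0 prodname^10.0 prodnumbertext^20.0"
print(dismax_clause("sbc", qf, 2.0))
```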

Then, with the query-time synonym expansion on prodnumbertext combined with a
phrase and the default mm being 100% (
https://cwiki.apache.org/confluence/display/solr/The+DisMax+Query+Parser#TheDisMaxQueryParser-Themm(MinimumShouldMatch)Parameter)
you end up with the query being

(((prodnumbertext:sbc prodnumbertext:sb prodnumbertext:small)
prodnumbertext:block)~2)^20.0

The ~2 comes from mm=100% and having the phrase "small block" as a synonym.
This messes up your results as well, since anything in prodnumbertext will
have to match "sbc block", "sb block" or "small block", which of course is
only going to match "small block". Check out the section "Multi-word synonyms
won't work as phrase queries" in
https://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/ for
more info.
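
To see where the ~2 comes from, here is a simplified Python model of the documented mm semantics for plain integers and percentages (conditional specs such as "2<75%" are not handled). It is a sketch under those assumptions, not Solr's implementation. The two optional clauses in the query above are the synonym group and prodnumbertext:block.

```python
def min_should_match(optional_clauses, mm):
    # Percentages are computed against the number of optional clauses and
    # rounded toward zero; negative values mean "this many may be missing".
    if mm.endswith("%"):
        calc = optional_clauses * int(mm[:-1]) / 100
    else:
        calc = int(mm)
    result = optional_clauses + int(calc) if calc < 0 else int(calc)
    return max(result, 0)


print(min_should_match(2, "100%"))  # both clauses required, hence the ~2
print(min_should_match(2, "1"))     # only one of the two clauses required
```

With only one clause required, a document matching just the synonym group would no longer be excluded for lacking "block".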

Advice: make sure in the schema that none of the fields you are running
queries against do any complex query operations; especially make sure they
aren't doing additional synonym resolution against the same file.

I think you are getting hit by the MM bug.  Try tuning it way down to
something like 0.01% and see how the matches go.



On Fri, Jun 3, 2016 at 2:21 PM, MaryJo Sminkey  wrote:

> Okay so big thanks for the help with getting the hon_lucene_synonyms plugin
> working. That is a big load off to finally have a solution in place for all
> our multi-term synonyms. We did find that the information in Step 8 about
> the plugin showing "SynonymExpandingExtendedDismaxQParser" for QParser does
> not seem to be correct, we only ever get "ExtendedDismaxQParser" but the
> synonym expansion is definitely working.
>
> In implementing it though, the one thing I'm still having an issue with is
> trying to figure out how I can get results on the original term to appear
> first in our results and matches on the synonyms lower in the results. The
> plugin includes settings for an originalboost and synonymboost, but that
> doesn't seem to be working along with all the other edismax boosts I'm
> doing. We search across a number of fields, each with their own boost and
> then do phrase searches with boosts as well. My params look like this:
>
> params["defType"] = 'synonym_edismax';
> params["qf"] = 'body^0.5 productinfo^1.0 keywords^2.0 prodname^10.0
> prodnumbertext^20.0';
> params["pf"] = 'productinfo^1 body^5 keywords^10 prodname^50';
> params["pf2"] = 'productinfo^1 body^5 keywords^10 prodname^50';
> params["pf3"] = 'productinfo^1 body^5 keywords^10 prodname^50';
> params["ps"] = 1;
> params["tie"] = 0.1;
> params["synonyms"] = true;
> params["synonyms.originalBoost"] = 2.0;
> params["synonyms.synonymBoost"] = 0.5;
>
> And here's an example of what the plugin gives me for a search on "sbc"
> which includes synonyms for "sb" and "small block" I don't really know
> enough about this to figure out what exactly it's doing but since all of
> the results I am getting first are ones with "small block" in the name, and
> the ones with "sbc" in the prodname field which should be first are buried
> about 1000 documents in, I know the originalboost and synonymboost aren't
> working with all this other stuff. Ideas how to fix this? With the normal
> synonym filter we just set up copies of the fields that could have synonyms
> to use with that filter applied and had a lower boost on those. Not sure
> how to make it work with this custom query parser though.
>
> +((prodname:sbc^10.0 | body:sbc^0.5 | productinfo:sbc | keywords:sbc^2.0 |
> (((prodnumbertext:sbc prodnumbertext:small prodnumbertext:sb)
> prodnumbertext:block)~2)^20.0)~0.1^2.0 (((+(prodname:sb^10.0 | body:sb^0.5
> | productinfo:sb | keywords:sb^2.0 | (((prodnumbertext:sb
> prodnumbertext:small prodnumbertext:sbc) prodnumbertext:block)~2)^20.0)~0.1
> ()))^0.5) (((+(((prodname:small^10.0 | body:small^0.5 | productinfo:small |
> keywords:small^2.0 | prodnumbertext:small^20.0)~0.1 (prodname:block^10.0 |
> body:block^0.5 | productinfo:block | keywords:block^2.0 |
> prodnumbertext:block^20.0)~0.1)~2) (productinfo:"small block"~1 |
> body:"small block"~1^5.0 | keywords:"small block"~1^10.0 | prodname:"small
> block"~1^50.0)~0.1 (productinfo:"small block"~1 | body:"small block"~1^5.0
> | keywords:"small block"~1^10.0 | 

Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-05 Thread John Bickerstaff
Yes, query parameters/modifications mentioned in the readme.  Beyond those
I don't have useful advice at this point.
On Jun 4, 2016 10:56 PM, "MaryJo Sminkey"  wrote:

> On Sat, Jun 4, 2016 at 11:47 PM, John Bickerstaff <
> j...@johnbickerstaff.com>
> wrote:
>
> > MaryJo - I'm on vacation but can't resist... iirc there are some very
> > useful query modifications suggested in the readme on the github for the
> > plugin... can't access right now.
> >
>
>
> I'm assuming you mean the various query parameters. The only ones I see in
> there that would be of use for me are the ones I'm already using. As far as
> I can tell from their description.
>
> MJ
>
>
>


Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-04 Thread MaryJo Sminkey
On Sat, Jun 4, 2016 at 11:47 PM, John Bickerstaff 
wrote:

> MaryJo - I'm on vacation but can't resist... iirc there are some very
> useful query modifications suggested in the readme on the github for the
> plugin... can't access right now.
>


I'm assuming you mean the various query parameters. The only ones I see in
there that would be of use for me are the ones I'm already using. As far as
I can tell from their description.

MJ




Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-04 Thread John Bickerstaff
MaryJo - I'm on vacation but can't resist... iirc there are some very
useful query modifications suggested in the readme on the github for the
plugin... can't access right now.

You may know about them already, but if it's been a while since you looked,
those may help...
On Jun 3, 2016 12:28 PM, "MaryJo Sminkey"  wrote:

On some additional tests, it looks like it's the phrase matching in
particular that is the issue, if I take that out I do seem to be getting
better results. I definitely don't want to get rid of those so need to find
a way to make them work together.




On Fri, Jun 3, 2016 at 2:21 PM, MaryJo Sminkey  wrote:

> Okay so big thanks for the help with getting the hon_lucene_synonyms
> plugin working. That is a big load off to finally have a solution in place
> for all our multi-term synonyms. We did find that the information in Step 8
> about the plugin showing "SynonymExpandingExtendedDismaxQParser" for
> QParser does not seem to be correct, we only ever get
> "ExtendedDismaxQParser" but the synonym expansion is definitely working.
>
> In implementing it though, the one thing I'm still having an issue with is
> trying to figure out how I can get results on the original term to appear
> first in our results and matches on the synonyms lower in the results. The
> plugin includes settings for an originalboost and synonymboost, but that
> doesn't seem to be working along with all the other edismax boosts I'm
> doing. We search across a number of fields, each with their own boost and
> then do phrase searches with boosts as well. My params look like this:
>
> params["defType"] = 'synonym_edismax';
> params["qf"] = 'body^0.5 productinfo^1.0 keywords^2.0 prodname^10.0
> prodnumbertext^20.0';
> params["pf"] = 'productinfo^1 body^5 keywords^10 prodname^50';
> params["pf2"] = 'productinfo^1 body^5 keywords^10 prodname^50';
> params["pf3"] = 'productinfo^1 body^5 keywords^10 prodname^50';
> params["ps"] = 1;
> params["tie"] = 0.1;
> params["synonyms"] = true;
> params["synonyms.originalBoost"] = 2.0;
> params["synonyms.synonymBoost"] = 0.5;
>
> And here's an example of what the plugin gives me for a search on "sbc"
> which includes synonyms for "sb" and "small block" I don't really know
> enough about this to figure out what exactly it's doing but since all of
> the results I am getting first are ones with "small block" in the name, and
> the ones with "sbc" in the prodname field which should be first are buried
> about 1000 documents in, I know the originalboost and synonymboost aren't
> working with all this other stuff. Ideas how to fix this? With the normal
> synonym filter we just set up copies of the fields that could have synonyms
> to use with that filter applied and had a lower boost on those. Not sure
> how to make it work with this custom query parser though.
>
> +((prodname:sbc^10.0 | body:sbc^0.5 | productinfo:sbc | keywords:sbc^2.0 |
> (((prodnumbertext:sbc prodnumbertext:small prodnumbertext:sb)
> prodnumbertext:block)~2)^20.0)~0.1^2.0 (((+(prodname:sb^10.0 | body:sb^0.5
> | productinfo:sb | keywords:sb^2.0 | (((prodnumbertext:sb
> prodnumbertext:small prodnumbertext:sbc) prodnumbertext:block)~2)^20.0)~0.1
> ()))^0.5) (((+(((prodname:small^10.0 | body:small^0.5 | productinfo:small |
> keywords:small^2.0 | prodnumbertext:small^20.0)~0.1 (prodname:block^10.0 |
> body:block^0.5 | productinfo:block | keywords:block^2.0 |
> prodnumbertext:block^20.0)~0.1)~2) (productinfo:"small block"~1 |
> body:"small block"~1^5.0 | keywords:"small block"~1^10.0 | prodname:"small
> block"~1^50.0)~0.1 (productinfo:"small block"~1 | body:"small block"~1^5.0
> | keywords:"small block"~1^10.0 | prodname:"small
> block"~1^50.0)~0.1))^0.5)) ()
>
>
> Mary Jo
>
>


Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-03 Thread MaryJo Sminkey
On some additional tests, it looks like it's the phrase matching in
particular that is the issue; if I take that out I do seem to be getting
better results. I definitely don't want to get rid of those, so I need to find
a way to make them work together.




On Fri, Jun 3, 2016 at 2:21 PM, MaryJo Sminkey  wrote:

> Okay so big thanks for the help with getting the hon_lucene_synonyms
> plugin working. That is a big load off to finally have a solution in place
> for all our multi-term synonyms. We did find that the information in Step 8
> about the plugin showing "SynonymExpandingExtendedDismaxQParser" for
> QParser does not seem to be correct, we only ever get
> "ExtendedDismaxQParser" but the synonym expansion is definitely working.
>
> In implementing it though, the one thing I'm still having an issue with is
> trying to figure out how I can get results on the original term to appear
> first in our results and matches on the synonyms lower in the results. The
> plugin includes settings for an originalboost and synonymboost, but that
> doesn't seem to be working along with all the other edismax boosts I'm
> doing. We search across a number of fields, each with their own boost and
> then do phrase searches with boosts as well. My params look like this:
>
> params["defType"] = 'synonym_edismax';
> params["qf"] = 'body^0.5 productinfo^1.0 keywords^2.0 prodname^10.0
> prodnumbertext^20.0';
> params["pf"] = 'productinfo^1 body^5 keywords^10 prodname^50';
> params["pf2"] = 'productinfo^1 body^5 keywords^10 prodname^50';
> params["pf3"] = 'productinfo^1 body^5 keywords^10 prodname^50';
> params["ps"] = 1;
> params["tie"] = 0.1;
> params["synonyms"] = true;
> params["synonyms.originalBoost"] = 2.0;
> params["synonyms.synonymBoost"] = 0.5;
>
> And here's an example of what the plugin gives me for a search on "sbc"
> which includes synonyms for "sb" and "small block" I don't really know
> enough about this to figure out what exactly it's doing but since all of
> the results I am getting first are ones with "small block" in the name, and
> the ones with "sbc" in the prodname field which should be first are buried
> about 1000 documents in, I know the originalboost and synonymboost aren't
> working with all this other stuff. Ideas how to fix this? With the normal
> synonym filter we just set up copies of the fields that could have synonyms
> to use with that filter applied and had a lower boost on those. Not sure
> how to make it work with this custom query parser though.
>
> +((prodname:sbc^10.0 | body:sbc^0.5 | productinfo:sbc | keywords:sbc^2.0 |
> (((prodnumbertext:sbc prodnumbertext:small prodnumbertext:sb)
> prodnumbertext:block)~2)^20.0)~0.1^2.0 (((+(prodname:sb^10.0 | body:sb^0.5
> | productinfo:sb | keywords:sb^2.0 | (((prodnumbertext:sb
> prodnumbertext:small prodnumbertext:sbc) prodnumbertext:block)~2)^20.0)~0.1
> ()))^0.5) (((+(((prodname:small^10.0 | body:small^0.5 | productinfo:small |
> keywords:small^2.0 | prodnumbertext:small^20.0)~0.1 (prodname:block^10.0 |
> body:block^0.5 | productinfo:block | keywords:block^2.0 |
> prodnumbertext:block^20.0)~0.1)~2) (productinfo:"small block"~1 |
> body:"small block"~1^5.0 | keywords:"small block"~1^10.0 | prodname:"small
> block"~1^50.0)~0.1 (productinfo:"small block"~1 | body:"small block"~1^5.0
> | keywords:"small block"~1^10.0 | prodname:"small
> block"~1^50.0)~0.1))^0.5)) ()
>
>
> Mary Jo
>
>


Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-03 Thread MaryJo Sminkey
Okay so big thanks for the help with getting the hon_lucene_synonyms plugin
working. That is a big load off to finally have a solution in place for all
our multi-term synonyms. We did find that the information in Step 8 about
the plugin showing "SynonymExpandingExtendedDismaxQParser" for QParser does
not seem to be correct; we only ever get "ExtendedDismaxQParser", but the
synonym expansion is definitely working.

In implementing it though, the one thing I'm still having an issue with is
trying to figure out how I can get results on the original term to appear
first in our results and matches on the synonyms lower in the results. The
plugin includes settings for an originalboost and synonymboost, but that
doesn't seem to be working along with all the other edismax boosts I'm
doing. We search across a number of fields, each with their own boost and
then do phrase searches with boosts as well. My params look like this:

params["defType"] = 'synonym_edismax';
params["qf"] = 'body^0.5 productinfo^1.0 keywords^2.0 prodname^10.0
prodnumbertext^20.0';
params["pf"] = 'productinfo^1 body^5 keywords^10 prodname^50';
params["pf2"] = 'productinfo^1 body^5 keywords^10 prodname^50';
params["pf3"] = 'productinfo^1 body^5 keywords^10 prodname^50';
params["ps"] = 1;
params["tie"] = 0.1;
params["synonyms"] = true;
params["synonyms.originalBoost"] = 2.0;
params["synonyms.synonymBoost"] = 0.5;

And here's an example of what the plugin gives me for a search on "sbc",
which includes synonyms for "sb" and "small block". I don't really know
enough about this to figure out what exactly it's doing, but since all of
the results I am getting first are ones with "small block" in the name, and
the ones with "sbc" in the prodname field which should be first are buried
about 1000 documents in, I know the originalboost and synonymboost aren't
working with all this other stuff. Ideas how to fix this? With the normal
synonym filter we just set up copies of the fields that could have synonyms
to use with that filter applied and had a lower boost on those. Not sure
how to make it work with this custom query parser though.

+((prodname:sbc^10.0 | body:sbc^0.5 | productinfo:sbc | keywords:sbc^2.0 |
(((prodnumbertext:sbc prodnumbertext:small prodnumbertext:sb)
prodnumbertext:block)~2)^20.0)~0.1^2.0 (((+(prodname:sb^10.0 | body:sb^0.5
| productinfo:sb | keywords:sb^2.0 | (((prodnumbertext:sb
prodnumbertext:small prodnumbertext:sbc) prodnumbertext:block)~2)^20.0)~0.1
()))^0.5) (((+(((prodname:small^10.0 | body:small^0.5 | productinfo:small |
keywords:small^2.0 | prodnumbertext:small^20.0)~0.1 (prodname:block^10.0 |
body:block^0.5 | productinfo:block | keywords:block^2.0 |
prodnumbertext:block^20.0)~0.1)~2) (productinfo:"small block"~1 |
body:"small block"~1^5.0 | keywords:"small block"~1^10.0 | prodname:"small
block"~1^50.0)~0.1 (productinfo:"small block"~1 | body:"small block"~1^5.0
| keywords:"small block"~1^10.0 | prodname:"small
block"~1^50.0)~0.1))^0.5)) ()


Mary Jo


Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-01 Thread John Bickerstaff
Yes, I get that, thanks.
On Jun 1, 2016 6:38 PM, "Joe Lawson" 
wrote:

> 2.0 is compiled with Solr 5 and Java 7. It uses the namespace
> solr.SynonymExpandingExtendedDismaxQParserPlugin
>
> 5.0.4 is compiled with Solr 6 and Java 8 and is the first release that made
> it to maven central. It uses the namespace
> com.github.healthonnet.search.SynonymExpandingExtendedDismaxQParserPlugin
>
> The features are the same for all versions.
>
> Hope this clears things up.
>
> -Joe
> On Jun 1, 2016 8:11 PM, "John Bickerstaff" 
> wrote:
>
> > Just to be clear, I got version 2.0 of the jar from github...  should I be
> > looking for something in a maven repository?  A bit confused at this point
> > given all the version numbers...
> >
> > I want the latest and greatest unless there are any special considerations.
> >
> > Thanks for the assistance!
> > On Jun 1, 2016 5:46 PM, "MaryJo Sminkey"  wrote:
> >
> > Yup that was the issue for us as well. It doesn't seem to be throwing the
> > class error now, although I have not been able to successfully get back
> > results that seem to be using it, it's showing up as the deftype in my
> > params but the QParser in my debug is the normal edismax one. I will have
> > to play around with my config some more tomorrow and try to figure out what
> > we're doing wrong.
> >
> > MJ
> >
> >
> >
> > On Wed, Jun 1, 2016 at 6:38 PM, Joe Lawson <
> > jlaw...@opensourceconnections.com> wrote:
> >
> > > Nothing up until 5.0.4 was distributed on maven central. 5.0 -> 5.0.4 was
> > > just a bunch of clean up to get it ready for maven (including the namespace
> > > change).
> > >
> > > Being that nearly all docs and articles talking about the plugin reference
> > > the old 2.0, one could reasonably get confused as to what config to use,
> > > esp. when I linked the latest 5.0.4 test config prior.
> > >
> > > You can get the older jars from the links off the readme.md.
> > > On Jun 1, 2016 6:14 PM, "Shawn Heisey"  wrote:
> > >
> > > On 6/1/2016 1:10 PM, John Bickerstaff wrote:
> > > > @Joe:
> > > >
> > > > Is it possible that the jar's package name does not match the entry in
> > > > the sample solrconfig.xml file?
> > > >
> > > > The solrconfig.xml example file in the test directory contains the
> > > > following package name:
> > > > class="com.github.healthonnet.search.SynonymExpandingExtendedDismaxQParserPlugin">
> > > >
> > > > However, the jar file (when unzipped) has the following directory
> > > > structure down to the same class name:
> > > >
> > > > org --> apache --> solr --> search
> > > >
> > > > I just tried with the name change to the org.apache package name in
> > > > the solrconfig.xml file and got no errors.
> > >
> > > Looks like the package name is indeed the problem here.
> > >
> > > They changed the package name from org.apache.solr.search to
> > > com.github.healthonnet.search in the LATEST source code release --
> > > 5.0.4.  The code in the 5.0.3 version (and the 2.0.0 version indicated
> > > in the earlier message) uses org.apache.solr.search.
> > >
> > > I cannot find any files in the 2.0.0 zipfile download that contain the
> > > new package name, so I'm curious where the incorrect information on how
> > > to configure Solr to use the plugin was found.  I did not check the
> > > tarball download.
> > >
> > > Thanks,
> > > Shawn
> > >
> >
>


Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-01 Thread Joe Lawson
2.0 is compiled with Solr 5 and Java 7. It uses the namespace
solr.SynonymExpandingExtendedDismaxQParserPlugin

5.0.4 is compiled with Solr 6 and Java 8 and is the first release that made
it to maven central. It uses the namespace
com.github.healthonnet.search.SynonymExpandingExtendedDismaxQParserPlugin

The features are the same for all versions.
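
As a quick reference, the two namespaces above can be written down as a
lookup. This is only a sketch in Python: the helper name plugin_class is
made up, only the two versions named in this message are covered, and the
queryParser name "synonym_edismax" is taken from the defType used elsewhere
in this thread. The rest of the queryParser element body is deliberately
left out.

```python
# Class attribute to use in solrconfig.xml, keyed by plugin release,
# as stated in this message. Other releases are intentionally not listed.
PLUGIN_CLASS_BY_VERSION = {
    "2.0": "solr.SynonymExpandingExtendedDismaxQParserPlugin",
    "5.0.4": "com.github.healthonnet.search."
             "SynonymExpandingExtendedDismaxQParserPlugin",
}


def plugin_class(version: str) -> str:
    """Return the class name for a known plugin version (KeyError otherwise)."""
    return PLUGIN_CLASS_BY_VERSION[version]


if __name__ == "__main__":
    for version in ("2.0", "5.0.4"):
        # Prints only the opening tag; child elements are not shown here.
        print(f'<queryParser name="synonym_edismax" '
              f'class="{plugin_class(version)}">')
```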

Hope this clears things up.

-Joe
On Jun 1, 2016 8:11 PM, "John Bickerstaff"  wrote:

> Just to be clear, I got version 2.0 of the jar from github...  should I be
> looking for something in a maven repository?  A bit confused at this point
> given all the version numbers...
>
> I want the latest and greatest unless there are any special considerations.
>
> Thanks for the assistance!
> On Jun 1, 2016 5:46 PM, "MaryJo Sminkey"  wrote:
>
> Yup that was the issue for us as well. It doesn't seem to be throwing the
> class error now, although I have not been able to successfully get back
> results that seem to be using it, it's showing up as the deftype in my
> params but the QParser in my debug is the normal edismax one. I will have
> to play around with my config some more tomorrow and try to figure out what
> we're doing wrong.
>
> MJ
>
>
>
> On Wed, Jun 1, 2016 at 6:38 PM, Joe Lawson <
> jlaw...@opensourceconnections.com> wrote:
>
> > Nothing up until 5.0.4 was distributed on maven central. 5.0 -> 5.0.4 was
> > just a bunch of clean up to get it ready for maven (including the namespace
> > change).
> >
> > Being that nearly all docs and articles talking about the plugin reference
> > the old 2.0, one could reasonably get confused as to what config to use,
> > esp. when I linked the latest 5.0.4 test config prior.
> >
> > You can get the older jars from the links off the readme.md.
> > On Jun 1, 2016 6:14 PM, "Shawn Heisey"  wrote:
> >
> > On 6/1/2016 1:10 PM, John Bickerstaff wrote:
> > > @Joe:
> > >
> > > Is it possible that the jar's package name does not match the entry in
> > > the sample solrconfig.xml file?
> > >
> > > The solrconfig.xml example file in the test directory contains the
> > > following package name:
> > > class="com.github.healthonnet.search.SynonymExpandingExtendedDismaxQParserPlugin">
> > >
> > > However, the jar file (when unzipped) has the following directory
> > > structure down to the same class name:
> > >
> > > org --> apache --> solr --> search
> > >
> > > I just tried with the name change to the org.apache package name in
> > > the solrconfig.xml file and got no errors.
> >
> > Looks like the package name is indeed the problem here.
> >
> > They changed the package name from org.apache.solr.search to
> > com.github.healthonnet.search in the LATEST source code release --
> > 5.0.4.  The code in the 5.0.3 version (and the 2.0.0 version indicated
> > in the earlier message) uses org.apache.solr.search.
> >
> > I cannot find any files in the 2.0.0 zipfile download that contain the
> > new package name, so I'm curious where the incorrect information on how
> > to configure Solr to use the plugin was found.  I did not check the
> > tarball download.
> >
> > Thanks,
> > Shawn
> >
>


Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-01 Thread John Bickerstaff
Just to be clear, I got version 2.0 of the jar from github...  should I be
looking for something in a maven repository?  A bit confused at this point
given all the version numbers...

I want the latest and greatest unless there are any special considerations.

Thanks for the assistance!
On Jun 1, 2016 5:46 PM, "MaryJo Sminkey"  wrote:

Yup that was the issue for us as well. It doesn't seem to be throwing the
class error now, although I have not been able to successfully get back
results that seem to be using it, it's showing up as the deftype in my
params but the QParser in my debug is the normal edismax one. I will have
to play around with my config some more tomorrow and try to figure out what
we're doing wrong.

MJ



On Wed, Jun 1, 2016 at 6:38 PM, Joe Lawson <
jlaw...@opensourceconnections.com> wrote:

> Nothing up until 5.0.4 was distributed on maven central. 5.0 -> 5.0.4 was
> just a bunch of clean up to get it ready for maven (including the namespace
> change).
>
> Being that nearly all docs and articles talking about the plugin reference
> the old 2.0, one could reasonably get confused as to what config to use,
> esp. when I linked the latest 5.0.4 test config prior.
>
> You can get the older jars from the links off the readme.md.
> On Jun 1, 2016 6:14 PM, "Shawn Heisey"  wrote:
>
> On 6/1/2016 1:10 PM, John Bickerstaff wrote:
> > @Joe:
> >
> > Is it possible that the jar's package name does not match the entry in
> > the sample solrconfig.xml file?
> >
> > The solrconfig.xml example file in the test directory contains the
> > following package name:
> > class="com.github.healthonnet.search.SynonymExpandingExtendedDismaxQParserPlugin">
> >
> > However, the jar file (when unzipped) has the following directory
> > structure down to the same class name:
> >
> > org --> apache --> solr --> search
> >
> > I just tried with the name change to the org.apache package name in
> > the solrconfig.xml file and got no errors.
>
> Looks like the package name is indeed the problem here.
>
> They changed the package name from org.apache.solr.search to
> com.github.healthonnet.search in the LATEST source code release --
> 5.0.4.  The code in the 5.0.3 version (and the 2.0.0 version indicated
> in the earlier message) uses org.apache.solr.search.
>
> I cannot find any files in the 2.0.0 zipfile download that contain the
> new package name, so I'm curious where the incorrect information on how
> to configure Solr to use the plugin was found.  I did not check the
> tarball download.
>
> Thanks,
> Shawn
>


Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-01 Thread MaryJo Sminkey
Yup that was the issue for us as well. It doesn't seem to be throwing the
class error now, although I have not been able to successfully get back
results that seem to be using it. It's showing up as the defType in my
params, but the QParser in my debug is the normal edismax one. I will have
to play around with my config some more tomorrow and try to figure out what
we're doing wrong.

MJ



On Wed, Jun 1, 2016 at 6:38 PM, Joe Lawson <
jlaw...@opensourceconnections.com> wrote:

> Nothing up until 5.0.4 was distributed on maven central. 5.0 -> 5.0.4 was
> just a bunch of clean up to get it ready for maven (including the namespace
> change).
>
> Being that nearly all docs and articles talking about the plugin reference
> the old 2.0, one could reasonably get confused as to what config to use,
> esp. when I linked the latest 5.0.4 test config prior.
>
> You can get the older jars from the links off the readme.md.
> On Jun 1, 2016 6:14 PM, "Shawn Heisey"  wrote:
>
> On 6/1/2016 1:10 PM, John Bickerstaff wrote:
> > @Joe:
> >
> > Is it possible that the jar's package name does not match the entry in
> > the sample solrconfig.xml file?
> >
> > The solrconfig.xml example file in the test directory contains the
> > following package name:
> > class="com.github.healthonnet.search.SynonymExpandingExtendedDismaxQParserPlugin">
> >
> > However, the jar file (when unzipped) has the following directory
> > structure down to the same class name:
> >
> > org --> apache --> solr --> search
> >
> > I just tried with the name change to the org.apache package name in
> > the solrconfig.xml file and got no errors.
>
> Looks like the package name is indeed the problem here.
>
> They changed the package name from org.apache.solr.search to
> com.github.healthonnet.search in the LATEST source code release --
> 5.0.4.  The code in the 5.0.3 version (and the 2.0.0 version indicated
> in the earlier message) uses org.apache.solr.search.
>
> I cannot find any files in the 2.0.0 zipfile download that contain the
> new package name, so I'm curious where the incorrect information on how
> to configure Solr to use the plugin was found.  I did not check the
> tarball download.
>
> Thanks,
> Shawn
>


Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-01 Thread Joe Lawson
Nothing up until 5.0.4 was distributed on maven central. 5.0 -> 5.0.4 was
just a bunch of clean up to get it ready for maven (including the namespace
change).

Being that nearly all docs and articles talking about the plugin reference
the old 2.0, one could reasonably get confused as to what config to use,
esp. when I linked the latest 5.0.4 test config prior.

You can get the older jars from the links off the readme.md.
On Jun 1, 2016 6:14 PM, "Shawn Heisey"  wrote:

On 6/1/2016 1:10 PM, John Bickerstaff wrote:
> @Joe:
>
> Is it possible that the jar's package name does not match the entry in the
> sample solrconfig.xml file?
>
> The solrconfig.xml example file in the test directory contains the
> following package name:
> class="com.github.healthonnet.search.SynonymExpandingExtendedDismaxQParserPlugin">
>
> However, the jar file (when unzipped) has the following directory structure
> down to the same class name:
>
> org --> apache --> solr --> search
>
> I just tried with the name change to the org.apache package name in the
> solrconfig.xml file and got no errors.

Looks like the package name is indeed the problem here.

They changed the package name from org.apache.solr.search to
com.github.healthonnet.search in the LATEST source code release --
5.0.4.  The code in the 5.0.3 version (and the 2.0.0 version indicated
in the earlier message) uses org.apache.solr.search.

I cannot find any files in the 2.0.0 zipfile download that contain the
new package name, so I'm curious where the incorrect information on how
to configure Solr to use the plugin was found.  I did not check the
tarball download.

Thanks,
Shawn


Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-01 Thread Shawn Heisey
On 6/1/2016 1:10 PM, John Bickerstaff wrote:
> @Joe:
>
> Is it possible that the jar's package name does not match the entry in the
> sample solrconfig.xml file?
>
> The solrconfig.xml example file in the test directory contains the
> following package name:
>  class="com.github.healthonnet.search.SynonymExpandingExtendedDismaxQParserPlugin">
>
> However, the jar file (when unzipped) has the following directory structure
> down to the same class name:
>
> org --> apache --> solr --> search
>
> I just tried with the name change to the org.apache package name in the
> solrconfig.xml file and got no errors.

Looks like the package name is indeed the problem here.

They changed the package name from org.apache.solr.search to
com.github.healthonnet.search in the LATEST source code release --
5.0.4.  The code in the 5.0.3 version (and the 2.0.0 version indicated
in the earlier message) uses org.apache.solr.search.

I cannot find any files in the 2.0.0 zipfile download that contain the
new package name, so I'm curious where the incorrect information on how
to configure Solr to use the plugin was found.  I did not check the
tarball download.
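
For anyone who wants to confirm which package a given jar actually uses
before editing solrconfig.xml, something along these lines should do it.
This is a sketch in Python using only the standard zipfile module; the
function name find_plugin_packages is made up, and the jar path is whatever
file you downloaded.

```python
import sys
import zipfile

CLASS_FILE = "SynonymExpandingExtendedDismaxQParserPlugin.class"


def find_plugin_packages(jar_path) -> list:
    """Return the dotted package name of every entry ending in CLASS_FILE."""
    with zipfile.ZipFile(jar_path) as jar:
        return sorted(
            # "org/apache/solr/search/X.class" -> "org.apache.solr.search"
            name[: -len(CLASS_FILE)].strip("/").replace("/", ".")
            for name in jar.namelist()
            if name.endswith("/" + CLASS_FILE)
        )


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_jar.py path/to/plugin.jar
    for package in find_plugin_packages(sys.argv[1]):
        print(package + "." + CLASS_FILE[: -len(".class")])
```

Whatever package this prints is the one the class attribute in
solrconfig.xml has to match.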

Thanks,
Shawn



Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-01 Thread Joe Lawson
I mean the 5.0 namespace is different from the 2.0 not 3.0.
On Jun 1, 2016 5:43 PM, "Joe Lawson" 
wrote:

2.0 is different from 3.0 so check the test config that is associated with
the 2.0 release. Ie


https://github.com/healthonnet/hon-lucene-synonyms/blob/8f736da053510911517fcb8a712b1d8ca5c920d2/src/test/resources/solr/collection1/conf/example_solrconfig.xml


On Jun 1, 2016 3:10 PM, "John Bickerstaff"  wrote:

> @Joe:
>
> Is it possible that the jar's package name does not match the entry in the
> sample solrconfig.xml file?
>
> The solrconfig.xml example file in the test directory contains the
> following package name:
> 
> class="com.github.healthonnet.search.SynonymExpandingExtendedDismaxQParserPlugin">
>
> However, the jar file (when unzipped) has the following directory structure
> down to the same class name:
>
> org --> apache --> solr --> search
>
> I just tried with the name change to the org.apache package name in the
> solrconfig.xml file and got no errors.
>
> I haven't yet tried to see synonym "stuff" in the debug for a query, but
> I'm betting it's much ado about nothing - just the package name has
> changed...
>
> If that makes sense to you, you may want to edit the example file...
>
> Thanks a lot for all the work you contributed to this by the way!
>
> --JohnB
>
> @ MaryJo - this may be the problem in your situation for this specific file
> -- good luck!
>
> I put it in $SOLR_HOME/lib  - which, taking the default "for production"
> install script on Ubuntu resolved to /var/solr/data/lib
>
> Good luck!
>
> On Wed, Jun 1, 2016 at 12:49 PM, John Bickerstaff <
> j...@johnbickerstaff.com>
> wrote:
>
> > I tried this - it didn't fail.  I don't know if it really started in
> > Denable.runtime.lib=true mode or not:
> >
> > service solr start -Denable.runtime.lib=true
> >
> > Of course, I'd still really rather be able to just drop jars into
> > /var/solr/data/lib and have them work...
> >
> > Thanks all.
> >
> > On Wed, Jun 1, 2016 at 12:42 PM, John Bickerstaff <
> > j...@johnbickerstaff.com> wrote:
> >
> >> So - the instructions on using the Blob Store API say to use the
> >> Denable.runtime.lib=true option when starting Solr.
> >>
> >> Thing is, I've installed per the "for production" instructions which
> >> gives me an entry in /etc/init.d called solr.
> >>
> >> Two questions.
> >>
> >> To test this can I still use the start.jar in /opt/solr/server as long as
> >> I issue the "cloud mode" flag or does that no longer work in 5.x?
> >>
> >> Do I instead have to modify that start script in /etc/init.d ?
> >>
> >> On Wed, Jun 1, 2016 at 10:42 AM, John Bickerstaff <
> >> j...@johnbickerstaff.com> wrote:
> >>
> >>> Ahhh - gotcha.
> >>>
> >>> Well, not sure why it's not picked up - seems lots of other jars are...
> >>> Maybe Joe will comment...
> >>>
> >>> On Wed, Jun 1, 2016 at 10:22 AM, MaryJo Sminkey 
> >>> wrote:
> >>>
>  That refers to running Solr in cloud mode. We aren't there yet.
> 
>  MJ
> 
> 
> 
>  On Wed, Jun 1, 2016 at 12:20 PM, John Bickerstaff <
>  j...@johnbickerstaff.com>
>  wrote:
> 
>  > Hi Mary Jo,
>  >
>  > I'll point you to Joe's earlier comment about needing to use the
> Blob
>  Store
>  > API...  He put a link in his response.
>  >
>  > I'm about to try that today...  Given that Joe is a contributor to
>  > hon_lucene there's a good chance his experience is correct here
> -
>  > especially given the evidence you just provided...
>  >
>  > Here's a copy - paste for your convenience.  It's a bit convoluted,
>  > although I totally get how this kind of approach is great for large
>  Solr
>  > Cloud installations that have machines or VMs coming up and going
>  down as
>  > part of a services-based approach...
>  >
>  > Joe said:
>  > The docs are out of date for the synonym_edismax but it does work.
>  Check
>  > out the tests for working examples. I'll try to update it soon. I've
>  run
>  > the plugin on Solr 5 and 6, solrcloud and standalone. For running in
>  > SolrCloud make sure you follow
>  >
>  >
> 
> https://cwiki.apache.org/confluence/display/solr/Adding+Custom+Plugins+in+SolrCloud+Mode
>  >
>  > On Wed, Jun 1, 2016 at 10:15 AM, MaryJo Sminkey <
> mjsmin...@gmail.com>
>  > wrote:
>  >
>  > > So we still can't get this to work, here's the latest update my
>  server
>  > guy
>  > > gave me: It seems to not matter where the file is located, it does
>  not
>  > > load. Yet, the Solr Java class path shows the file has loaded.
>  Only
>  > > this path (./server/lib/hon-lucene-synonyms-2.0.0.jar) will work
> in
>  that
>  > it
>  > > loads in the java class path.  I've yet to find out what the error
>  is.
>  > All
>  > > I can see is this "Error loading class". 

Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-01 Thread Joe Lawson
2.0 is different from 3.0, so check the test config that is associated with
the 2.0 release, i.e.:


https://github.com/healthonnet/hon-lucene-synonyms/blob/8f736da053510911517fcb8a712b1d8ca5c920d2/src/test/resources/solr/collection1/conf/example_solrconfig.xml


On Jun 1, 2016 3:10 PM, "John Bickerstaff"  wrote:

> @Joe:
>
> Is it possible that the jar's package name does not match the entry in the
> sample solrconfig.xml file?
>
> The solrconfig.xml example file in the test directory contains the
> following package name:
> 
> class="com.github.healthonnet.search.SynonymExpandingExtendedDismaxQParserPlugin">
>
> However, the jar file (when unzipped) has the following directory structure
> down to the same class name:
>
> org --> apache --> solr --> search
>
> I just tried with the name change to the org.apache package name in the
> solrconfig.xml file and got no errors.
>
> I haven't yet tried to see synonym "stuff" in the debug for a query, but
> I'm betting it's much ado about nothing - just the package name has
> changed...
>
> If that makes sense to you, you may want to edit the example file...
>
> Thanks a lot for all the work you contributed to this by the way!
>
> --JohnB
>
> @ MaryJo - this may be the problem in your situation for this specific file
> -- good luck!
>
> I put it in $SOLR_HOME/lib  - which, taking the default "for production"
> install script on Ubuntu resolved to /var/solr/data/lib
>
> Good luck!
>
> On Wed, Jun 1, 2016 at 12:49 PM, John Bickerstaff <
> j...@johnbickerstaff.com>
> wrote:
>
> > I tried this - it didn't fail.  I don't know if it really started in
> > > -Denable.runtime.lib=true mode or not:
> >
> > service solr start -Denable.runtime.lib=true
> >
> > Of course, I'd still really rather be able to just drop jars into
> > /var/solr/data/lib and have them work...
> >
> > Thanks all.
> >
> > On Wed, Jun 1, 2016 at 12:42 PM, John Bickerstaff <
> > j...@johnbickerstaff.com> wrote:
> >
> >> So - the instructions on using the Blob Store API say to use the
> > >> -Denable.runtime.lib=true option when starting Solr.
> >>
> >> Thing is, I've installed per the "for production" instructions which
> >> gives me an entry in /etc/init.d called solr.
> >>
> >> Two questions.
> >>
> >> To test this can I still use the start.jar in /opt/solr/server as long
> as
> >> I issue the "cloud mode" flag or does that no longer work in 5.x?
> >>
> >> Do I instead have to modify that start script in /etc/init.d ?
> >>
> >> On Wed, Jun 1, 2016 at 10:42 AM, John Bickerstaff <
> >> j...@johnbickerstaff.com> wrote:
> >>
> >>> Ahhh - gotcha.
> >>>
> >>> Well, not sure why it's not picked up - seems lots of other jars are...
> >>> Maybe Joe will comment...
> >>>
> >>> On Wed, Jun 1, 2016 at 10:22 AM, MaryJo Sminkey 
> >>> wrote:
> >>>
>  That refers to running Solr in cloud mode. We aren't there yet.
> 
>  MJ
> 
> 
> 
>  On Wed, Jun 1, 2016 at 12:20 PM, John Bickerstaff <
>  j...@johnbickerstaff.com>
>  wrote:
> 
>  > Hi Mary Jo,
>  >
>  > I'll point you to Joe's earlier comment about needing to use the
> Blob
>  Store
>  > API...  He put a link in his response.
>  >
>  > I'm about to try that today...  Given that Joe is a contributor to
>  > hon_lucene there's a good chance his experience is correct here
> -
>  > especially given the evidence you just provided...
>  >
>  > Here's a copy - paste for your convenience.  It's a bit convoluted,
>  > although I totally get how this kind of approach is great for large
>  Solr
>  > Cloud installations that have machines or VMs coming up and going
>  down as
>  > part of a services-based approach...
>  >
>  > Joe said:
>  > The docs are out of date for the synonym_edismax but it does work.
>  Check
>  > out the tests for working examples. I'll try to update it soon. I've
>  run
>  > the plugin on Solr 5 and 6, solrcloud and standalone. For running in
>  > SolrCloud make sure you follow
>  >
>  >
> 
> https://cwiki.apache.org/confluence/display/solr/Adding+Custom+Plugins+in+SolrCloud+Mode
>  >
>  > On Wed, Jun 1, 2016 at 10:15 AM, MaryJo Sminkey <
> mjsmin...@gmail.com>
>  > wrote:
>  >
>  > > So we still can't get this to work, here's the latest update my
>  server
>  > guy
>  > > gave me: It seems to not matter where the file is located, it does
>  not
>  > > load. Yet, the Solr Java class path shows the file has loaded.
>  Only
>  > > this path (./server/lib/hon-lucene-synonyms-2.0.0.jar) will work
> in
>  that
>  > it
>  > > loads in the java class path.  I've yet to find out what the error
>  is.
>  > All
>  > > I can see is this "Error loading class". Okay, but why? What error
>  was
>  > > encountered in trying to load the class?  I can't find any of this
>  > > information. 

Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-01 Thread John Bickerstaff
@Joe:

Is it possible that the jar's package name does not match the entry in the
sample solrconfig.xml file?

The solrconfig.xml example file in the test directory contains the
following package name:

class="com.github.healthonnet.search.SynonymExpandingExtendedDismaxQParserPlugin">

However, the jar file (when unzipped) has the following directory structure
down to the same class name:

org --> apache --> solr --> search

I just tried with the name change to the org.apache package name in the
solrconfig.xml file and got no errors.

I haven't yet tried to see synonym "stuff" in the debug for a query, but
I'm betting it's much ado about nothing - just the package name has
changed...

If that makes sense to you, you may want to edit the example file...

Thanks a lot for all the work you contributed to this by the way!

--JohnB

@ MaryJo - this may be the problem in your situation for this specific file
-- good luck!

I put it in $SOLR_HOME/lib  - which, taking the default "for production"
install script on Ubuntu resolved to /var/solr/data/lib

Good luck!
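
For anyone who wants to check which package a plugin class really lives in without unzipping the jar by hand, here is a minimal Python sketch. A jar is just a zip archive, so listing its entries is enough. The helper name find_class_packages is made up for illustration and is not part of Solr or hon-lucene-synonyms.

```python
import zipfile


def find_class_packages(jar, simple_name):
    """Return the fully qualified names of classes in the jar whose
    simple class name equals simple_name.

    `jar` may be a path or a file-like object; a jar is a plain zip archive.
    """
    target = simple_name + ".class"
    matches = []
    with zipfile.ZipFile(jar) as archive:
        for entry in archive.namelist():
            # Match "Foo.class" in the default package or ".../Foo.class".
            if entry == target or entry.endswith("/" + target):
                # "org/apache/solr/search/Foo.class" -> "org.apache.solr.search.Foo"
                matches.append(entry[: -len(".class")].replace("/", "."))
    return sorted(matches)
```

Calling it with the path to hon-lucene-synonyms-2.0.0.jar and "SynonymExpandingExtendedDismaxQParserPlugin" would show the name to put in the class attribute in solrconfig.xml. If the result does not match what the config has, that would explain an "Error loading class" even though the jar is on the classpath.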

On Wed, Jun 1, 2016 at 12:49 PM, John Bickerstaff 
wrote:

> I tried this - it didn't fail.  I don't know if it really started in
> -Denable.runtime.lib=true mode or not:
>
> service solr start -Denable.runtime.lib=true
>
> Of course, I'd still really rather be able to just drop jars into
> /var/solr/data/lib and have them work...
>
> Thanks all.
>
> On Wed, Jun 1, 2016 at 12:42 PM, John Bickerstaff <
> j...@johnbickerstaff.com> wrote:
>
>> So - the instructions on using the Blob Store API say to use the
>> -Denable.runtime.lib=true option when starting Solr.
>>
>> Thing is, I've installed per the "for production" instructions which
>> gives me an entry in /etc/init.d called solr.
>>
>> Two questions.
>>
>> To test this can I still use the start.jar in /opt/solr/server as long as
>> I issue the "cloud mode" flag or does that no longer work in 5.x?
>>
>> Do I instead have to modify that start script in /etc/init.d ?
>>
>> On Wed, Jun 1, 2016 at 10:42 AM, John Bickerstaff <
>> j...@johnbickerstaff.com> wrote:
>>
>>> Ahhh - gotcha.
>>>
>>> Well, not sure why it's not picked up - seems lots of other jars are...
>>> Maybe Joe will comment...
>>>
>>> On Wed, Jun 1, 2016 at 10:22 AM, MaryJo Sminkey 
>>> wrote:
>>>
 That refers to running Solr in cloud mode. We aren't there yet.

 MJ



 On Wed, Jun 1, 2016 at 12:20 PM, John Bickerstaff <
 j...@johnbickerstaff.com>
 wrote:

 > Hi Mary Jo,
 >
 > I'll point you to Joe's earlier comment about needing to use the Blob
 Store
 > API...  He put a link in his response.
 >
 > I'm about to try that today...  Given that Joe is a contributor to
 > hon_lucene there's a good chance his experience is correct here -
 > especially given the evidence you just provided...
 >
 > Here's a copy - paste for your convenience.  It's a bit convoluted,
 > although I totally get how this kind of approach is great for large
 Solr
 > Cloud installations that have machines or VMs coming up and going
 down as
 > part of a services-based approach...
 >
 > Joe said:
 > The docs are out of date for the synonym_edismax but it does work.
 Check
 > out the tests for working examples. I'll try to update it soon. I've
 run
 > the plugin on Solr 5 and 6, solrcloud and standalone. For running in
 > SolrCloud make sure you follow
 >
 >
 https://cwiki.apache.org/confluence/display/solr/Adding+Custom+Plugins+in+SolrCloud+Mode
 >
 > On Wed, Jun 1, 2016 at 10:15 AM, MaryJo Sminkey 
 > wrote:
 >
 > > So we still can't get this to work, here's the latest update my
 server
 > guy
 > > gave me: It seems to not matter where the file is located, it does
 not
 > > load. Yet, the Solr Java class path shows the file has loaded.
 Only
 > > this path (./server/lib/hon-lucene-synonyms-2.0.0.jar) will work in
 that
 > it
 > > loads in the java class path.  I've yet to find out what the error
 is.
 > All
 > > I can see is this "Error loading class". Okay, but why? What error
 was
 > > encountered in trying to load the class?  I can't find any of this
 > > information. I'm trying to work with the documentation that is
 located
 > here
 > > http://wiki.apache.org/solr/SolrPlugins
 > >
 > > I found that the jar file was put into each of these locations in an
 > > attempt to find a place where it will load without error.
 > >
 > > find .|grep hon-lucene
 > >
 > > ./server/lib/hon-lucene-synonyms-2.0.0.jar
 > >
 > > ./server/solr/plugins/hon-lucene-synonyms-2.0.0.jar
 > >
 > > ./server/solr/classic_newdb/lib/hon-lucene-synonyms-2.0.0.jar
 > >
 > > ./server/solr/classic_search/lib/hon-lucene-synonyms-2.0.0.jar
 > >
 > >
 

Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-01 Thread John Bickerstaff
I tried this - it didn't fail.  I don't know if it really started in
-Denable.runtime.lib=true mode or not:

service solr start -Denable.runtime.lib=true

Of course, I'd still really rather be able to just drop jars into
/var/solr/data/lib and have them work...

Thanks all.
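
One way to find out whether the flag actually reached the JVM is to look at the arguments of the running Solr process. Below is a minimal Python sketch of that check. The helper names are made up for illustration, and read_cmdline assumes a Linux /proc filesystem.

```python
def has_system_property(argv, name, value=None):
    """Return True if a JVM argument list contains -D<name>, optionally
    requiring -D<name>=<value>."""
    for arg in argv:
        if not arg.startswith("-D"):
            continue
        key, _, val = arg[2:].partition("=")
        if key == name and (value is None or val == value):
            return True
    return False


def read_cmdline(pid):
    """Read a process's argument list from /proc/<pid>/cmdline (Linux only)."""
    with open("/proc/%d/cmdline" % pid, "rb") as f:
        raw = f.read()
    # Arguments are separated by NUL bytes.
    return [a.decode("utf-8", "replace") for a in raw.split(b"\0") if a]
```

With the Solr process id in hand, has_system_property(read_cmdline(pid), "enable.runtime.lib", "true") tells you whether the option made it through the service wrapper.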

On Wed, Jun 1, 2016 at 12:42 PM, John Bickerstaff 
wrote:

> So - the instructions on using the Blob Store API say to use the
> -Denable.runtime.lib=true option when starting Solr.
>
> Thing is, I've installed per the "for production" instructions which gives
> me an entry in /etc/init.d called solr.
>
> Two questions.
>
> To test this can I still use the start.jar in /opt/solr/server as long as
> I issue the "cloud mode" flag or does that no longer work in 5.x?
>
> Do I instead have to modify that start script in /etc/init.d ?
>
> On Wed, Jun 1, 2016 at 10:42 AM, John Bickerstaff <
> j...@johnbickerstaff.com> wrote:
>
>> Ahhh - gotcha.
>>
>> Well, not sure why it's not picked up - seems lots of other jars are...
>> Maybe Joe will comment...
>>
>> On Wed, Jun 1, 2016 at 10:22 AM, MaryJo Sminkey 
>> wrote:
>>
>>> That refers to running Solr in cloud mode. We aren't there yet.
>>>
>>> MJ
>>>
>>>
>>>
>>> On Wed, Jun 1, 2016 at 12:20 PM, John Bickerstaff <
>>> j...@johnbickerstaff.com>
>>> wrote:
>>>
>>> > Hi Mary Jo,
>>> >
>>> > I'll point you to Joe's earlier comment about needing to use the Blob
>>> Store
>>> > API...  He put a link in his response.
>>> >
>>> > I'm about to try that today...  Given that Joe is a contributor to
>>> > hon_lucene there's a good chance his experience is correct here -
>>> > especially given the evidence you just provided...
>>> >
>>> > Here's a copy - paste for your convenience.  It's a bit convoluted,
>>> > although I totally get how this kind of approach is great for large
>>> Solr
>>> > Cloud installations that have machines or VMs coming up and going down
>>> as
>>> > part of a services-based approach...
>>> >
>>> > Joe said:
>>> > The docs are out of date for the synonym_edismax but it does work.
>>> Check
>>> > out the tests for working examples. I'll try to update it soon. I've
>>> run
>>> > the plugin on Solr 5 and 6, solrcloud and standalone. For running in
>>> > SolrCloud make sure you follow
>>> >
>>> >
>>> https://cwiki.apache.org/confluence/display/solr/Adding+Custom+Plugins+in+SolrCloud+Mode
>>> >
>>> > On Wed, Jun 1, 2016 at 10:15 AM, MaryJo Sminkey 
>>> > wrote:
>>> >
>>> > > So we still can't get this to work, here's the latest update my
>>> server
>>> > guy
>>> > > gave me: It seems to not matter where the file is located, it does
>>> not
>>> > > load. Yet, the Solr Java class path shows the file has loaded.
>>> Only
>>> > > this path (./server/lib/hon-lucene-synonyms-2.0.0.jar) will work in
>>> that
>>> > it
>>> > > loads in the java class path.  I've yet to find out what the error
>>> is.
>>> > All
>>> > > I can see is this "Error loading class". Okay, but why? What error
>>> was
>>> > > encountered in trying to load the class?  I can't find any of this
>>> > > information. I'm trying to work with the documentation that is
>>> located
>>> > here
>>> > > http://wiki.apache.org/solr/SolrPlugins
>>> > >
>>> > > I found that the jar file was put into each of these locations in an
>>> > > attempt to find a place where it will load without error.
>>> > >
>>> > > find .|grep hon-lucene
>>> > >
>>> > > ./server/lib/hon-lucene-synonyms-2.0.0.jar
>>> > >
>>> > > ./server/solr/plugins/hon-lucene-synonyms-2.0.0.jar
>>> > >
>>> > > ./server/solr/classic_newdb/lib/hon-lucene-synonyms-2.0.0.jar
>>> > >
>>> > > ./server/solr/classic_search/lib/hon-lucene-synonyms-2.0.0.jar
>>> > >
>>> > > ./server/solr-webapp/webapp/WEB-INF/lib/hon-lucene-synonyms-2.0.0.jar
>>> > >
>>> > >  The config specifies that files in certain paths can be loaded as
>>> > plugins
>>> > > or I can specify a path. Following the instructions I added this path
>>> > >
>>> > > <lib dir="${solr.install.dir:../../../..}/contrib/hon-lucene-synonyms/lib"
>>> > > regex=".*\.jar" />
>>> > >
>>> > > And I put the jar file in that location.  This did not work either. I
>>> > also
>>> > > tried using an absolute path like this.
>>> > >
>>> > > <lib
>>> > > dir="/opt/solr/contrib/hon-lucene-synonyms/lib/hon-lucene-synonyms-2.0.0.jar"
>>> > > />
>>> > >
>>> > > This did not work.
>>> > >
>>> > >
>>> > >
>>> > > I'm starting to think this isn't a configuration problem, but a
>>> > > compatibility problem. I have not seen anything from the maker of
>>> this
>>> > > plugin that it works on the exact version of Solr we are using.
>>> > >
>>> > >
>>> > >
>>> > >
>>> > >
>>> > > The best info I have found so far in the logs is this stack trace of
>>> the
>>> > > error. It still does not say why it failed to load.
>>> > >
>>> > > 2016-06-01 00:22:13.470 ERROR (qtp2096057945-14) [   ]
>>> > o.a.s.s.HttpSolrCall
>>> > > null:org.apache.solr.common.SolrException: 

Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-01 Thread John Bickerstaff
 >> >> > > time.
> >> >> > > >
> >> >> > > > There isn't any explicit listing of "text_autophrase" as the
> >> default
> >> >> > > search
> >> >> > > > field in the /autophrase search handler
> >> >> > > >
> >> >> > > > There isn't any explicit statement of "df=text_autophrase" in
> the
> >> >> query
> >> >> > > > statement: [/autophrase?q=New+York]
> >> >> > > >
> >> >> > > > Therefore it seems to me that if someone tries to implement
> this,
> >> >> > they're
> >> >> > > > going to be disappointed in the results unless they:
> >> >> > > > a. copy or otherwise get ALL the text they're interested in --
> >> into
> >> >> the
> >> >> > > > "text_autophrase" field as part of the schema.xml setup (to
> >> happen at
> >> >> > > index
> >> >> > > > time)
> >> >> > > > b. somehow explicitly declare "text_autophrase" as the default
> >> search
> >> >> > > field
> >> >> > > > - either in the searchHandler or wherever else the default
> field
> >> is
> >> >> > > > configured.
> >> >> > > >
> >> >> > > > If anyone out there has done this specific approach - could you
> >> >> > validate
> >> >> > > > whether my thought process is correct and / or if I'm missing
> >> >> > something?
> >> >> > > > Yes - I get that I can set it all up and try - but it's what I
> >> don't
> >> >> > > know I
> >> >> > > > don't know that bothers me...
> >> >> > > >
> >> >> > > > On Fri, May 27, 2016 at 11:57 AM, John Bickerstaff <
> >> >> > > > j...@johnbickerstaff.com
> >> >> > > > > wrote:
> >> >> > > >
> >> >> > > > > Thank you Steve -- very helpful.
> >> >> > > > >
> >> >> > > > > I can see that whatever implementation I decide to try, some
> >> >> testing
> >> >> > > will
> >> >> > > > > be in order.  If anyone is aware of significant gotchas with
> >> this
> >> >> > > synonym
> >> >> > > > > thing that are not mentioned in the already-listed URLs,
> please
> >> >> feel
> >> >> > > free
> >> >> > > > > to comment.
> >> >> > > > >
> >> >> > > > > On Fri, May 27, 2016 at 10:28 AM, Steve Rowe <
> sar...@gmail.com>
> >> >> > wrote:
> >> >> > > > >
> >> >> > > > >> I’m working on addressing problems using multi-term
> synonyms at
> >> >> > query
> >> >> > > > >> time in Lucene and Solr.
> >> >> > > > >>
> >> >> > > > >> I recommend these two blogs for understanding the issues
> (the
> >> >> second
> >> >> > > one
> >> >> > > > >> was mentioned earlier in this thread):
> >> >> > > > >>
> >> >> > > > >> <
> >> >> > > > >>
> >> >> > > >
> >> >> > >
> >> >> >
> >> >>
> >>
> http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html
> >> >> > > > >> >
> >> >> > > > >> <
> >> >> >
> https://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/>
> >> >> > > > >>
> >> >> > > > >> In addition to the already-mentioned projects, there is
> also:
> >> >> > > > >>
> >> >> > > > >> <https://issues.apache.org/jira/browse/SOLR-5379>
> >> >> > > > >>
> >> >> > > > >> All of these projects try in various ways to work around the
> >> fact
> >> >> > that
> >> >>

Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-01 Thread Jeff Wartes
>> >> > >
>> >> >
>> >>
>> org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
>> >> > > Error loading class
>> >> > >
>> >> >
>> >>
>> 'com.github.healthonnet.search.SynonymExpandingExtendedDismaxQParserPlugin'
>> >> > >
>> >> > > I have tried the auto-phrasing one as well (I did set up a field
>> using
>> >> > copy
>> >> > > to configure it on) but when testing it didn't seem to return the
>> >> > synonyms
>> >> > > as expected. So gave up on that one too (am willing to give it
>> another
>> >> > try
>> >> > > though, that was awhile ago). Would definitely like to hear what
>> other
>> >> > > people have found works on the latest versions of Solr 5.x and/or 6.
>> >> Just
>> >> > > sucks that this issue has never been fixed in the core product such
>> >> that
>> >> > > you still need to mess with plugins and patches to get such a basic
>> >> > > functionality working properly.
>> >> > >
>> >> > >
>> >> > > *Mary Jo Sminkey*
>> >> > > *Senior ColdFusion Developer*
>> >> > >
>> >> > > *CF Webtools*
>> >> > > You Dream It... We Build It. <https://www.cfwebtools.com/>
>> >> > > 11204 Davenport Suite 100
>> >> > > Omaha, Nebraska 68154
>> >> > > O: 402.408.3733 x128
>> >> > > E:  maryjo.smin...@cfwebtools.com
>> >> > > Skype: maryjos.cfwebtools
>> >> > >
>> >> > >
>> >> > > On Mon, May 30, 2016 at 5:02 PM, John Bickerstaff <
>> >> > > j...@johnbickerstaff.com>
>> >> > > wrote:
>> >> > >
>> >> > > > So I'm looking at the solution mentioned here:
>> >> > > >
>> >> > > >
>> >> > >
>> >> >
>> >>
>> https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
>> >> > > >
>> >> > > > The thing that's troubling me slightly is that the way it's
>> >> documented
>> >> > it
>> >> > > > seems to be missing a small but important link...
>> >> > > >
>> >> > > > What exactly causes the results listed to be returned?
>> >> > > >
>> >> > > > Here's my thought process:
>> >> > > >
>> >> > > > 1. The entry for /autophrase searchHandler does not specify a
>> default
>> >> > > > search field.
>> >> > > > 2. The field type "text_autophrase" is set up as the one with the
>> >> > > > AutoPhrasingFilterFactory as part of its indexing
>> >> > > >
>> >> > > > There isn't any mention (perhaps because it's too obvious) of the
>> >> need
>> >> > to
>> >> > > > copy or otherwise get data into the "text_autophrase" field at
>> index
>> >> > > time.
>> >> > > >
>> >> > > > There isn't any explicit listing of "text_autophrase" as the
>> default
>> >> > > search
>> >> > > > field in the /autophrase search handler
>> >> > > >
>> >> > > > There isn't any explicit statement of "df=text_autophrase" in the
>> >> query
>> >> > > > statement: [/autophrase?q=New+York]
>> >> > > >
>> >> > > > Therefore it seems to me that if someone tries to implement this,
>> >> > they're
>> >> > > > going to be disappointed in the results unless they:
>> >> > > > a. copy or otherwise get ALL the text they're interested in --
>> into
>> >> the
>> >> > > > "text_autophrase" field as part of the schema.xml setup (to
>> happen at
>> >> > > index
>> >> > > > time)
>> >> > > > b. somehow explicitly declare "text_autophrase" as the default
>> search
>> >> > > field
>> >> > > > - either in the searchHandler or wherever else the default fi

Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-01 Thread John Bickerstaff
So - the instructions on using the Blob Store API say to use the
-Denable.runtime.lib=true option when starting Solr.

Thing is, I've installed per the "for production" instructions which gives
me an entry in /etc/init.d called solr.

Two questions.

To test this can I still use the start.jar in /opt/solr/server as long as I
issue the "cloud mode" flag or does that no longer work in 5.x?

Do I instead have to modify that start script in /etc/init.d ?
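
On the second question, a possible alternative to editing the init script itself: this sketch assumes the "for production" installer created an include file that the init script sources (commonly /etc/default/solr.in.sh, but please verify on your own box) and that extra JVM flags can be appended to SOLR_OPTS there. The helper name add_solr_opt is made up for illustration. It only transforms the file's text, so you can inspect the result before writing anything back.

```python
def add_solr_opt(include_text, opt):
    """Return the include-file text with `opt` appended to SOLR_OPTS,
    unless an uncommented line already mentions it."""
    for line in include_text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#") and opt in stripped:
            return include_text  # already present, leave the text untouched
    if include_text and not include_text.endswith("\n"):
        include_text += "\n"
    return include_text + 'SOLR_OPTS="$SOLR_OPTS %s"\n' % opt
```

Running it over the current file contents with "-Denable.runtime.lib=true" and restarting the service should be equivalent to passing the flag at startup, if the assumptions above hold for your install.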

On Wed, Jun 1, 2016 at 10:42 AM, John Bickerstaff 
wrote:

> Ahhh - gotcha.
>
> Well, not sure why it's not picked up - seems lots of other jars are...
> Maybe Joe will comment...
>
> On Wed, Jun 1, 2016 at 10:22 AM, MaryJo Sminkey 
> wrote:
>
>> That refers to running Solr in cloud mode. We aren't there yet.
>>
>> MJ
>>
>>
>>
>> On Wed, Jun 1, 2016 at 12:20 PM, John Bickerstaff <
>> j...@johnbickerstaff.com>
>> wrote:
>>
>> > Hi Mary Jo,
>> >
>> > I'll point you to Joe's earlier comment about needing to use the Blob
>> Store
>> > API...  He put a link in his response.
>> >
>> > I'm about to try that today...  Given that Joe is a contributor to
>> > hon_lucene there's a good chance his experience is correct here -
>> > especially given the evidence you just provided...
>> >
>> > Here's a copy - paste for your convenience.  It's a bit convoluted,
>> > although I totally get how this kind of approach is great for large Solr
>> > Cloud installations that have machines or VMs coming up and going down
>> as
>> > part of a services-based approach...
>> >
>> > Joe said:
>> > The docs are out of date for the synonym_edismax but it does work. Check
>> > out the tests for working examples. I'll try to update it soon. I've run
>> > the plugin on Solr 5 and 6, solrcloud and standalone. For running in
>> > SolrCloud make sure you follow
>> >
>> >
>> https://cwiki.apache.org/confluence/display/solr/Adding+Custom+Plugins+in+SolrCloud+Mode
>> >
>> > On Wed, Jun 1, 2016 at 10:15 AM, MaryJo Sminkey 
>> > wrote:
>> >
>> > > So we still can't get this to work, here's the latest update my server
>> > guy
>> > > gave me: It seems to not matter where the file is located, it does not
>> > > load. Yet, the Solr Java class path shows the file has loaded.
>> Only
>> > > this path (./server/lib/hon-lucene-synonyms-2.0.0.jar) will work in
>> that
>> > it
>> > > loads in the java class path.  I've yet to find out what the error is.
>> > All
>> > > I can see is this "Error loading class". Okay, but why? What error was
>> > > encountered in trying to load the class?  I can't find any of this
>> > > information. I'm trying to work with the documentation that is located
>> > here
>> > > http://wiki.apache.org/solr/SolrPlugins
>> > >
>> > > I found that the jar file was put into each of these locations in an
>> > > attempt to find a place where it will load without error.
>> > >
>> > > find .|grep hon-lucene
>> > >
>> > > ./server/lib/hon-lucene-synonyms-2.0.0.jar
>> > >
>> > > ./server/solr/plugins/hon-lucene-synonyms-2.0.0.jar
>> > >
>> > > ./server/solr/classic_newdb/lib/hon-lucene-synonyms-2.0.0.jar
>> > >
>> > > ./server/solr/classic_search/lib/hon-lucene-synonyms-2.0.0.jar
>> > >
>> > > ./server/solr-webapp/webapp/WEB-INF/lib/hon-lucene-synonyms-2.0.0.jar
>> > >
>> > >  The config specifies that files in certain paths can be loaded as
>> > plugins
>> > > or I can specify a path. Following the instructions I added this path
>> > >
>> > > <lib dir="${solr.install.dir:../../../..}/contrib/hon-lucene-synonyms/lib"
>> > > regex=".*\.jar" />
>> > >
>> > > And I put the jar file in that location.  This did not work either. I
>> > also
>> > > tried using an absolute path like this.
>> > >
>> > > <lib
>> > > dir="/opt/solr/contrib/hon-lucene-synonyms/lib/hon-lucene-synonyms-2.0.0.jar"
>> > > />
>> > >
>> > > This did not work.
>> > >
>> > >
>> > >
>> > > I'm starting to think this isn't a configuration problem, but a
>> > > compatibility problem. I have not seen anything from the maker of this
>> > > plugin that it works on the exact version of Solr we are using.
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > The best info I have found so far in the logs is this stack trace of
>> the
>> > > error. It still does not say why it failed to load.
>> > >
>> > > 2016-06-01 00:22:13.470 ERROR (qtp2096057945-14) [   ]
>> > o.a.s.s.HttpSolrCall
>> > > null:org.apache.solr.common.SolrException: SolrCore 'classic_search'
>> is
>> > not
>> > > available due to init failure: Error loading class
>> > > 'com.github.healthonnet.search.SynonymExpandingExtendedDismaxQParserPlugin'
>> > >
>> > > at
>> > > org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:993)
>> > >
>> > > at
>> > org.apache.solr.servlet.HttpSolrCall.init(HttpSolrCall.java:249)
>> > >
>> > > at
>> > org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:411)
>> > >
>> > > at
>> > >
>> > >
>> >
>> 

Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-01 Thread John Bickerstaff
Ahhh - gotcha.

Well, not sure why it's not picked up - seems lots of other jars are...
Maybe Joe will comment...

On Wed, Jun 1, 2016 at 10:22 AM, MaryJo Sminkey  wrote:

> That refers to running Solr in cloud mode. We aren't there yet.
>
> MJ
>
>
>
> On Wed, Jun 1, 2016 at 12:20 PM, John Bickerstaff <
> j...@johnbickerstaff.com>
> wrote:
>
> > Hi Mary Jo,
> >
> > I'll point you to Joe's earlier comment about needing to use the Blob
> Store
> > API...  He put a link in his response.
> >
> > I'm about to try that today...  Given that Joe is a contributor to
> > hon_lucene there's a good chance his experience is correct here -
> > especially given the evidence you just provided...
> >
> > Here's a copy - paste for your convenience.  It's a bit convoluted,
> > although I totally get how this kind of approach is great for large Solr
> > Cloud installations that have machines or VMs coming up and going down as
> > part of a services-based approach...
> >
> > Joe said:
> > The docs are out of date for the synonym_edismax but it does work. Check
> > out the tests for working examples. I'll try to update it soon. I've run
> > the plugin on Solr 5 and 6, solrcloud and standalone. For running in
> > SolrCloud make sure you follow
> >
> >
> https://cwiki.apache.org/confluence/display/solr/Adding+Custom+Plugins+in+SolrCloud+Mode
> >
> > On Wed, Jun 1, 2016 at 10:15 AM, MaryJo Sminkey 
> > wrote:
> >
> > > So we still can't get this to work, here's the latest update my server
> > guy
> > > gave me: It seems to not matter where the file is located, it does not
> > > > load. Yet, the Solr Java class path shows the file has loaded.
> Only
> > > this path (./server/lib/hon-lucene-synonyms-2.0.0.jar) will work in
> that
> > it
> > > loads in the java class path.  I've yet to find out what the error is.
> > All
> > > I can see is this "Error loading class". Okay, but why? What error was
> > > encountered in trying to load the class?  I can't find any of this
> > > information. I'm trying to work with the documentation that is located
> > here
> > > http://wiki.apache.org/solr/SolrPlugins
> > >
> > > I found that the jar file was put into each of these locations in an
> > > attempt to find a place where it will load without error.
> > >
> > > find .|grep hon-lucene
> > >
> > > ./server/lib/hon-lucene-synonyms-2.0.0.jar
> > >
> > > ./server/solr/plugins/hon-lucene-synonyms-2.0.0.jar
> > >
> > > ./server/solr/classic_newdb/lib/hon-lucene-synonyms-2.0.0.jar
> > >
> > > ./server/solr/classic_search/lib/hon-lucene-synonyms-2.0.0.jar
> > >
> > > ./server/solr-webapp/webapp/WEB-INF/lib/hon-lucene-synonyms-2.0.0.jar
> > >
> > >  The config specifies that files in certain paths can be loaded as
> > plugins
> > > or I can specify a path. Following the instructions I added this path
> > >
> > > > <lib dir="${solr.install.dir:../../../..}/contrib/hon-lucene-synonyms/lib"
> > > > regex=".*\.jar" />
> > >
> > > And I put the jar file in that location.  This did not work either. I
> > also
> > > tried using an absolute path like this.
> > >
> > >  > >
> > >
> >
> dir="/opt/solr/contrib/hon-lucene-synonyms/lib/hon-lucene-synonyms-2.0.0.jar"
> > > />
> > >
> > > This did not work.
> > >
> > >
> > >
> > > I'm starting to think this isn't a configuration problem, but a
> > > compatibility problem. I have not seen anything from the maker of this
> > > plugin that it works on the exact version of Solr we are using.
> > >
> > >
> > >
> > >
> > >
> > > The best info I have found so far in the logs is this stack trace of
> the
> > > error. It still does not say why it failed to load.
> > >
> > > 2016-06-01 00:22:13.470 ERROR (qtp2096057945-14) [   ]
> > o.a.s.s.HttpSolrCall
> > > null:org.apache.solr.common.SolrException: SolrCore 'classic_search' is
> > not
> > > available due to init failure: Error loading class
> > > > 'com.github.healthonnet.search.SynonymExpandingExtendedDismaxQParserPlugin'
> > >
> > > at
> > > org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:993)
> > >
> > > at
> > org.apache.solr.servlet.HttpSolrCall.init(HttpSolrCall.java:249)
> > >
> > > at
> > org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:411)
> > >
> > > at
> > >
> > >
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:222)
> > >
> > > at
> > >
> > >
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:181)
> > >
> > > at
> > >
> > >
> >
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
> > >
> > > at
> > >
> >
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
> > >
> > > at
> > >
> > >
> >
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> > >
> > > at
> > >
> >
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
> > >
> > > 

Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-01 Thread MaryJo Sminkey
That refers to running Solr in cloud mode. We aren't there yet.

MJ



On Wed, Jun 1, 2016 at 12:20 PM, John Bickerstaff 
wrote:

> Hi Mary Jo,
>
> I'll point you to Joe's earlier comment about needing to use the Blob Store
> API...  He put a link in his response.
>
> I'm about to try that today...  Given that Joe is a contributor to
> hon_lucene there's a good chance his experience is correct here -
> especially given the evidence you just provided...
>
> Here's a copy - paste for your convenience.  It's a bit convoluted,
> although I totally get how this kind of approach is great for large Solr
> Cloud installations that have machines or VMs coming up and going down as
> part of a services-based approach...
>
> Joe said:
> The docs are out of date for the synonym_edismax but it does work. Check
> out the tests for working examples. I'll try to update it soon. I've run
> the plugin on Solr 5 and 6, solrcloud and standalone. For running in
> SolrCloud make sure you follow
>
> https://cwiki.apache.org/confluence/display/solr/Adding+Custom+Plugins+in+SolrCloud+Mode
>
> On Wed, Jun 1, 2016 at 10:15 AM, MaryJo Sminkey 
> wrote:
>
> > So we still can't get this to work, here's the latest update my server
> guy
> > gave me: It seems to not matter where the file is located, it does not
> > load. Yet, the Solr Java class path shows the file has loaded.  Only
> > this path (./server/lib/hon-lucene-synonyms-2.0.0.jar) will work in that
> it
> > loads in the java class path.  I've yet to find out what the error is.
> All
> > I can see is this "Error loading class". Okay, but why? What error was
> > encountered in trying to load the class?  I can't find any of this
> > information. I'm trying to work with the documentation that is located
> here
> > http://wiki.apache.org/solr/SolrPlugins
> >
> > I found that the jar file was put into each of these locations in an
> > attempt to find a place where it will load without error.
> >
> > find .|grep hon-lucene
> >
> > ./server/lib/hon-lucene-synonyms-2.0.0.jar
> >
> > ./server/solr/plugins/hon-lucene-synonyms-2.0.0.jar
> >
> > ./server/solr/classic_newdb/lib/hon-lucene-synonyms-2.0.0.jar
> >
> > ./server/solr/classic_search/lib/hon-lucene-synonyms-2.0.0.jar
> >
> > ./server/solr-webapp/webapp/WEB-INF/lib/hon-lucene-synonyms-2.0.0.jar
> >
> >  The config specifies that files in certain paths can be loaded as
> plugins
> > or I can specify a path. Following the instructions I added this path
> >
> > <lib dir="${solr.install.dir:../../../..}/contrib/hon-lucene-synonyms/lib"
> > regex=".*\.jar" />
> >
> > And I put the jar file in that location.  This did not work either. I
> also
> > tried using an absolute path like this.
> >
> > <lib
> > dir="/opt/solr/contrib/hon-lucene-synonyms/lib/hon-lucene-synonyms-2.0.0.jar"
> > />
> >
> > This did not work.
> >
> >
> >
> > I'm starting to think this isn't a configuration problem, but a
> > compatibility problem. I have not seen anything from the maker of this
> > plugin that it works on the exact version of Solr we are using.
> >
> >
> >
> >
> >
> > The best info I have found so far in the logs is this stack trace of the
> > error. It still does not say why it failed to load.
> >
> > 2016-06-01 00:22:13.470 ERROR (qtp2096057945-14) [   ]
> o.a.s.s.HttpSolrCall
> > null:org.apache.solr.common.SolrException: SolrCore 'classic_search' is
> not
> > available due to init failure: Error loading class
> > 'com.github.healthonnet.search.SynonymExpandingExtendedDismaxQParserPlugin'
> >
> > at
> > org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:993)
> >
> > at
> org.apache.solr.servlet.HttpSolrCall.init(HttpSolrCall.java:249)
> >
> > at
> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:411)
> >
> > at
> >
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:222)
> >
> > at
> >
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:181)
> >
> > at
> >
> >
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
> >
> > at
> >
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
> >
> > at
> >
> >
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> >
> > at
> >
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
> >
> > at
> >
> >
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
> >
> > at
> >
> >
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
> >
> > at
> > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
> >
> > at
> >
> >
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
> >
> > at
> >
> >
> 

Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-01 Thread John Bickerstaff
Hi Mary Jo,

I'll point you to Joe's earlier comment about needing to use the Blob Store
API...  He put a link in his response.

I'm about to try that today...  Given that Joe is a contributor to
hon_lucene there's a good chance his experience is correct here -
especially given the evidence you just provided...

Here's a copy - paste for your convenience.  It's a bit convoluted,
although I totally get how this kind of approach is great for large Solr
Cloud installations that have machines or VMs coming up and going down as
part of a services-based approach...

Joe said:
The docs are out of date for the synonym_edismax but it does work. Check
out the tests for working examples. I'll try to update it soon. I've run
the plugin on Solr 5 and 6, solrcloud and standalone. For running in
SolrCloud make sure you follow
https://cwiki.apache.org/confluence/display/solr/Adding+Custom+Plugins+in+SolrCloud+Mode

On Wed, Jun 1, 2016 at 10:15 AM, MaryJo Sminkey  wrote:

> So we still can't get this to work, here's the latest update my server guy
> gave me: It seems to not matter where the file is located, it does not
> load. Yet, the Solr Java class path shows the file has loaded.  Only
> this path (./server/lib/hon-lucene-synonyms-2.0.0.jar) will work in that it
> loads in the java class path.  I've yet to find out what the error is. All
> I can see is this "Error loading class". Okay, but why? What error was
> encountered in trying to load the class?  I can't find any of this
> information. I'm trying to work with the documentation that is located here
> http://wiki.apache.org/solr/SolrPlugins
>
> I found that the jar file was put into each of these locations in an
> attempt to find a place where it will load without error.
>
> find .|grep hon-lucene
>
> ./server/lib/hon-lucene-synonyms-2.0.0.jar
>
> ./server/solr/plugins/hon-lucene-synonyms-2.0.0.jar
>
> ./server/solr/classic_newdb/lib/hon-lucene-synonyms-2.0.0.jar
>
> ./server/solr/classic_search/lib/hon-lucene-synonyms-2.0.0.jar
>
> ./server/solr-webapp/webapp/WEB-INF/lib/hon-lucene-synonyms-2.0.0.jar
>
>  The config specifies that files in certain paths can be loaded as plugins
> or I can specify a path. Following the instructions I added this path
>
> <lib dir="${solr.install.dir:../../../..}/contrib/hon-lucene-synonyms/lib"
> regex=".*\.jar" />
>
> And I put the jar file in that location.  This did not work either. I also
> tried using an absolute path like this.
>
> <lib
> dir="/opt/solr/contrib/hon-lucene-synonyms/lib/hon-lucene-synonyms-2.0.0.jar"
> />
>
> This did not work.
>
>
>
> I'm starting to think this isn't a configuration problem, but a
> compatibility problem. I have not seen anything from the maker of this
> plugin that it works on the exact version of Solr we are using.
>
>
>
>
>
> The best info I have found so far in the logs is this stack trace of the
> error. It still does not say why it failed to load.
>
> 2016-06-01 00:22:13.470 ERROR (qtp2096057945-14) [   ] o.a.s.s.HttpSolrCall
> null:org.apache.solr.common.SolrException: SolrCore 'classic_search' is not
> available due to init failure: Error loading class
> 'com.github.healthonnet.search.SynonymExpandingExtendedDismaxQParserPlugin'
>
> at
> org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:993)
>
> at org.apache.solr.servlet.HttpSolrCall.init(HttpSolrCall.java:249)
>
> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:411)
>
> at
>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:222)
>
> at
>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:181)
>
> at
>
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
>
> at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
>
> at
>
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>
> at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
>
> at
>
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
>
> at
>
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
>
> at
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
>
> at
>
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>
> at
>
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
>
> at
>
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>
> at
>
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
>
> at
>
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
>
> at
>
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
>
> at 

Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-01 Thread MaryJo Sminkey
So we still can't get this to work, here's the latest update my server guy
gave me: It seems to not matter where the file is located, it does not
load. Yet, the Solr Java class path shows the file has loaded.  Only
this path (./server/lib/hon-lucene-synonyms-2.0.0.jar) will work in that it
loads in the java class path.  I've yet to find out what the error is. All
I can see is this "Error loading class". Okay, but why? What error was
encountered in trying to load the class?  I can't find any of this
information. I'm trying to work with the documentation that is located here
http://wiki.apache.org/solr/SolrPlugins

I found that the jar file was put into each of these locations in an
attempt to find a place where it will load without error.

find .|grep hon-lucene

./server/lib/hon-lucene-synonyms-2.0.0.jar

./server/solr/plugins/hon-lucene-synonyms-2.0.0.jar

./server/solr/classic_newdb/lib/hon-lucene-synonyms-2.0.0.jar

./server/solr/classic_search/lib/hon-lucene-synonyms-2.0.0.jar

./server/solr-webapp/webapp/WEB-INF/lib/hon-lucene-synonyms-2.0.0.jar

 The config specifies that files in certain paths can be loaded as plugins
or I can specify a path. Following the instructions I added this path

<lib dir="${solr.install.dir:../../../..}/contrib/hon-lucene-synonyms/lib"
regex=".*\.jar" />

And I put the jar file in that location.  This did not work either. I also
tried using an absolute path like this.

<lib
dir="/opt/solr/contrib/hon-lucene-synonyms/lib/hon-lucene-synonyms-2.0.0.jar"
/>

This did not work.
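
One thing that may be worth ruling out here: as far as I know, in a
solrconfig.xml lib directive the dir attribute names a directory (optionally
filtered by regex), while a single jar file is named with path instead. The
Python sketch below is only a rough emulation of that selection rule, not
Solr's own code. The function name jars_selected and the whole-name matching
are assumptions made for illustration.

```python
import re
from pathlib import Path


def jars_selected(lib_dir, regex=r".*\.jar"):
    """Return the file names in lib_dir whose whole name matches regex.

    Rough emulation only (assumption): if lib_dir is not a directory,
    for example because it points at a jar file, nothing is selected.
    """
    base = Path(lib_dir)
    if not base.is_dir():
        return []
    pattern = re.compile(regex)
    return sorted(
        p.name
        for p in base.iterdir()
        if p.is_file() and pattern.fullmatch(p.name)
    )
```

Under these assumptions, calling jars_selected on the contrib lib directory
would list the jar, while calling it on the jar file itself would return an
empty list.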



I'm starting to think this isn't a configuration problem, but a
compatibility problem. I have not seen anything from the maker of this
plugin that it works on the exact version of Solr we are using.
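
Before concluding that it is a compatibility problem, it may help to confirm
that the class named in the error is actually inside the jar. A jar is a zip
archive, so a minimal Python sketch can check for the entry. The helper name
jar_has_class is hypothetical. A positive result only rules out a missing
class; it says nothing about whether the class can be linked against a given
Solr version.

```python
import zipfile


def jar_has_class(jar_path, class_name):
    """Return True if the jar contains a .class entry for class_name."""
    # A fully qualified class name maps to a path inside the archive.
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()
```

For this thread the call would be something like
jar_has_class("server/lib/hon-lucene-synonyms-2.0.0.jar",
"com.github.healthonnet.search.SynonymExpandingExtendedDismaxQParserPlugin").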





The best info I have found so far in the logs is this stack trace of the
error. It still does not say why it failed to load.

2016-06-01 00:22:13.470 ERROR (qtp2096057945-14) [   ] o.a.s.s.HttpSolrCall
null:org.apache.solr.common.SolrException: SolrCore 'classic_search' is not
available due to init failure: Error loading class
'com.github.healthonnet.search.SynonymExpandingExtendedDismaxQParserPlugin'

at
org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:993)

at org.apache.solr.servlet.HttpSolrCall.init(HttpSolrCall.java:249)

at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:411)

at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:222)

at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:181)

at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)

at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)

at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)

at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)

at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)

at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)

at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)

at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)

at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)

at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)

at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)

at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)

at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)

at org.eclipse.jetty.server.Server.handle(Server.java:499)

at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)

at
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)

at
org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)

at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)

at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)

at java.lang.Thread.run(Thread.java:745)

Caused by: org.apache.solr.common.SolrException: Error loading class
'com.github.healthonnet.search.SynonymExpandingExtendedDismaxQParserPlugin'

at org.apache.solr.core.SolrCore.<init>(SolrCore.java:824)

at org.apache.solr.core.SolrCore.<init>(SolrCore.java:665)

at org.apache.solr.core.CoreContainer.create(CoreContainer.java:742)

at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:462)

at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:453)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)

at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:232)

at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

at
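
In a Java stack trace like this one, the reason for a failure is usually
carried by the last "Caused by:" entry rather than by the top-level message.
The following Python sketch pulls those entries out of a saved log so the
innermost one is easy to spot. The function name caused_by_chain is
hypothetical, and only the first line of each entry is captured, so messages
wrapped onto a second line are cut short.

```python
def caused_by_chain(log_text):
    """Return the text of every 'Caused by:' line, outermost first."""
    marker = "Caused by:"
    chain = []
    for line in log_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(marker):
            chain.append(stripped[len(marker):].strip())
    return chain
```

The last element of the returned list, if there is one, is the entry to read
first when looking for the underlying exception.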

Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-01 Thread John Bickerstaff
Thanks Shawn

Yup - I created a /lib inside my $SOLR_HOME directory (which by default was
/var/solr/data)

I put the hon_lucene. jar file in there and rebooted - same errors
about class not found.

Tried again in what looked like the next most obvious spot
server/solr-webapp/webapp/WEB-INF/lib

Same result...  Class not found.

I'll go back and triple check

Joe - is that recommendation of using the Blob Store API an absolute?  I
know my IT guys are going to want to have the signing - it would be a lot
easier to just drop in jars we care about without worrying about the
signing.  Yes - I'm being lazy, I know. 

Thanks all!

On Tue, May 31, 2016 at 11:35 PM, Shawn Heisey  wrote:

> On 5/31/2016 3:13 PM, John Bickerstaff wrote:
> > The suggestion on the readme is that I can drop the
> > hon_lucene_synonyms jar file into the $SOLR_HOME directory, but this
> > does not seem to be working - I'm getting class not found exceptions.
>
> What I typically do with *all* extra jars (dataimport, mysql, ICU jars,
> etc) is put them into $SOLR_HOME/lib ... a directory that you will
> usually need to create.  If the installer script is used with default
> options, that directory will be /var/solr/data/lib.
>
> Any jar that you place in that directory will be loaded once at Solr
> startup and available to all cores.  The best thing about this directory
> is that it requires zero configuration.
>
> For 5.3 and later, loading jars into
> server/solr-webapp/webapp/WEB-INF/lib should also work, but then you are
> modifying the actual Solr install, which I normally avoid because it
> makes it a little bit harder to upgrade Solr.
>
> > Does anyone on this list have direct experience with getting this
> > plugin to work in Solr 5.x?
>
> I don't have any experience with that specific plugin, but I have
> successfully used other plugin jars with the lib directory mentioned above.
>
> Thanks,
> Shawn
>
>


Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-05-31 Thread Shawn Heisey
On 5/31/2016 3:13 PM, John Bickerstaff wrote:
> The suggestion on the readme is that I can drop the
> hon_lucene_synonyms jar file into the $SOLR_HOME directory, but this
> does not seem to be working - I'm getting class not found exceptions. 

What I typically do with *all* extra jars (dataimport, mysql, ICU jars,
etc) is put them into $SOLR_HOME/lib ... a directory that you will
usually need to create.  If the installer script is used with default
options, that directory will be /var/solr/data/lib.

Any jar that you place in that directory will be loaded once at Solr
startup and available to all cores.  The best thing about this directory
is that it requires zero configuration.
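
As an illustration of the step described above, here is a minimal Python
sketch that creates $SOLR_HOME/lib if needed and copies a jar into it. The
function name install_shared_jar is hypothetical, and the paths in the usage
note are just the defaults mentioned in this message. Solr still has to be
restarted afterwards, since the directory is read at startup.

```python
import shutil
from pathlib import Path


def install_shared_jar(solr_home, jar_path):
    """Copy jar_path into <solr_home>/lib, creating the directory if needed.

    Returns the path of the copied file.
    """
    lib_dir = Path(solr_home) / "lib"
    lib_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(jar_path, lib_dir))
```

With a default install this would be called as
install_shared_jar("/var/solr/data", "hon-lucene-synonyms-2.0.0.jar").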

For 5.3 and later, loading jars into
server/solr-webapp/webapp/WEB-INF/lib should also work, but then you are
modifying the actual Solr install, which I normally avoid because it
makes it a little bit harder to upgrade Solr.

> Does anyone on this list have direct experience with getting this
> plugin to work in Solr 5.x? 

I don't have any experience with that specific plugin, but I have
successfully used other plugin jars with the lib directory mentioned above.

Thanks,
Shawn



Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-05-31 Thread John Bickerstaff
e.solr.common.SolrException:
>> >> > > Error loading class
>> >> > >
>> >> >
>> >>
>> 'com.github.healthonnet.search.SynonymExpandingExtendedDismaxQParserPlugin'
>> >> > >
>> >> > > I have tried the auto-phrasing one as well (I did set up a field
>> using
>> >> > copy
>> >> > > to configure it on) but when testing it didn't seem to return the
>> >> > synonyms
>> >> > > as expected. So gave up on that one too (am willing to give it
>> another
>> >> > try
>> >> > > though, that was awhile ago). Would definitely like to hear what
>> other
>> >> > > people have found works on the latest versions of Solr 5.x and/or
>> 6.
>> >> Just
>> >> > > sucks that this issue has never been fixed in the core product such
>> >> that
>> >> > > you still need to mess with plugins and patches to get such a basic
>> >> > > functionality working properly.
>> >> > >
>> >> > >
>> >> > > *Mary Jo Sminkey*
>> >> > > *Senior ColdFusion Developer*
>> >> > >
>> >> > > *CF Webtools*
>> >> > > You Dream It... We Build It. <https://www.cfwebtools.com/>
>> >> > > 11204 Davenport Suite 100
>> >> > > Omaha, Nebraska 68154
>> >> > > O: 402.408.3733 x128
>> >> > > E:  maryjo.smin...@cfwebtools.com
>> >> > > Skype: maryjos.cfwebtools
>> >> > >
>> >> > >
>> >> > > On Mon, May 30, 2016 at 5:02 PM, John Bickerstaff <
>> >> > > j...@johnbickerstaff.com>
>> >> > > wrote:
>> >> > >
>> >> > > > So I'm looking at the solution mentioned here:
>> >> > > >
>> >> > > >
>> >> > >
>> >> >
>> >>
>> https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
>> >> > > >
>> >> > > > The thing that's troubling me slightly is that the way it's
>> >> documented
>> >> > it
>> >> > > > seems to be missing a small but important link...
>> >> > > >
>> >> > > > What exactly causes the results listed to be returned?
>> >> > > >
>> >> > > > Here's my thought process:
>> >> > > >
>> >> > > > 1. The entry for /autophrase searchHandler does not specify a
>> default
>> >> > > > search field.
>> >> > > > 2. The field type "text_autophrase" is set up as the one with the
>> >> > > > AutoPhrasingFilterFactory as part of it's indexing
>> >> > > >
>> >> > > > There isn't any mention (perhaps because it's too obvious) of the
>> >> need
>> >> > to
>> >> > > > copy or otherwise get data into the "text_autophrase" field at
>> index
>> >> > > time.
>> >> > > >
>> >> > > > There isn't any explicit listing of "text_autophrase" as the
>> default
>> >> > > search
>> >> > > > field in the /autophrase search handler
>> >> > > >
>> >> > > > There isn't any explicit statement of "df=text_autophrase" in the
>> >> query
>> >> > > > statement: [/autophrase?q=New+York]
>> >> > > >
>> >> > > > Therefore it seems to me that if someone tries to implement this,
>> >> > they're
>> >> > > > going to be disappointed in the results unless they:
>> >> > > > a. copy or otherwise get ALL the text they're interested in --
>> into
>> >> the
>> >> > > > "text_autophrase" field as part of the schema.xml setup (to
>> happen at
>> >> > > index
>> >> > > > time)
>> >> > > > b. somehow explicitly declare "text_autophrase" as the default
>> search
>> >> > > field
>> >> > > > - either in the searchHandler or wherever else the default field
>> is
>> >> > > > configured.
>> >> > > >
>> >>

Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-05-31 Thread John Bickerstaff
>> that
> >> > > you still need to mess with plugins and patches to get such a basic
> >> > > functionality working properly.
> >> > >
> >> > >
> >> > > *Mary Jo Sminkey*
> >> > > *Senior ColdFusion Developer*
> >> > >
> >> > > *CF Webtools*
> >> > > You Dream It... We Build It. <https://www.cfwebtools.com/>
> >> > > 11204 Davenport Suite 100
> >> > > Omaha, Nebraska 68154
> >> > > O: 402.408.3733 x128
> >> > > E:  maryjo.smin...@cfwebtools.com
> >> > > Skype: maryjos.cfwebtools
> >> > >
> >> > >
> >> > > On Mon, May 30, 2016 at 5:02 PM, John Bickerstaff <
> >> > > j...@johnbickerstaff.com>
> >> > > wrote:
> >> > >
> >> > > > So I'm looking at the solution mentioned here:
> >> > > >
> >> > > >
> >> > >
> >> >
> >>
> https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
> >> > > >
> >> > > > The thing that's troubling me slightly is that the way it's
> >> documented
> >> > it
> >> > > > seems to be missing a small but important link...
> >> > > >
> >> > > > What exactly causes the results listed to be returned?
> >> > > >
> >> > > > Here's my thought process:
> >> > > >
> >> > > > 1. The entry for /autophrase searchHandler does not specify a
> default
> >> > > > search field.
> >> > > > 2. The field type "text_autophrase" is set up as the one with the
> >> > > > AutoPhrasingFilterFactory as part of its indexing
> >> > > >
> >> > > > There isn't any mention (perhaps because it's too obvious) of the
> >> need
> >> > to
> >> > > > copy or otherwise get data into the "text_autophrase" field at
> index
> >> > > time.
> >> > > >
> >> > > > There isn't any explicit listing of "text_autophrase" as the
> default
> >> > > search
> >> > > > field in the /autophrase search handler
> >> > > >
> >> > > > There isn't any explicit statement of "df=text_autophrase" in the
> >> query
> >> > > > statement: [/autophrase?q=New+York]
> >> > > >
> >> > > > Therefore it seems to me that if someone tries to implement this,
> >> > they're
> >> > > > going to be disappointed in the results unless they:
> >> > > > a. copy or otherwise get ALL the text they're interested in --
> into
> >> the
> >> > > > "text_autophrase" field as part of the schema.xml setup (to
> happen at
> >> > > index
> >> > > > time)
> >> > > > b. somehow explicitly declare "text_autophrase" as the default
> search
> >> > > field
> >> > > > - either in the searchHandler or wherever else the default field
> is
> >> > > > configured.
> >> > > >
> >> > > > If anyone out there has done this specific approach - could you
> >> > validate
> >> > > > whether my thought process is correct and / or if I'm missing
> >> > something?
> >> > > > Yes - I get that I can set it all up and try - but it's what I
> don't
> >> > > know I
> >> > > > don't know that bothers me...
> >> > > >
> >> > > > On Fri, May 27, 2016 at 11:57 AM, John Bickerstaff <
> >> > > > j...@johnbickerstaff.com
> >> > > > > wrote:
> >> > > >
> >> > > > > Thank you Steve -- very helpful.
> >> > > > >
> >> > > > > I can see that whatever implementation I decide to try, some
> >> testing
> >> > > will
> >> > > > > be in order.  If anyone is aware of significant gotchas with
> this
> >> > > synonym
> >> > > > > thing that are not mentioned in the already-listed URLs, please
> >> feel
> >> > > free
> >> > > > > to comment.
> >> > > > >
> >> > > > &g

Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-05-31 Thread Jeff Wartes
-using-the-auto-phrasing-tokenfilter/
>> > > >
>> > > > The thing that's troubling me slightly is that the way it's
>> documented
>> > it
>> > > > seems to be missing a small but important link...
>> > > >
>> > > > What exactly causes the results listed to be returned?
>> > > >
>> > > > Here's my thought process:
>> > > >
>> > > > 1. The entry for /autophrase searchHandler does not specify a default
>> > > > search field.
>> > > > 2. The field type "text_autophrase" is set up as the one with the
>> > > > AutoPhrasingFilterFactory as part of its indexing
>> > > >
>> > > > There isn't any mention (perhaps because it's too obvious) of the
>> need
>> > to
>> > > > copy or otherwise get data into the "text_autophrase" field at index
>> > > time.
>> > > >
>> > > > There isn't any explicit listing of "text_autophrase" as the default
>> > > search
>> > > > field in the /autophrase search handler
>> > > >
>> > > > There isn't any explicit statement of "df=text_autophrase" in the
>> query
>> > > > statement: [/autophrase?q=New+York]
>> > > >
>> > > > Therefore it seems to me that if someone tries to implement this,
>> > they're
>> > > > going to be disappointed in the results unless they:
>> > > > a. copy or otherwise get ALL the text they're interested in -- into
>> the
>> > > > "text_autophrase" field as part of the schema.xml setup (to happen at
>> > > index
>> > > > time)
>> > > > b. somehow explicitly declare "text_autophrase" as the default search
>> > > field
>> > > > - either in the searchHandler or wherever else the default field is
>> > > > configured.
>> > > >
>> > > > If anyone out there has done this specific approach - could you
>> > validate
>> > > > whether my thought process is correct and / or if I'm missing
>> > something?
>> > > > Yes - I get that I can set it all up and try - but it's what I don't
>> > > know I
>> > > > don't know that bothers me...
>> > > >
>> > > > On Fri, May 27, 2016 at 11:57 AM, John Bickerstaff <
>> > > > j...@johnbickerstaff.com
>> > > > > wrote:
>> > > >
>> > > > > Thank you Steve -- very helpful.
>> > > > >
>> > > > > I can see that whatever implementation I decide to try, some
>> testing
>> > > will
>> > > > > be in order.  If anyone is aware of significant gotchas with this
>> > > synonym
>> > > > > thing that are not mentioned in the already-listed URLs, please
>> feel
>> > > free
>> > > > > to comment.
>> > > > >
>> > > > > On Fri, May 27, 2016 at 10:28 AM, Steve Rowe <sar...@gmail.com>
>> > wrote:
>> > > > >
>> > > > >> I’m working on addressing problems using multi-term synonyms at
>> > query
>> > > > >> time in Lucene and Solr.
>> > > > >>
>> > > > >> I recommend these two blogs for understanding the issues (the
>> second
>> > > one
>> > > > >> was mentioned earlier in this thread):
>> > > > >>
>> > > > >> <
>> > > > >>
>> > > >
>> > >
>> >
>> http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html
>> > > > >> >
>> > > > >> <
>> > https://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/>
>> > > > >>
>> > > > >> In addition to the already-mentioned projects, there is also:
>> > > > >>
>> > > > >> <https://issues.apache.org/jira/browse/SOLR-5379>
>> > > > >>
>> > > > >> All of these projects try in various ways to work around the fact
>> > that
>> > > > >> Lucene’s QueryParser splits on whitespace before sending text to
>> > > > analysis,
>> > > > >> one token at a time, so in a synonym filter, multi-word synonyms
>> can
>> > > > 

Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-05-31 Thread John Bickerstaff
btools*
> > > > > You Dream It... We Build It. <https://www.cfwebtools.com/>
> > > > > 11204 Davenport Suite 100
> > > > > Omaha, Nebraska 68154
> > > > > O: 402.408.3733 x128
> > > > > E:  maryjo.smin...@cfwebtools.com
> > > > > Skype: maryjos.cfwebtools
> > > > >
> > > > >
> > > > > On Mon, May 30, 2016 at 5:02 PM, John Bickerstaff <
> > > > > j...@johnbickerstaff.com>
> > > > > wrote:
> > > > >
> > > > > > So I'm looking at the solution mentioned here:
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
> > > > > >
> > > > > > The thing that's troubling me slightly is that the way it's
> > > documented
> > > > it
> > > > > > seems to be missing a small but important link...
> > > > > >
> > > > > > What exactly causes the results listed to be returned?
> > > > > >
> > > > > > Here's my thought process:
> > > > > >
> > > > > > 1. The entry for /autophrase searchHandler does not specify a
> > default
> > > > > > search field.
> > > > > > 2. The field type "text_autophrase" is set up as the one with the
> > > > > > > AutoPhrasingFilterFactory as part of its indexing
> > > > > >
> > > > > > There isn't any mention (perhaps because it's too obvious) of the
> > > need
> > > > to
> > > > > > copy or otherwise get data into the "text_autophrase" field at
> > index
> > > > > time.
> > > > > >
> > > > > > There isn't any explicit listing of "text_autophrase" as the
> > default
> > > > > search
> > > > > > field in the /autophrase search handler
> > > > > >
> > > > > > There isn't any explicit statement of "df=text_autophrase" in the
> > > query
> > > > > > > statement: [/autophrase?q=New+York]
> > > > > >
> > > > > > Therefore it seems to me that if someone tries to implement this,
> > > > they're
> > > > > > going to be disappointed in the results unless they:
> > > > > > a. copy or otherwise get ALL the text they're interested in --
> into
> > > the
> > > > > > "text_autophrase" field as part of the schema.xml setup (to
> happen
> > at
> > > > > index
> > > > > > time)
> > > > > > b. somehow explicitly declare "text_autophrase" as the default
> > search
> > > > > field
> > > > > > - either in the searchHandler or wherever else the default field
> is
> > > > > > configured.
> > > > > >
> > > > > > If anyone out there has done this specific approach - could you
> > > > validate
> > > > > > whether my thought process is correct and / or if I'm missing
> > > > something?
> > > > > > Yes - I get that I can set it all up and try - but it's what I
> > don't
> > > > > know I
> > > > > > don't know that bothers me...
> > > > > >
> > > > > > On Fri, May 27, 2016 at 11:57 AM, John Bickerstaff <
> > > > > > j...@johnbickerstaff.com
> > > > > > > wrote:
> > > > > >
> > > > > > > Thank you Steve -- very helpful.
> > > > > > >
> > > > > > > I can see that whatever implementation I decide to try, some
> > > testing
> > > > > will
> > > > > > > be in order.  If anyone is aware of significant gotchas with
> this
> > > > > synonym
> > > > > > > thing that are not mentioned in the already-listed URLs, please
> > > feel
> > > > > free
> > > > > > > to comment.
> > > > > > >
> > > > > > > On Fri, May 27, 2016 at 10:28 AM, Steve Rowe <sar...@gmail.com
> >
> > > > wrote:
> > > > > > >
> > > > > > >> I’m working on addressing problems using multi-term synonyms

Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-05-31 Thread Joe Lawson

Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-05-31 Thread John Bickerstaff

Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-05-30 Thread MaryJo Sminkey

Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-05-30 Thread John Bickerstaff
Thanks for the comment Mary Jo...

The error loading the class rings a bell - did you find and follow
instructions for adding that to the WAR file?  I vaguely remember seeing
something about that.

I'm going to run my own tests on the auto-phrasing one. If I'm
successful, I'll post back.

On Mon, May 30, 2016 at 3:45 PM, MaryJo Sminkey <mjsmin...@gmail.com> wrote:

> This is a very timely discussion for me, as we're trying to tackle the
> multi-term synonym issue as well and have not been able to get the
> hon-lucene plugin to work. The jar shows up as installed, but when we set
> up the sample request handler it throws this error:
>
> org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
> Error loading class
> 'com.github.healthonnet.search.SynonymExpandingExtendedDismaxQParserPlugin'
>
> I have tried the auto-phrasing one as well (I did set up a field using copy
> to configure it on), but when testing it didn't seem to return the synonyms
> as expected. So I gave up on that one too (I'm willing to give it another
> try, though; that was a while ago). I would definitely like to hear what
> other people have found works on the latest versions of Solr 5.x and/or 6.
> It just sucks that this issue has never been fixed in the core product, so
> you still need to mess with plugins and patches to get such basic
> functionality working properly.
>
>
> *Mary Jo Sminkey*
> *Senior ColdFusion Developer*
>
> *CF Webtools*
> You Dream It... We Build It. <https://www.cfwebtools.com/>
> 11204 Davenport Suite 100
> Omaha, Nebraska 68154
> O: 402.408.3733 x128
> E:  maryjo.smin...@cfwebtools.com
> Skype: maryjos.cfwebtools
>
>
> On Mon, May 30, 2016 at 5:02 PM, John Bickerstaff <
> j...@johnbickerstaff.com>
> wrote:
>
> > So I'm looking at the solution mentioned here:
> >
> > https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
> >
> > The thing that's troubling me slightly is that the way it's documented it
> > seems to be missing a small but important link...
> >
> > What exactly causes the results listed to be returned?
> >
> > Here's my thought process:
> >
> > 1. The entry for /autophrase searchHandler does not specify a default
> > search field.
> > 2. The field type "text_autophrase" is set up as the one with the
> > AutoPhrasingFilterFactory as part of its indexing
> >
> > There isn't any mention (perhaps because it's too obvious) of the need to
> > copy or otherwise get data into the "text_autophrase" field at index
> > time.
> >
> > There isn't any explicit listing of "text_autophrase" as the default
> > search field in the /autophrase search handler
> >
> > There isn't any explicit statement of "df=text_autophrase" in the query
> > statement: [/autophrase?q=New+York]
> >
> > Therefore it seems to me that if someone tries to implement this, they're
> > going to be disappointed in the results unless they:
> > a. copy or otherwise get ALL the text they're interested in -- into the
> > "text_autophrase" field as part of the schema.xml setup (to happen at
> index
> > time)
> > b. somehow explicitly declare "text_autophrase" as the default search
> field
> > - either in the searchHandler or wherever else the default field is
> > configured.
> >
> > If anyone out there has done this specific approach - could you validate
> > whether my thought process is correct and / or if I'm missing something?
> > Yes - I get that I can set it all up and try - but it's what I don't
> know I
> > don't know that bothers me...
> >
> > On Fri, May 27, 2016 at 11:57 AM, John Bickerstaff <
> > j...@johnbickerstaff.com
> > > wrote:
> >
> > > Thank you Steve -- very helpful.
> > >
> > > I can see that whatever implementation I decide to try, some testing
> will
> > > be in order.  If anyone is aware of significant gotchas with this
> synonym
> > > thing that are not mentioned in the already-listed URLs, please feel
> free
> > > to comment.
> > >
> > > On Fri, May 27, 2016 at 10:28 AM, Steve Rowe <sar...@gmail.com> wrote:
> > >
> > >> I’m working on addressing problems using multi-term synonyms at query
> > >> time in Lucene and Solr.
> > >>
> > >> I recommend these two blogs for understanding the issues (the second
> one
> > >> was mentioned earlier in this thread):
> > >>
> > >> <
> > >>

Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-05-30 Thread MaryJo Sminkey
This is a very timely discussion for me, as we're trying to tackle the
multi-term synonym issue as well and have not been able to get the
hon-lucene plugin to work. The jar shows up as installed, but when we set up
the sample request handler it throws this error:

org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
Error loading class
'com.github.healthonnet.search.SynonymExpandingExtendedDismaxQParserPlugin'

I have tried the auto-phrasing one as well (I did set up a field using copy
to configure it on), but when testing it didn't seem to return the synonyms
as expected. So I gave up on that one too (I'm willing to give it another
try, though; that was a while ago). I would definitely like to hear what
other people have found works on the latest versions of Solr 5.x and/or 6.
It just sucks that this issue has never been fixed in the core product, so
you still need to mess with plugins and patches to get such basic
functionality working properly.


*Mary Jo Sminkey*
*Senior ColdFusion Developer*

*CF Webtools*
You Dream It... We Build It. <https://www.cfwebtools.com/>
11204 Davenport Suite 100
Omaha, Nebraska 68154
O: 402.408.3733  x128
E:  maryjo.smin...@cfwebtools.com
Skype: maryjos.cfwebtools


On Mon, May 30, 2016 at 5:02 PM, John Bickerstaff <j...@johnbickerstaff.com>
wrote:

> So I'm looking at the solution mentioned here:
>
> https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
>
> The thing that's troubling me slightly is that the way it's documented it
> seems to be missing a small but important link...
>
> What exactly causes the results listed to be returned?
>
> Here's my thought process:
>
> 1. The entry for /autophrase searchHandler does not specify a default
> search field.
> 2. The field type "text_autophrase" is set up as the one with the
> AutoPhrasingFilterFactory as part of its indexing
>
> There isn't any mention (perhaps because it's too obvious) of the need to
> copy or otherwise get data into the "text_autophrase" field at index time.
>
> There isn't any explicit listing of "text_autophrase" as the default search
> field in the /autophrase search handler
>
> There isn't any explicit statement of "df=text_autophrase" in the query
> statement: [/autophrase?q=New+York]
>
> Therefore it seems to me that if someone tries to implement this, they're
> going to be disappointed in the results unless they:
> a. copy or otherwise get ALL the text they're interested in -- into the
> "text_autophrase" field as part of the schema.xml setup (to happen at index
> time)
> b. somehow explicitly declare "text_autophrase" as the default search field
> - either in the searchHandler or wherever else the default field is
> configured.
>
> If anyone out there has done this specific approach - could you validate
> whether my thought process is correct and / or if I'm missing something?
> Yes - I get that I can set it all up and try - but it's what I don't know I
> don't know that bothers me...
>
> On Fri, May 27, 2016 at 11:57 AM, John Bickerstaff <
> j...@johnbickerstaff.com
> > wrote:
>
> > Thank you Steve -- very helpful.
> >
> > I can see that whatever implementation I decide to try, some testing will
> > be in order.  If anyone is aware of significant gotchas with this synonym
> > thing that are not mentioned in the already-listed URLs, please feel free
> > to comment.
> >
> > On Fri, May 27, 2016 at 10:28 AM, Steve Rowe <sar...@gmail.com> wrote:
> >
> >> I’m working on addressing problems using multi-term synonyms at query
> >> time in Lucene and Solr.
> >>
> >> I recommend these two blogs for understanding the issues (the second one
> >> was mentioned earlier in this thread):
> >>
> >> <
> >>
> http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html
> >> >
> >> <https://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/>
> >>
> >> In addition to the already-mentioned projects, there is also:
> >>
> >> <https://issues.apache.org/jira/browse/SOLR-5379>
> >>
> >> All of these projects try in various ways to work around the fact that
> >> Lucene’s QueryParser splits on whitespace before sending text to
> analysis,
> >> one token at a time, so in a synonym filter, multi-word synonyms can
> never
> >> match and add alternatives.  See <
> >> https://issues.apache.org/jira/browse/LUCENE-2605>, where I’ve posted a
> >> patch to directly address that problem - note that it’s still a work in
> >> progre

Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-05-30 Thread John Bickerstaff
So I'm looking at the solution mentioned here:
https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/

The thing that's troubling me slightly is that the way it's documented it
seems to be missing a small but important link...

What exactly causes the results listed to be returned?

Here's my thought process:

1. The entry for /autophrase searchHandler does not specify a default
search field.
2. The field type "text_autophrase" is set up as the one with the
AutoPhrasingFilterFactory as part of its indexing

There isn't any mention (perhaps because it's too obvious) of the need to
copy or otherwise get data into the "text_autophrase" field at index time.

There isn't any explicit listing of "text_autophrase" as the default search
field in the /autophrase search handler

There isn't any explicit statement of "df=text_autophrase" in the query
statement: [/autophrase?q=New+York]

Therefore it seems to me that if someone tries to implement this, they're
going to be disappointed in the results unless they:
a. copy or otherwise get ALL the text they're interested in -- into the
"text_autophrase" field as part of the schema.xml setup (to happen at index
time)
b. somehow explicitly declare "text_autophrase" as the default search field
- either in the searchHandler or wherever else the default field is
configured.
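
A minimal sketch of point (b), assuming the default field is passed per
request rather than configured in the handler. It only builds the request
URL; nothing is sent to Solr. The host, port and collection name are
made-up placeholders, and the helper name autophrase_url is hypothetical.
"df" is the standard Solr parameter for the default search field;
"/autophrase" and "text_autophrase" are the names used in the blog post.

```python
from urllib.parse import urlencode


def autophrase_url(base, query, default_field="text_autophrase"):
    """Build a query URL for the /autophrase handler with an explicit df.

    `base` is a placeholder Solr collection URL (an assumption, not taken
    from the blog post). `df` names the field searched when the query
    itself does not name one.
    """
    params = {"q": query, "df": default_field}
    return f"{base}/autophrase?{urlencode(params)}"


# Placeholder host and collection; adjust to the actual deployment.
url = autophrase_url("http://localhost:8983/solr/collection1", "New York")
print(url)
```

Passing df on each request is only one option; the same default could
presumably be set once in the handler's defaults instead, which is the
"wherever else the default field is configured" case above.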

If anyone out there has done this specific approach - could you validate
whether my thought process is correct and / or if I'm missing something?
Yes - I get that I can set it all up and try - but it's what I don't know I
don't know that bothers me...

On Fri, May 27, 2016 at 11:57 AM, John Bickerstaff <j...@johnbickerstaff.com
> wrote:

> Thank you Steve -- very helpful.
>
> I can see that whatever implementation I decide to try, some testing will
> be in order.  If anyone is aware of significant gotchas with this synonym
> thing that are not mentioned in the already-listed URLs, please feel free
> to comment.
>
> On Fri, May 27, 2016 at 10:28 AM, Steve Rowe <sar...@gmail.com> wrote:
>
>> I’m working on addressing problems using multi-term synonyms at query
>> time in Lucene and Solr.
>>
>> I recommend these two blogs for understanding the issues (the second one
>> was mentioned earlier in this thread):
>>
>> <
>> http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html
>> >
>> <https://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/>
>>
>> In addition to the already-mentioned projects, there is also:
>>
>> <https://issues.apache.org/jira/browse/SOLR-5379>
>>
>> All of these projects try in various ways to work around the fact that
>> Lucene’s QueryParser splits on whitespace before sending text to analysis,
>> one token at a time, so in a synonym filter, multi-word synonyms can never
>> match and add alternatives.  See <
>> https://issues.apache.org/jira/browse/LUCENE-2605>, where I’ve posted a
>> patch to directly address that problem - note that it’s still a work in
>> progress.
>>
>> Once LUCENE-2605 has been fixed, there is still work to do getting
>> (e)dismax to work with the modified Lucene QueryParser, and addressing
>> problems with how queries are constructed from Lucene’s “sausagized” token
>> stream.
>>
>> --
>> Steve
>> www.lucidworks.com
>>
>> > On May 26, 2016, at 2:21 PM, John Bickerstaff <j...@johnbickerstaff.com>
>> wrote:
>> >
>> > Thanks Chris --
>> >
>> > The two projects I'm aware of are:
>> >
>> > https://github.com/healthonnet/hon-lucene-synonyms
>> >
>> > and the one referenced from the Lucidworks page here:
>> >
>> https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
>> >
>> > ... which is here :
>> https://github.com/LucidWorks/auto-phrase-tokenfilter
>> >
>> > Is there anything else out there that you would recommend I look at?
>> >
>> > On Thu, May 26, 2016 at 12:01 PM, Chris Morley <ch...@depahelix.com>
>> wrote:
>> >
>> >> Chris Morley here, from Wayfair.  (Depahelix = my domain)
>> >>
>> >> Suyash Sonawane and I have worked on multiple word synonyms at Wayfair.
>> >> We worked mostly off of Ted Sullivan's work and also off of some
>> >> suggestions from Koorosh Vakhshoori.  We have gotten to a point where
>> we
>> >> have a more sophisticated internal implementation, however, we've found
>> >> that it is very diff

Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-05-27 Thread John Bickerstaff
Thank you Steve -- very helpful.

I can see that whatever implementation I decide to try, some testing will
be in order.  If anyone is aware of significant gotchas with this synonym
thing that are not mentioned in the already-listed URLs, please feel free
to comment.

On Fri, May 27, 2016 at 10:28 AM, Steve Rowe <sar...@gmail.com> wrote:

> I’m working on addressing problems using multi-term synonyms at query time
> in Lucene and Solr.
>
> I recommend these two blogs for understanding the issues (the second one
> was mentioned earlier in this thread):
>
> <
> http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html
> >
> <https://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/>
>
> In addition to the already-mentioned projects, there is also:
>
> <https://issues.apache.org/jira/browse/SOLR-5379>
>
> All of these projects try in various ways to work around the fact that
> Lucene’s QueryParser splits on whitespace before sending text to analysis,
> one token at a time, so in a synonym filter, multi-word synonyms can never
> match and add alternatives.  See <
> https://issues.apache.org/jira/browse/LUCENE-2605>, where I’ve posted a
> patch to directly address that problem - note that it’s still a work in
> progress.
>
> Once LUCENE-2605 has been fixed, there is still work to do getting
> (e)dismax to work with the modified Lucene QueryParser, and addressing
> problems with how queries are constructed from Lucene’s “sausagized” token
> stream.
>
> --
> Steve
> www.lucidworks.com
>
> > On May 26, 2016, at 2:21 PM, John Bickerstaff <j...@johnbickerstaff.com>
> wrote:
> >
> > Thanks Chris --
> >
> > The two projects I'm aware of are:
> >
> > https://github.com/healthonnet/hon-lucene-synonyms
> >
> > and the one referenced from the Lucidworks page here:
> >
> https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
> >
> > ... which is here :
> https://github.com/LucidWorks/auto-phrase-tokenfilter
> >
> > Is there anything else out there that you would recommend I look at?
> >
> > On Thu, May 26, 2016 at 12:01 PM, Chris Morley <ch...@depahelix.com>
> wrote:
> >
> >> Chris Morley here, from Wayfair.  (Depahelix = my domain)
> >>
> >> Suyash Sonawane and I have worked on multiple word synonyms at Wayfair.
> >> We worked mostly off of Ted Sullivan's work and also off of some
> >> suggestions from Koorosh Vakhshoori.  We have gotten to a point where we
> >> have a more sophisticated internal implementation, however, we've found
> >> that it is very difficult to make it do what you want it to do, and
> also be
> >> sufficiently performant.  Watch out for exceptional situations with mm
> >> (minimum should match).
> >>
> >> Trey Grainger (now at Lucidworks) and Simon Hughes of Dice.com have also
> >> done work in this area.
> >>
> >> It should be very possible to get this kind of thing working on
> >> SolrCloud.  I haven't tried it yet but I think theoretically, it should
> >> just work.  The synonyms stuff is mostly about doing things at index
> time
> >> and query time.  The index time stuff should translate to SolrCloud
> >> directly, while the query time stuff might pose some issues, but
> probably
> >> not too bad, if there are any issues at all.
> >>
> >> I've had decent luck porting our various plugins from 4.10.x to 5.5.0
> >> because a lot of stuff is just Java, and it still works within the Jetty
> >> context.
> >>
> >> -Chris.
> >>
> >>
> >>
> >>
> >> 
> >> From: "John Bickerstaff" <j...@johnbickerstaff.com>
> >> Sent: Thursday, May 26, 2016 1:51 PM
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax
> parser
> >> Hey Jeff (or anyone interested in multi-word synonyms) here are some
> >> potentially interesting links...
> >>
> >> http://wiki.apache.org/solr/QueryParser (search the page for
> >> synonym_edismax)
> >>
> >> https://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/ (blog
> >> post about what became the synonym_edismax Query Parser)
> >>
> >>
> >>
> https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
> >>
> >> This last was us

Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-05-27 Thread Steve Rowe
I’m working on addressing problems using multi-term synonyms at query time in 
Lucene and Solr.

I recommend these two blogs for understanding the issues (the second one was 
mentioned earlier in this thread):

<http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html>
<https://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/>

In addition to the already-mentioned projects, there is also:

<https://issues.apache.org/jira/browse/SOLR-5379>

All of these projects try in various ways to work around the fact that Lucene’s 
QueryParser splits on whitespace before sending text to analysis, one token at 
a time, so in a synonym filter, multi-word synonyms can never match and add 
alternatives.  See <https://issues.apache.org/jira/browse/LUCENE-2605>, where 
I’ve posted a patch to directly address that problem - note that it’s still a 
work in progress.
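
The whitespace-splitting problem can be illustrated with a toy model. This
is a sketch in Python, not Lucene's actual code: the synonym table, the
analyze function and both parse functions are hypothetical stand-ins. It
shows only why a multi-word rule cannot fire when each whitespace-separated
word is analyzed on its own.

```python
# Toy synonym table: token sequence -> alternatives (made-up entries).
SYNONYMS = {
    ("new", "york"): ["nyc"],
    ("bike",): ["bicycle"],
}


def analyze(text):
    """Toy analysis chain: lowercase, tokenize, then append synonyms
    for every run of tokens that matches a rule."""
    tokens = text.lower().split()
    out = list(tokens)
    for key, alternatives in SYNONYMS.items():
        n = len(key)
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) == key:
                out.extend(alternatives)
    return out


def parse_split_first(query):
    """Mimics a parser that splits on whitespace BEFORE analysis,
    so the analyzer only ever sees one word at a time."""
    result = []
    for word in query.split():
        result.extend(analyze(word))
    return result


def parse_whole(query):
    """Mimics sending the whole query text to analysis in one piece."""
    return analyze(query)


split_first = parse_split_first("New York bike")
whole = parse_whole("New York bike")
print(split_first)  # single-word rule fires, multi-word rule cannot
print(whole)        # both rules fire
```

In the split-first case the analyzer never sees "new" and "york" together,
so only the single-word rule matches; that is the gap the projects above
try to work around.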

Once LUCENE-2605 has been fixed, there is still work to do getting (e)dismax to 
work with the modified Lucene QueryParser, and addressing problems with how 
queries are constructed from Lucene’s “sausagized” token stream.

--
Steve
www.lucidworks.com

> On May 26, 2016, at 2:21 PM, John Bickerstaff <j...@johnbickerstaff.com> 
> wrote:
> 
> Thanks Chris --
> 
> The two projects I'm aware of are:
> 
> https://github.com/healthonnet/hon-lucene-synonyms
> 
> and the one referenced from the Lucidworks page here:
> https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
> 
> ... which is here : https://github.com/LucidWorks/auto-phrase-tokenfilter
> 
> Is there anything else out there that you would recommend I look at?
> 
> On Thu, May 26, 2016 at 12:01 PM, Chris Morley <ch...@depahelix.com> wrote:
> 
>> Chris Morley here, from Wayfair.  (Depahelix = my domain)
>> 
>> Suyash Sonawane and I have worked on multiple word synonyms at Wayfair.
>> We worked mostly off of Ted Sullivan's work and also off of some
>> suggestions from Koorosh Vakhshoori.  We have gotten to a point where we
>> have a more sophisticated internal implementation, however, we've found
>> that it is very difficult to make it do what you want it to do, and also be
>> sufficiently performant.  Watch out for exceptional situations with mm
>> (minimum should match).
>> 
>> Trey Grainger (now at Lucidworks) and Simon Hughes of Dice.com have also
>> done work in this area.
>> 
>> It should be very possible to get this kind of thing working on
>> SolrCloud.  I haven't tried it yet but I think theoretically, it should
>> just work.  The synonyms stuff is mostly about doing things at index time
>> and query time.  The index time stuff should translate to SolrCloud
>> directly, while the query time stuff might pose some issues, but probably
>> not too bad, if there are any issues at all.
>> 
>> I've had decent luck porting our various plugins from 4.10.x to 5.5.0
>> because a lot of stuff is just Java, and it still works within the Jetty
>> context.
>> 
>> -Chris.
>> 
>> 
>> 
>> 
>> --------
>> From: "John Bickerstaff" <j...@johnbickerstaff.com>
>> Sent: Thursday, May 26, 2016 1:51 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser
>>
>> Hey Jeff (or anyone interested in multi-word synonyms) here are some
>> potentially interesting links...
>> 
>> http://wiki.apache.org/solr/QueryParser (search the page for
>> synonym_edismax)
>> 
>> https://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/ (blog
>> post about what became the synonym_edismax Query Parser)
>> 
>> 
>> https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
>> 
>> This last was useful for lots of reasons and contains links to other
>> interesting, related web pages...
>> 
>> On Thu, May 26, 2016 at 11:45 AM, Jeff Wartes <jwar...@whitepages.com>
>> wrote:
>> 
>>> Oh, interesting. I've certainly encountered issues with multi-word
>>> synonyms, but I hadn't come across this. If you end up using it with a
>>> recent Solr version, I'd be glad to hear your experience.
>>> 
>>> I haven't used it, but I am aware of one other project in this vein that
>>> you might be interested in looking at:
>>> https://github.com/LucidWorks/auto-phrase-tokenfilter
>>> 
>>> 
>>> On 5/26/16, 9:29 AM, "John Bickerstaff" <j...@johnbickerstaff.com>
>> wrote:
>>> 

Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-05-26 Thread John Bickerstaff
Thanks Chris --

The two projects I'm aware of are:

https://github.com/healthonnet/hon-lucene-synonyms

and the one referenced from the Lucidworks page here:
https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/

... which is here : https://github.com/LucidWorks/auto-phrase-tokenfilter

Is there anything else out there that you would recommend I look at?

On Thu, May 26, 2016 at 12:01 PM, Chris Morley <ch...@depahelix.com> wrote:

> Chris Morley here, from Wayfair.  (Depahelix = my domain)
>
>  Suyash Sonawane and I have worked on multiple word synonyms at Wayfair.
> We worked mostly off of Ted Sullivan's work and also off of some
> suggestions from Koorosh Vakhshoori.  We have gotten to a point where we
> have a more sophisticated internal implementation; however, we've found
> that it is very difficult to make it do what you want it to do, and also be
> sufficiently performant.  Watch out for exceptional situations with mm
> (minimum should match).
>
>  Trey Grainger (now at Lucidworks) and Simon Hughes of Dice.com have also
> done work in this area.
>
>  It should be very possible to get this kind of thing working on
> SolrCloud.  I haven't tried it yet but I think theoretically, it should
> just work.  The synonyms stuff is mostly about doing things at index time
> and query time.  The index time stuff should translate to SolrCloud
> directly, while the query time stuff might pose some issues, but probably
> not too bad, if there are any issues at all.
>
>  I've had decent luck porting our various plugins from 4.10.x to 5.5.0
> because a lot of stuff is just Java, and it still works within the Jetty
> context.
>
>  -Chris.
>
>
>
>
> 
>  From: "John Bickerstaff" <j...@johnbickerstaff.com>
> Sent: Thursday, May 26, 2016 1:51 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser
>
> Hey Jeff (or anyone interested in multi-word synonyms) here are some
> potentially interesting links...
>
> http://wiki.apache.org/solr/QueryParser (search the page for
> synonym_edismax)
>
> https://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/ (blog
> post about what became the synonym_edismax Query Parser)
>
>
> https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
>
> This last was useful for lots of reasons and contains links to other
> interesting, related web pages...
>
> On Thu, May 26, 2016 at 11:45 AM, Jeff Wartes <jwar...@whitepages.com>
> wrote:
>
> > Oh, interesting. I've certainly encountered issues with multi-word
> > synonyms, but I hadn't come across this. If you end up using it with a
> > recent Solr version, I'd be glad to hear your experience.
> >
> > I haven't used it, but I am aware of one other project in this vein that
> > you might be interested in looking at:
> > https://github.com/LucidWorks/auto-phrase-tokenfilter
> >
> >
> > On 5/26/16, 9:29 AM, "John Bickerstaff" <j...@johnbickerstaff.com>
> wrote:
> >
> > >Ahh - for question #3 I may have spoken too soon. This line from the
> > >github repository readme suggests a way.
> > >
> > >Update: We have tested to run with the jar in $SOLR_HOME/lib as well,
> and
> > >it works (Jetty).
> > >
> > >I'll try that and only respond back if that doesn't work.
> > >
> > >Questions 1 and 2 still stand of course... If anyone on the list has
> > >experience in this area...
> > >
> > >Thanks.
> > >
> > >On Thu, May 26, 2016 at 10:25 AM, John Bickerstaff <
> > j...@johnbickerstaff.com
> > >> wrote:
> > >
> > >> Hi all,
> > >>
> > >> I'm creating a Solr Cloud that will index and search medical text.
> > >> Multi-word synonyms are a pretty important factor.
> > >>
> > >> I find that there are some challenges around multi-word synonyms and I
> > >> also found on the wiki that there is a recommended 3rd-party parser
> > >> (synonym_edismax parser) created by Nolan Lawson and found here:
> > >> https://github.com/healthonnet/hon-lucene-synonyms
> > >>
> > >> Here's the thing - the instructions on the github site involve
> bringing
> > >> the jar file into the war file - which is not applicable any more...
> at
> > >> least I think it's not...
> > >>
> > >> I have three questions:
> > >>
> > >> 1. Is this still a good solution for multi-word synonyms (I.e. Solr
> > Cloud
> > >> doesn't break it in some way)
> > >> 2. Is there a tool or plug-in out there that the contributors would
> > >> recommend above this one?
> > >> 3. Assuming 1 = yes and 2 = no, can anyone tell me an updated
> procedure
> > >> for bringing it in to Solr Cloud (I'm running 5.4.x)
> > >>
> > >> Thanks
> > >>
> >
> >
>
>
>


Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-05-26 Thread Chris Morley
Chris Morley here, from Wayfair.  (Depahelix = my domain)

 Suyash Sonawane and I have worked on multiple word synonyms at Wayfair.  We 
worked mostly off of Ted Sullivan's work and also off of some suggestions from 
Koorosh Vakhshoori.  We have gotten to a point where we have a more 
sophisticated internal implementation; however, we've found that it is very 
difficult to make it do what you want it to do, and also be sufficiently 
performant.  Watch out for exceptional situations with mm (minimum should 
match).

 Trey Grainger (now at Lucidworks) and Simon Hughes of Dice.com have also done 
work in this area.

 It should be very possible to get this kind of thing working on SolrCloud.  I 
haven't tried it yet but I think theoretically, it should just work.  The 
synonyms stuff is mostly about doing things at index time and query time.  The 
index time stuff should translate to SolrCloud directly, while the query time 
stuff might pose some issues, but probably not too bad, if there are any issues 
at all.

 I've had decent luck porting our various plugins from 4.10.x to 5.5.0 because 
a lot of stuff is just Java, and it still works within the Jetty context.

 -Chris.





 From: "John Bickerstaff" <j...@johnbickerstaff.com>
Sent: Thursday, May 26, 2016 1:51 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

Hey Jeff (or anyone interested in multi-word synonyms) here are some
potentially interesting links...

http://wiki.apache.org/solr/QueryParser (search the page for
synonym_edismax)

https://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/ (blog
post about what became the synonym_edismax Query Parser)

https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/

This last was useful for lots of reasons and contains links to other
interesting, related web pages...

On Thu, May 26, 2016 at 11:45 AM, Jeff Wartes <jwar...@whitepages.com>
wrote:

> Oh, interesting. I've certainly encountered issues with multi-word
> synonyms, but I hadn't come across this. If you end up using it with a
> recent Solr version, I'd be glad to hear your experience.
>
> I haven't used it, but I am aware of one other project in this vein that
> you might be interested in looking at:
> https://github.com/LucidWorks/auto-phrase-tokenfilter
>
>
> On 5/26/16, 9:29 AM, "John Bickerstaff" <j...@johnbickerstaff.com> wrote:
>
> >Ahh - for question #3 I may have spoken too soon. This line from the
> >github repository readme suggests a way.
> >
> >Update: We have tested to run with the jar in $SOLR_HOME/lib as well, and
> >it works (Jetty).
> >
> >I'll try that and only respond back if that doesn't work.
> >
> >Questions 1 and 2 still stand of course... If anyone on the list has
> >experience in this area...
> >
> >Thanks.
> >
> >On Thu, May 26, 2016 at 10:25 AM, John Bickerstaff <
> j...@johnbickerstaff.com
> >> wrote:
> >
> >> Hi all,
> >>
> >> I'm creating a Solr Cloud that will index and search medical text.
> >> Multi-word synonyms are a pretty important factor.
> >>
> >> I find that there are some challenges around multi-word synonyms and I
> >> also found on the wiki that there is a recommended 3rd-party parser
> >> (synonym_edismax parser) created by Nolan Lawson and found here:
> >> https://github.com/healthonnet/hon-lucene-synonyms
> >>
> >> Here's the thing - the instructions on the github site involve bringing
> >> the jar file into the war file - which is not applicable any more... at
> >> least I think it's not...
> >>
> >> I have three questions:
> >>
> >> 1. Is this still a good solution for multi-word synonyms (I.e. Solr
> Cloud
> >> doesn't break it in some way)
> >> 2. Is there a tool or plug-in out there that the contributors would
> >> recommend above this one?
> >> 3. Assuming 1 = yes and 2 = no, can anyone tell me an updated procedure
> >> for bringing it in to Solr Cloud (I'm running 5.4.x)
> >>
> >> Thanks
> >>
>
>



