[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

2016-09-20 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507778#comment-15507778
 ] 

Jan Høydahl commented on SOLR-5379:
---

[~daitken] and [~atuljangra] and [~mjsminkey], I'm sorry there were on replies 
to your questions about updating the patch. What it probably means is that 
noone has had the capacity or need to spend time on this. It will probably take 
some effort to lift the patch from 4.x to 6.x, and then get it ready for 
committing either as part of edismax or as a subclass.

bq. What can we do to get this made official??
If you can contribute development work yourself (or your company) that would be 
the best. Else hire someone who can help you and/or just keep nagging here 
until it is done :-)

> Query-time multi-word synonym expansion
> ---
>
> Key: SOLR-5379
> URL: https://issues.apache.org/jira/browse/SOLR-5379
> Project: Solr
>  Issue Type: Improvement
>  Components: query parsers
>Reporter: Tien Nguyen Manh
>  Labels: multi-word, queryparser, synonym
> Fix For: 4.9, 6.0
>
> Attachments: conf-test-files-4_8_1.patch, quoted-4_8_1.patch, 
> quoted.patch, solr-5379-version-4.10.3.patch, synonym-expander-4_8_1.patch, 
> synonym-expander.patch
>
>
> While dealing with synonym at query time, solr failed to work with multi-word 
> synonyms due to some reasons:
> - First the lucene queryparser tokenizes user query by space so it split 
> multi-word term into two terms before feeding to synonym filter, so synonym 
> filter can't recognized multi-word term to do expansion
> - Second, if synonym filter expand into multiple terms which contains 
> multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to 
> handle synonyms. But MultiPhraseQuery don't work with term have different 
> number of words.
> For the first one, we can extend quoted all multi-word synonym in user query 
> so that lucene queryparser don't split it. There are a jira task related to 
> this one https://issues.apache.org/jira/browse/LUCENE-2605.
> For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery 
> SHOULD which contains multiple PhraseQuery in case tokens stream have 
> multi-word synonym.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

2016-03-05 Thread Atul Jangra (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15181984#comment-15181984
 ] 

Atul Jangra commented on SOLR-5379:
---

Any update on how soon this work will be included in Solr. This is very common 
occurrence for anyone working on search. 
I'm using Solr 5.5.0 and this is still a big problem. I even tried some 
external parser, but nothing seem to work. 

> Query-time multi-word synonym expansion
> ---
>
> Key: SOLR-5379
> URL: https://issues.apache.org/jira/browse/SOLR-5379
> Project: Solr
>  Issue Type: Improvement
>  Components: query parsers
>Reporter: Tien Nguyen Manh
>  Labels: multi-word, queryparser, synonym
> Fix For: 4.9, master
>
> Attachments: conf-test-files-4_8_1.patch, quoted-4_8_1.patch, 
> quoted.patch, solr-5379-version-4.10.3.patch, synonym-expander-4_8_1.patch, 
> synonym-expander.patch
>
>
> While dealing with synonym at query time, solr failed to work with multi-word 
> synonyms due to some reasons:
> - First the lucene queryparser tokenizes user query by space so it split 
> multi-word term into two terms before feeding to synonym filter, so synonym 
> filter can't recognized multi-word term to do expansion
> - Second, if synonym filter expand into multiple terms which contains 
> multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to 
> handle synonyms. But MultiPhraseQuery don't work with term have different 
> number of words.
> For the first one, we can extend quoted all multi-word synonym in user query 
> so that lucene queryparser don't split it. There are a jira task related to 
> this one https://issues.apache.org/jira/browse/LUCENE-2605.
> For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery 
> SHOULD which contains multiple PhraseQuery in case tokens stream have 
> multi-word synonym.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

2016-02-24 Thread Daniel Aitken (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15163757#comment-15163757
 ] 

Daniel Aitken commented on SOLR-5379:
-

Doesn't look like it from the patch, no; it's still using the extended synonym 
quoted query parser.

We have a client who requires matching on multi-word synonyms, so I've I've 
compiled Solr 4.10.3 with solr-5379-version-4.10.3.patch applied and have it up 
and running; just testing it now, with a kind of merged config between the unit 
test files provided in the patch and my regular Solr 4.x configuration.

Behaviour on quoted multi-word synonyms appears to work as expected across the 
testing I performed; this works well and would be fantastic to have available. 
I'm not too too concerned about drifting from edismax; unless I'm 
misunderstanding, wouldn't this solution maintain features from edismax by 
virtue of it being extended from it? It would be nice, however, not to maintain 
a custom query parser.

So, all in all, works well for us, but I am concerned about having to 
essentially maintain a fork of 4.10.3 just to support this one use case. Is 
there a possibility of this making it into a release? Is there anything else 
that needs to be done with it?

> Query-time multi-word synonym expansion
> ---
>
> Key: SOLR-5379
> URL: https://issues.apache.org/jira/browse/SOLR-5379
> Project: Solr
>  Issue Type: Improvement
>  Components: query parsers
>Reporter: Tien Nguyen Manh
>  Labels: multi-word, queryparser, synonym
> Fix For: 4.9, master
>
> Attachments: conf-test-files-4_8_1.patch, quoted-4_8_1.patch, 
> quoted.patch, solr-5379-version-4.10.3.patch, synonym-expander-4_8_1.patch, 
> synonym-expander.patch
>
>
> While dealing with synonym at query time, solr failed to work with multi-word 
> synonyms due to some reasons:
> - First the lucene queryparser tokenizes user query by space so it split 
> multi-word term into two terms before feeding to synonym filter, so synonym 
> filter can't recognized multi-word term to do expansion
> - Second, if synonym filter expand into multiple terms which contains 
> multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to 
> handle synonyms. But MultiPhraseQuery don't work with term have different 
> number of words.
> For the first one, we can extend quoted all multi-word synonym in user query 
> so that lucene queryparser don't split it. There are a jira task related to 
> this one https://issues.apache.org/jira/browse/LUCENE-2605.
> For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery 
> SHOULD which contains multiple PhraseQuery in case tokens stream have 
> multi-word synonym.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

2016-02-18 Thread Mary Jo Sminkey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15152678#comment-15152678
 ] 

Mary Jo Sminkey commented on SOLR-5379:
---

Also sounds like the code to incorporate it into edismax was never done?  

> Query-time multi-word synonym expansion
> ---
>
> Key: SOLR-5379
> URL: https://issues.apache.org/jira/browse/SOLR-5379
> Project: Solr
>  Issue Type: Improvement
>  Components: query parsers
>Reporter: Tien Nguyen Manh
>  Labels: multi-word, queryparser, synonym
> Fix For: 4.9, master
>
> Attachments: conf-test-files-4_8_1.patch, quoted-4_8_1.patch, 
> quoted.patch, solr-5379-version-4.10.3.patch, synonym-expander-4_8_1.patch, 
> synonym-expander.patch
>
>
> While dealing with synonym at query time, solr failed to work with multi-word 
> synonyms due to some reasons:
> - First the lucene queryparser tokenizes user query by space so it split 
> multi-word term into two terms before feeding to synonym filter, so synonym 
> filter can't recognized multi-word term to do expansion
> - Second, if synonym filter expand into multiple terms which contains 
> multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to 
> handle synonyms. But MultiPhraseQuery don't work with term have different 
> number of words.
> For the first one, we can extend quoted all multi-word synonym in user query 
> so that lucene queryparser don't split it. There are a jira task related to 
> this one https://issues.apache.org/jira/browse/LUCENE-2605.
> For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery 
> SHOULD which contains multiple PhraseQuery in case tokens stream have 
> multi-word synonym.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

2016-02-18 Thread Mary Jo Sminkey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15152649#comment-15152649
 ] 

Mary Jo Sminkey commented on SOLR-5379:
---

This *still* isn't available in Solr 5?? What can we do to get this made 
official?? 

> Query-time multi-word synonym expansion
> ---
>
> Key: SOLR-5379
> URL: https://issues.apache.org/jira/browse/SOLR-5379
> Project: Solr
>  Issue Type: Improvement
>  Components: query parsers
>Reporter: Tien Nguyen Manh
>  Labels: multi-word, queryparser, synonym
> Fix For: 4.9, master
>
> Attachments: conf-test-files-4_8_1.patch, quoted-4_8_1.patch, 
> quoted.patch, solr-5379-version-4.10.3.patch, synonym-expander-4_8_1.patch, 
> synonym-expander.patch
>
>
> While dealing with synonym at query time, solr failed to work with multi-word 
> synonyms due to some reasons:
> - First the lucene queryparser tokenizes user query by space so it split 
> multi-word term into two terms before feeding to synonym filter, so synonym 
> filter can't recognized multi-word term to do expansion
> - Second, if synonym filter expand into multiple terms which contains 
> multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to 
> handle synonyms. But MultiPhraseQuery don't work with term have different 
> number of words.
> For the first one, we can extend quoted all multi-word synonym in user query 
> so that lucene queryparser don't split it. There are a jira task related to 
> this one https://issues.apache.org/jira/browse/LUCENE-2605.
> For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery 
> SHOULD which contains multiple PhraseQuery in case tokens stream have 
> multi-word synonym.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

2015-09-01 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14726326#comment-14726326
 ] 

Otis Gospodnetic commented on SOLR-5379:


There is a patch for 4.10.3, but it was not committed, so this is still not 
available in Solr AFAIK.  Would be great to get this into 5.x.

> Query-time multi-word synonym expansion
> ---
>
> Key: SOLR-5379
> URL: https://issues.apache.org/jira/browse/SOLR-5379
> Project: Solr
>  Issue Type: Improvement
>  Components: query parsers
>Reporter: Tien Nguyen Manh
>  Labels: multi-word, queryparser, synonym
> Fix For: 4.9, Trunk
>
> Attachments: conf-test-files-4_8_1.patch, quoted-4_8_1.patch, 
> quoted.patch, solr-5379-version-4.10.3.patch, synonym-expander-4_8_1.patch, 
> synonym-expander.patch
>
>
> While dealing with synonym at query time, solr failed to work with multi-word 
> synonyms due to some reasons:
> - First the lucene queryparser tokenizes user query by space so it split 
> multi-word term into two terms before feeding to synonym filter, so synonym 
> filter can't recognized multi-word term to do expansion
> - Second, if synonym filter expand into multiple terms which contains 
> multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to 
> handle synonyms. But MultiPhraseQuery don't work with term have different 
> number of words.
> For the first one, we can extend quoted all multi-word synonym in user query 
> so that lucene queryparser don't split it. There are a jira task related to 
> this one https://issues.apache.org/jira/browse/LUCENE-2605.
> For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery 
> SHOULD which contains multiple PhraseQuery in case tokens stream have 
> multi-word synonym.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

2015-07-21 Thread Timothy Garafola (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635899#comment-14635899
 ] 

Timothy Garafola commented on SOLR-5379:


Is there a status on this issue?  Did it get moved forward to 5.x?  Is it 
available in 4.10?


 Query-time multi-word synonym expansion
 ---

 Key: SOLR-5379
 URL: https://issues.apache.org/jira/browse/SOLR-5379
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Reporter: Tien Nguyen Manh
  Labels: multi-word, queryparser, synonym
 Fix For: 4.9, Trunk

 Attachments: conf-test-files-4_8_1.patch, quoted-4_8_1.patch, 
 quoted.patch, solr-5379-version-4.10.3.patch, synonym-expander-4_8_1.patch, 
 synonym-expander.patch


 While dealing with synonym at query time, solr failed to work with multi-word 
 synonyms due to some reasons:
 - First the lucene queryparser tokenizes user query by space so it split 
 multi-word term into two terms before feeding to synonym filter, so synonym 
 filter can't recognized multi-word term to do expansion
 - Second, if synonym filter expand into multiple terms which contains 
 multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to 
 handle synonyms. But MultiPhraseQuery don't work with term have different 
 number of words.
 For the first one, we can extend quoted all multi-word synonym in user query 
 so that lucene queryparser don't split it. There are a jira task related to 
 this one https://issues.apache.org/jira/browse/LUCENE-2605.
 For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery 
 SHOULD which contains multiple PhraseQuery in case tokens stream have 
 multi-word synonym.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

2015-02-12 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14318083#comment-14318083
 ] 

Rafał Kuć commented on SOLR-5379:
-

Sure, for now I don't see a reason why we shouldn't get that into edismax :)

 Query-time multi-word synonym expansion
 ---

 Key: SOLR-5379
 URL: https://issues.apache.org/jira/browse/SOLR-5379
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Reporter: Tien Nguyen Manh
  Labels: multi-word, queryparser, synonym
 Fix For: 4.9, Trunk

 Attachments: conf-test-files-4_8_1.patch, quoted-4_8_1.patch, 
 quoted.patch, solr-5379-version-4.10.3.patch, synonym-expander-4_8_1.patch, 
 synonym-expander.patch


 While dealing with synonym at query time, solr failed to work with multi-word 
 synonyms due to some reasons:
 - First the lucene queryparser tokenizes user query by space so it split 
 multi-word term into two terms before feeding to synonym filter, so synonym 
 filter can't recognized multi-word term to do expansion
 - Second, if synonym filter expand into multiple terms which contains 
 multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to 
 handle synonyms. But MultiPhraseQuery don't work with term have different 
 number of words.
 For the first one, we can extend quoted all multi-word synonym in user query 
 so that lucene queryparser don't split it. There are a jira task related to 
 this one https://issues.apache.org/jira/browse/LUCENE-2605.
 For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery 
 SHOULD which contains multiple PhraseQuery in case tokens stream have 
 multi-word synonym.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

2015-02-12 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14318052#comment-14318052
 ] 

Jan Høydahl commented on SOLR-5379:
---

[~gro] Do you agree to fold this into edismax? I hate to fragment into another 
parser, which over time will diverge in features.

 Query-time multi-word synonym expansion
 ---

 Key: SOLR-5379
 URL: https://issues.apache.org/jira/browse/SOLR-5379
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Reporter: Tien Nguyen Manh
  Labels: multi-word, queryparser, synonym
 Fix For: 4.9, Trunk

 Attachments: conf-test-files-4_8_1.patch, quoted-4_8_1.patch, 
 quoted.patch, solr-5379-version-4.10.3.patch, synonym-expander-4_8_1.patch, 
 synonym-expander.patch


 While dealing with synonym at query time, solr failed to work with multi-word 
 synonyms due to some reasons:
 - First the lucene queryparser tokenizes user query by space so it split 
 multi-word term into two terms before feeding to synonym filter, so synonym 
 filter can't recognized multi-word term to do expansion
 - Second, if synonym filter expand into multiple terms which contains 
 multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to 
 handle synonyms. But MultiPhraseQuery don't work with term have different 
 number of words.
 For the first one, we can extend quoted all multi-word synonym in user query 
 so that lucene queryparser don't split it. There are a jira task related to 
 this one https://issues.apache.org/jira/browse/LUCENE-2605.
 For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery 
 SHOULD which contains multiple PhraseQuery in case tokens stream have 
 multi-word synonym.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

2015-02-09 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14312536#comment-14312536
 ] 

Rafał Kuć commented on SOLR-5379:
-

I have the code updated to Solr 4.10.3 and I'm running tests now. I see a few 
issues with the code right now (i.e. some static, magic string objects, because 
some classes were moved outside of Lucene core). I'll attach the updated patch 
tomorrow, but I'm not sure if there will be another release from 4.x branch. So 
I guess the easiest way would be to get the code polished for 5.x branch and 
try committing there. What do you think?

 Query-time multi-word synonym expansion
 ---

 Key: SOLR-5379
 URL: https://issues.apache.org/jira/browse/SOLR-5379
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Reporter: Tien Nguyen Manh
  Labels: multi-word, queryparser, synonym
 Fix For: 4.9, Trunk

 Attachments: conf-test-files-4_8_1.patch, quoted-4_8_1.patch, 
 quoted.patch, synonym-expander-4_8_1.patch, synonym-expander.patch


 While dealing with synonym at query time, solr failed to work with multi-word 
 synonyms due to some reasons:
 - First the lucene queryparser tokenizes user query by space so it split 
 multi-word term into two terms before feeding to synonym filter, so synonym 
 filter can't recognized multi-word term to do expansion
 - Second, if synonym filter expand into multiple terms which contains 
 multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to 
 handle synonyms. But MultiPhraseQuery don't work with term have different 
 number of words.
 For the first one, we can extend quoted all multi-word synonym in user query 
 so that lucene queryparser don't split it. There are a jira task related to 
 this one https://issues.apache.org/jira/browse/LUCENE-2605.
 For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery 
 SHOULD which contains multiple PhraseQuery in case tokens stream have 
 multi-word synonym.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

2015-02-03 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14304161#comment-14304161
 ] 

Otis Gospodnetic commented on SOLR-5379:


bq. I am sure there is, but there are no working patches for 4.10 or 5.x thus 
far.

Right.  What I was trying to ask is whether any of the active Solr committers 
wants to commit this.  If there is no will to commit, I'd rather keep things 
simple on our end ignore this issue.  But there is a will to commit, I'd love 
to see this in Solr, as would 30+ other watchers, I imagine.

 Query-time multi-word synonym expansion
 ---

 Key: SOLR-5379
 URL: https://issues.apache.org/jira/browse/SOLR-5379
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Reporter: Tien Nguyen Manh
  Labels: multi-word, queryparser, synonym
 Fix For: 4.9, Trunk

 Attachments: conf-test-files-4_8_1.patch, quoted-4_8_1.patch, 
 quoted.patch, synonym-expander-4_8_1.patch, synonym-expander.patch


 While dealing with synonym at query time, solr failed to work with multi-word 
 synonyms due to some reasons:
 - First the lucene queryparser tokenizes user query by space so it split 
 multi-word term into two terms before feeding to synonym filter, so synonym 
 filter can't recognized multi-word term to do expansion
 - Second, if synonym filter expand into multiple terms which contains 
 multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to 
 handle synonyms. But MultiPhraseQuery don't work with term have different 
 number of words.
 For the first one, we can extend quoted all multi-word synonym in user query 
 so that lucene queryparser don't split it. There are a jira task related to 
 this one https://issues.apache.org/jira/browse/LUCENE-2605.
 For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery 
 SHOULD which contains multiple PhraseQuery in case tokens stream have 
 multi-word synonym.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

2015-02-03 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14303081#comment-14303081
 ] 

Markus Jelsma commented on SOLR-5379:
-

I am sure there is, but there are no working patches for 4.10 or 5.x thus far.


 Query-time multi-word synonym expansion
 ---

 Key: SOLR-5379
 URL: https://issues.apache.org/jira/browse/SOLR-5379
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Reporter: Tien Nguyen Manh
  Labels: multi-word, queryparser, synonym
 Fix For: 4.9, Trunk

 Attachments: conf-test-files-4_8_1.patch, quoted-4_8_1.patch, 
 quoted.patch, synonym-expander-4_8_1.patch, synonym-expander.patch


 While dealing with synonym at query time, solr failed to work with multi-word 
 synonyms due to some reasons:
 - First the lucene queryparser tokenizes user query by space so it split 
 multi-word term into two terms before feeding to synonym filter, so synonym 
 filter can't recognized multi-word term to do expansion
 - Second, if synonym filter expand into multiple terms which contains 
 multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to 
 handle synonyms. But MultiPhraseQuery don't work with term have different 
 number of words.
 For the first one, we can extend quoted all multi-word synonym in user query 
 so that lucene queryparser don't split it. There are a jira task related to 
 this one https://issues.apache.org/jira/browse/LUCENE-2605.
 For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery 
 SHOULD which contains multiple PhraseQuery in case tokens stream have 
 multi-word synonym.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

2015-02-03 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14304336#comment-14304336
 ] 

Jan Høydahl commented on SOLR-5379:
---

+1 to get some of this in. I have the desire but not the cycles right now. 
Perhaps your happy customer could help drive this?

Just looked briefly at the patches.. *disclaimer: I did not apply and test this 
yet*
* I would expect a ton of new unit tests for {{synonym-expander.patch}} but 
cannot find?
* Why create another subclass of ExtendedDismax for this? If going into core, 
fold the features into edismax? The patch will be smaller too.
* I cannot see a test for configuring custom {{synonymAnalyzers}}. Also, it 
should refer to schema fieldTypes instead of adding to qparser config - in the 
same way e.g. Suggesters do

Probably the work could be split up - first add more test coverage to the 
{{synonym-expander}} part and commit it. Then fold the quoting-stuff into 
standard edismax and commit (this part is less risky since it is back-compat if 
you don't use the new params).

Is [~tiennm] still around? Other users ot the patch who are willing to step in 
and improve it?

 Query-time multi-word synonym expansion
 ---

 Key: SOLR-5379
 URL: https://issues.apache.org/jira/browse/SOLR-5379
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Reporter: Tien Nguyen Manh
  Labels: multi-word, queryparser, synonym
 Fix For: 4.9, Trunk

 Attachments: conf-test-files-4_8_1.patch, quoted-4_8_1.patch, 
 quoted.patch, synonym-expander-4_8_1.patch, synonym-expander.patch


 While dealing with synonym at query time, solr failed to work with multi-word 
 synonyms due to some reasons:
 - First the lucene queryparser tokenizes user query by space so it split 
 multi-word term into two terms before feeding to synonym filter, so synonym 
 filter can't recognized multi-word term to do expansion
 - Second, if synonym filter expand into multiple terms which contains 
 multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to 
 handle synonyms. But MultiPhraseQuery don't work with term have different 
 number of words.
 For the first one, we can extend quoted all multi-word synonym in user query 
 so that lucene queryparser don't split it. There are a jira task related to 
 this one https://issues.apache.org/jira/browse/LUCENE-2605.
 For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery 
 SHOULD which contains multiple PhraseQuery in case tokens stream have 
 multi-word synonym.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

2015-02-02 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14301433#comment-14301433
 ] 

Otis Gospodnetic commented on SOLR-5379:


Is there any interest in committing this to 4.x or 5.x?  We have a client at 
Sematext who needs query-time synonym support for their Solr 4.x setup.  So we 
can make sure this patch works for 4.x.  If any of the Solr developers wants to 
commit this to 5.x, please leave a comment here.

 Query-time multi-word synonym expansion
 ---

 Key: SOLR-5379
 URL: https://issues.apache.org/jira/browse/SOLR-5379
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Reporter: Tien Nguyen Manh
  Labels: multi-word, queryparser, synonym
 Fix For: 4.9, Trunk

 Attachments: conf-test-files-4_8_1.patch, quoted-4_8_1.patch, 
 quoted.patch, synonym-expander-4_8_1.patch, synonym-expander.patch


 While dealing with synonym at query time, solr failed to work with multi-word 
 synonyms due to some reasons:
 - First the lucene queryparser tokenizes user query by space so it split 
 multi-word term into two terms before feeding to synonym filter, so synonym 
 filter can't recognized multi-word term to do expansion
 - Second, if synonym filter expand into multiple terms which contains 
 multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to 
 handle synonyms. But MultiPhraseQuery don't work with term have different 
 number of words.
 For the first one, we can extend quoted all multi-word synonym in user query 
 so that lucene queryparser don't split it. There are a jira task related to 
 this one https://issues.apache.org/jira/browse/LUCENE-2605.
 For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery 
 SHOULD which contains multiple PhraseQuery in case tokens stream have 
 multi-word synonym.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

2014-10-31 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14191678#comment-14191678
 ] 

Jan Høydahl commented on SOLR-5379:
---

Interest in picking this up again? [~rpialum], I have not looked at your 
patches yet, but am interested in facilitating / reviewing

 Query-time multi-word synonym expansion
 ---

 Key: SOLR-5379
 URL: https://issues.apache.org/jira/browse/SOLR-5379
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Reporter: Tien Nguyen Manh
  Labels: multi-word, queryparser, synonym
 Fix For: 4.9, Trunk

 Attachments: conf-test-files-4_8_1.patch, quoted-4_8_1.patch, 
 quoted.patch, synonym-expander-4_8_1.patch, synonym-expander.patch


 While dealing with synonym at query time, solr failed to work with multi-word 
 synonyms due to some reasons:
 - First the lucene queryparser tokenizes user query by space so it split 
 multi-word term into two terms before feeding to synonym filter, so synonym 
 filter can't recognized multi-word term to do expansion
 - Second, if synonym filter expand into multiple terms which contains 
 multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to 
 handle synonyms. But MultiPhraseQuery don't work with term have different 
 number of words.
 For the first one, we can extend quoted all multi-word synonym in user query 
 so that lucene queryparser don't split it. There are a jira task related to 
 this one https://issues.apache.org/jira/browse/LUCENE-2605.
 For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery 
 SHOULD which contains multiple PhraseQuery in case tokens stream have 
 multi-word synonym.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

2014-10-31 Thread Jeremy Anderson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14191824#comment-14191824
 ] 

Jeremy Anderson commented on SOLR-5379:
---

Unfortunately, I'm swamped with other stuff on my plate.  Thinking back, I 
think I abandoned this approach and instead took Nolan Lawson's route (see 
SOLR-4381).  I don't recall how mature and stable I had gotten my patches 
before switching paths.

 Query-time multi-word synonym expansion
 ---

 Key: SOLR-5379
 URL: https://issues.apache.org/jira/browse/SOLR-5379
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Reporter: Tien Nguyen Manh
  Labels: multi-word, queryparser, synonym
 Fix For: 4.9, Trunk

 Attachments: conf-test-files-4_8_1.patch, quoted-4_8_1.patch, 
 quoted.patch, synonym-expander-4_8_1.patch, synonym-expander.patch


 While dealing with synonym at query time, solr failed to work with multi-word 
 synonyms due to some reasons:
 - First the lucene queryparser tokenizes user query by space so it split 
 multi-word term into two terms before feeding to synonym filter, so synonym 
 filter can't recognized multi-word term to do expansion
 - Second, if synonym filter expand into multiple terms which contains 
 multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to 
 handle synonyms. But MultiPhraseQuery don't work with term have different 
 number of words.
 For the first one, we can extend quoted all multi-word synonym in user query 
 so that lucene queryparser don't split it. There are a jira task related to 
 this one https://issues.apache.org/jira/browse/LUCENE-2605.
 For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery 
 SHOULD which contains multiple PhraseQuery in case tokens stream have 
 multi-word synonym.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

2014-06-06 Thread Jeremy Anderson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14020221#comment-14020221
 ] 

Jeremy Anderson commented on SOLR-5379:
---

I'm in the process of trying to get this logic ported into the 4.8.1 Released 
Tag.  I believe I've gotten the code ported over, but am having problems 
getting the unit test to run to confirm the correctness of the port.  The main 
reason is the differences in the conf/solrconfig.xml and conf/schema.xml files 
that exist in the root and I'm guessing those used by Tien when the 4.5.0 patch 
was created.  

I'm still a SOLR novice so I'm not quite sure how to properly replicate the 
schema and configuration settings to get the unit test to run.  I'm going to 
attach patch files shortly for the 4.8.1 code base along with the current 
stubbed out configuration files.

Any help anyone can provide would be greatly appreciated.  My end goal is to 
hopefully be able to get the multi-term synonym expansion logic to work with a 
4.8.1 deployment where we're using an extended version of the SolrQueryParser.  
(I'm not sure if the multi-term synonym logic is only usable with this patch by 
the new SynonymQuotedDismaxQParser or existing DismaxQarsers).

Notes on 4.8.1 port:
* There is now 2 parsers usable by the FSTSynonymFilterFactory: 
SolrSynonymParser  WordnetSynonymParser.  The later of which I'm not sure if 
any additional logic needs to be implemented for proper usage of the tokenize 
parameter.
* All of the logic implemented in SolrQueryParserBase from 4.5.0 has now been 
moved into the utility QueryBuilder class.


 Query-time multi-word synonym expansion
 ---

 Key: SOLR-5379
 URL: https://issues.apache.org/jira/browse/SOLR-5379
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Reporter: Tien Nguyen Manh
  Labels: multi-word, queryparser, synonym
 Fix For: 4.9, 5.0

 Attachments: quoted.patch, synonym-expander.patch


 While dealing with synonym at query time, solr failed to work with multi-word 
 synonyms due to some reasons:
 - First the lucene queryparser tokenizes user query by space so it split 
 multi-word term into two terms before feeding to synonym filter, so synonym 
 filter can't recognized multi-word term to do expansion
 - Second, if synonym filter expand into multiple terms which contains 
 multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to 
 handle synonyms. But MultiPhraseQuery don't work with term have different 
 number of words.
 For the first one, we can extend quoted all multi-word synonym in user query 
 so that lucene queryparser don't split it. There are a jira task related to 
 this one https://issues.apache.org/jira/browse/LUCENE-2605.
 For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery 
 SHOULD which contains multiple PhraseQuery in case tokens stream have 
 multi-word synonym.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

2014-02-06 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13893368#comment-13893368
 ] 

Markus Jelsma commented on SOLR-5379:
-

Ok, it seems i had some bad jars laying around messsing things up if a specific 
token filter was in use. Anyway, this patch works fine from single word to 
multi word but not the other way around. 

I have a 4.5.0 check out here with just this patch. Using the example schema 
and data and the usual [seabiscuit,sea biscit,biscit] syns:
http://localhost:8983/solr/select?defType=edismaxqf=namerows=0debugQuery=trueq=

{code}
q=biscit = (+DisjunctionMaxQuery(((name:seabiscuit name:sea biscit 
name:biscit/no_coord
q=seabiscuit = (+DisjunctionMaxQuery(((name:seabiscuit name:sea biscit 
name:biscit/no_coord
q=sea biscit = (+(DisjunctionMaxQuery((name:sea)) 
DisjunctionMaxQuery(((name:seabiscuit name:sea biscit 
name:biscit)/no_coord
{code}

This is all very nice but, if we change the syns from [seabiscuit,sea 
biscit,biscit] to [seabiscuit,sea biscit] it no longer works for
{code}
q=sea biscit = (+(DisjunctionMaxQuery((name:sea)) 
DisjunctionMaxQuery((name:biscit/no_coord
{code}

[~tiennm] So i assume this is clearly not the desired behaviour right? 



 Query-time multi-word synonym expansion
 ---

 Key: SOLR-5379
 URL: https://issues.apache.org/jira/browse/SOLR-5379
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Reporter: Tien Nguyen Manh
  Labels: multi-word, queryparser, synonym
 Fix For: 4.7

 Attachments: quoted.patch, synonym-expander.patch


 While dealing with synonym at query time, solr failed to work with multi-word 
 synonyms due to some reasons:
 - First the lucene queryparser tokenizes user query by space so it split 
 multi-word term into two terms before feeding to synonym filter, so synonym 
 filter can't recognized multi-word term to do expansion
 - Second, if synonym filter expand into multiple terms which contains 
 multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to 
 handle synonyms. But MultiPhraseQuery don't work with term have different 
 number of words.
 For the first one, we can extend quoted all multi-word synonym in user query 
 so that lucene queryparser don't split it. There are a jira task related to 
 this one https://issues.apache.org/jira/browse/LUCENE-2605.
 For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery 
 SHOULD which contains multiple PhraseQuery in case tokens stream have 
 multi-word synonym.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

2014-02-06 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13893393#comment-13893393
 ] 

Markus Jelsma commented on SOLR-5379:
-

By the way: using the SynonymQuotedDismaxQParser doesn't change anything.

 Query-time multi-word synonym expansion
 ---

 Key: SOLR-5379
 URL: https://issues.apache.org/jira/browse/SOLR-5379
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Reporter: Tien Nguyen Manh
  Labels: multi-word, queryparser, synonym
 Fix For: 4.7

 Attachments: quoted.patch, synonym-expander.patch


 While dealing with synonym at query time, solr failed to work with multi-word 
 synonyms due to some reasons:
 - First the lucene queryparser tokenizes user query by space so it split 
 multi-word term into two terms before feeding to synonym filter, so synonym 
 filter can't recognized multi-word term to do expansion
 - Second, if synonym filter expand into multiple terms which contains 
 multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to 
 handle synonyms. But MultiPhraseQuery don't work with term have different 
 number of words.
 For the first one, we can extend quoted all multi-word synonym in user query 
 so that lucene queryparser don't split it. There are a jira task related to 
 this one https://issues.apache.org/jira/browse/LUCENE-2605.
 For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery 
 SHOULD which contains multiple PhraseQuery in case tokens stream have 
 multi-word synonym.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

2014-02-06 Thread Tien Nguyen Manh (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13894248#comment-13894248
 ] 

Tien Nguyen Manh commented on SOLR-5379:


[~markus17] It is not the desired behavious!.

your result above in first example with sync [seabiscuit,sea biscit,biscit]
q=sea biscit = (+(DisjunctionMaxQuery((name:sea)) 
DisjunctionMaxQuery(((name:seabiscuit name:sea biscit 
name:biscit)/no_coord

seem the default behaviour (without the patch).
After appling the patch, it should be the same result for all three queries 
q=biscit, q=seabiscuit, q=sea biscit


 Query-time multi-word synonym expansion
 ---

 Key: SOLR-5379
 URL: https://issues.apache.org/jira/browse/SOLR-5379
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Reporter: Tien Nguyen Manh
  Labels: multi-word, queryparser, synonym
 Fix For: 4.7

 Attachments: quoted.patch, synonym-expander.patch


 While dealing with synonym at query time, solr failed to work with multi-word 
 synonyms due to some reasons:
 - First the lucene queryparser tokenizes user query by space so it split 
 multi-word term into two terms before feeding to synonym filter, so synonym 
 filter can't recognized multi-word term to do expansion
 - Second, if synonym filter expand into multiple terms which contains 
 multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to 
 handle synonyms. But MultiPhraseQuery don't work with term have different 
 number of words.
 For the first one, we can extend quoted all multi-word synonym in user query 
 so that lucene queryparser don't split it. There are a jira task related to 
 this one https://issues.apache.org/jira/browse/LUCENE-2605.
 For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery 
 SHOULD which contains multiple PhraseQuery in case tokens stream have 
 multi-word synonym.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

2014-02-05 Thread Eric Bus (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13892010#comment-13892010
 ] 

Eric Bus commented on SOLR-5379:


Has anyone modified this patch to work on 4.6.1? I tried to do a manual merge 
for the second patch. But a lot has changed in the SolrQueryParserBase.java 
file.

 Query-time multi-word synonym expansion
 ---

 Key: SOLR-5379
 URL: https://issues.apache.org/jira/browse/SOLR-5379
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Reporter: Tien Nguyen Manh
  Labels: multi-word, queryparser, synonym
 Fix For: 4.7

 Attachments: quoted.patch, synonym-expander.patch


 While dealing with synonym at query time, solr failed to work with multi-word 
 synonyms due to some reasons:
 - First the lucene queryparser tokenizes user query by space so it split 
 multi-word term into two terms before feeding to synonym filter, so synonym 
 filter can't recognized multi-word term to do expansion
 - Second, if synonym filter expand into multiple terms which contains 
 multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to 
 handle synonyms. But MultiPhraseQuery don't work with term have different 
 number of words.
 For the first one, we can extend quoted all multi-word synonym in user query 
 so that lucene queryparser don't split it. There are a jira task related to 
 this one https://issues.apache.org/jira/browse/LUCENE-2605.
 For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery 
 SHOULD which contains multiple PhraseQuery in case tokens stream have 
 multi-word synonym.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

2014-02-05 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13892203#comment-13892203
 ] 

Markus Jelsma commented on SOLR-5379:
-

[~nolanlawson] is the outcome you describe desired behaviour? I don't really 
believe it is. For synonyms [a b,x y] and q=a b you get 
PhraseQuery(content:x y a b). While phrase a b and x y would ordinarily 
match some documents, x y a b will never match. Or is this supposed to expand 
syns at index time too?

 Query-time multi-word synonym expansion
 ---

 Key: SOLR-5379
 URL: https://issues.apache.org/jira/browse/SOLR-5379
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Reporter: Tien Nguyen Manh
  Labels: multi-word, queryparser, synonym
 Fix For: 4.7

 Attachments: quoted.patch, synonym-expander.patch


 While dealing with synonym at query time, solr failed to work with multi-word 
 synonyms due to some reasons:
 - First the lucene queryparser tokenizes user query by space so it split 
 multi-word term into two terms before feeding to synonym filter, so synonym 
 filter can't recognized multi-word term to do expansion
 - Second, if synonym filter expand into multiple terms which contains 
 multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to 
 handle synonyms. But MultiPhraseQuery don't work with term have different 
 number of words.
 For the first one, we can extend quoted all multi-word synonym in user query 
 so that lucene queryparser don't split it. There are a jira task related to 
 this one https://issues.apache.org/jira/browse/LUCENE-2605.
 For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery 
 SHOULD which contains multiple PhraseQuery in case tokens stream have 
 multi-word synonym.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

2014-01-09 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866729#comment-13866729
 ] 

Markus Jelsma commented on SOLR-5379:
-

Yes +1

 Query-time multi-word synonym expansion
 ---

 Key: SOLR-5379
 URL: https://issues.apache.org/jira/browse/SOLR-5379
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Reporter: Tien Nguyen Manh
  Labels: multi-word, queryparser, synonym
 Fix For: 4.7

 Attachments: quoted.patch, synonym-expander.patch


 While dealing with synonym at query time, solr failed to work with multi-word 
 synonyms due to some reasons:
 - First the lucene queryparser tokenizes user query by space so it split 
 multi-word term into two terms before feeding to synonym filter, so synonym 
 filter can't recognized multi-word term to do expansion
 - Second, if synonym filter expand into multiple terms which contains 
 multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to 
 handle synonyms. But MultiPhraseQuery don't work with term have different 
 number of words.
 For the first one, we can extend quoted all multi-word synonym in user query 
 so that lucene queryparser don't split it. There are a jira task related to 
 this one https://issues.apache.org/jira/browse/LUCENE-2605.
 For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery 
 SHOULD which contains multiple PhraseQuery in case tokens stream have 
 multi-word synonym.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

2014-01-09 Thread Nolan Lawson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867202#comment-13867202
 ] 

Nolan Lawson commented on SOLR-5379:


+1 as well. Tien's patch also seems to be a better candidate seeing as they 
included Java tests, whereas my tests are in Python 'cuz I was lazy.  :)

 Query-time multi-word synonym expansion
 ---

 Key: SOLR-5379
 URL: https://issues.apache.org/jira/browse/SOLR-5379
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Reporter: Tien Nguyen Manh
  Labels: multi-word, queryparser, synonym
 Fix For: 4.7

 Attachments: quoted.patch, synonym-expander.patch


 While dealing with synonym at query time, solr failed to work with multi-word 
 synonyms due to some reasons:
 - First the lucene queryparser tokenizes user query by space so it split 
 multi-word term into two terms before feeding to synonym filter, so synonym 
 filter can't recognized multi-word term to do expansion
 - Second, if synonym filter expand into multiple terms which contains 
 multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to 
 handle synonyms. But MultiPhraseQuery don't work with term have different 
 number of words.
 For the first one, we can extend quoted all multi-word synonym in user query 
 so that lucene queryparser don't split it. There are a jira task related to 
 this one https://issues.apache.org/jira/browse/LUCENE-2605.
 For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery 
 SHOULD which contains multiple PhraseQuery in case tokens stream have 
 multi-word synonym.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

2014-01-08 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865335#comment-13865335
 ] 

Markus Jelsma commented on SOLR-5379:
-

Nolan, Jan, both of you have extensive knowledge about the one you worked on 
hosted on github. How do you compare features? I've checked your issue list and 
there are no new issues coming in and a lot have been resolved already, looks 
like that one is much more mature and flexible/configurable. 

 Query-time multi-word synonym expansion
 ---

 Key: SOLR-5379
 URL: https://issues.apache.org/jira/browse/SOLR-5379
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Reporter: Tien Nguyen Manh
  Labels: multi-word, queryparser, synonym
 Fix For: 4.7

 Attachments: quoted.patch, synonym-expander.patch


 While dealing with synonym at query time, solr failed to work with multi-word 
 synonyms due to some reasons:
 - First the lucene queryparser tokenizes user query by space so it split 
 multi-word term into two terms before feeding to synonym filter, so synonym 
 filter can't recognized multi-word term to do expansion
 - Second, if synonym filter expand into multiple terms which contains 
 multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to 
 handle synonyms. But MultiPhraseQuery don't work with term have different 
 number of words.
 For the first one, we can extend quoted all multi-word synonym in user query 
 so that lucene queryparser don't split it. There are a jira task related to 
 this one https://issues.apache.org/jira/browse/LUCENE-2605.
 For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery 
 SHOULD which contains multiple PhraseQuery in case tokens stream have 
 multi-word synonym.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

2014-01-08 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866095#comment-13866095
 ] 

Jan Høydahl commented on SOLR-5379:
---

What I like about Nolan's solution is that you control synonyms outside of 
analysis - for all fields,although there are pros and cons with this. Also that 
you can deboost the synonyms and easily turn them on/off as you like.

What I like about Tien's patch is that it solves exactly the problem at hand 
without introducing need for completely new configurations or concepts, and 
that it can work with other Qparsers as well. In that respect it is perhaps 
better suited for early inclusion in Solr, then we can look at bringing in be 
best from 4381 later?

 Query-time multi-word synonym expansion
 ---

 Key: SOLR-5379
 URL: https://issues.apache.org/jira/browse/SOLR-5379
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Reporter: Tien Nguyen Manh
  Labels: multi-word, queryparser, synonym
 Fix For: 4.7

 Attachments: quoted.patch, synonym-expander.patch


 While dealing with synonym at query time, solr failed to work with multi-word 
 synonyms due to some reasons:
 - First the lucene queryparser tokenizes user query by space so it split 
 multi-word term into two terms before feeding to synonym filter, so synonym 
 filter can't recognized multi-word term to do expansion
 - Second, if synonym filter expand into multiple terms which contains 
 multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to 
 handle synonyms. But MultiPhraseQuery don't work with term have different 
 number of words.
 For the first one, we can extend quoted all multi-word synonym in user query 
 so that lucene queryparser don't split it. There are a jira task related to 
 this one https://issues.apache.org/jira/browse/LUCENE-2605.
 For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery 
 SHOULD which contains multiple PhraseQuery in case tokens stream have 
 multi-word synonym.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

2014-01-08 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866221#comment-13866221
 ] 

Otis Gospodnetic commented on SOLR-5379:


Jan: +1

 Query-time multi-word synonym expansion
 ---

 Key: SOLR-5379
 URL: https://issues.apache.org/jira/browse/SOLR-5379
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Reporter: Tien Nguyen Manh
  Labels: multi-word, queryparser, synonym
 Fix For: 4.7

 Attachments: quoted.patch, synonym-expander.patch


 While dealing with synonym at query time, solr failed to work with multi-word 
 synonyms due to some reasons:
 - First the lucene queryparser tokenizes user query by space so it split 
 multi-word term into two terms before feeding to synonym filter, so synonym 
 filter can't recognized multi-word term to do expansion
 - Second, if synonym filter expand into multiple terms which contains 
 multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to 
 handle synonyms. But MultiPhraseQuery don't work with term have different 
 number of words.
 For the first one, we can extend quoted all multi-word synonym in user query 
 so that lucene queryparser don't split it. There are a jira task related to 
 this one https://issues.apache.org/jira/browse/LUCENE-2605.
 For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery 
 SHOULD which contains multiple PhraseQuery in case tokens stream have 
 multi-word synonym.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

2014-01-07 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864147#comment-13864147
 ] 

Markus Jelsma commented on SOLR-5379:
-

How does this patch handle boosts?  Are the synonym and the original keywords 
boosted equally?

 Query-time multi-word synonym expansion
 ---

 Key: SOLR-5379
 URL: https://issues.apache.org/jira/browse/SOLR-5379
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Reporter: Tien Nguyen Manh
  Labels: multi-word, queryparser, synonym
 Fix For: 4.7

 Attachments: quoted.patch, synonym-expander.patch


 While dealing with synonym at query time, solr failed to work with multi-word 
 synonyms due to some reasons:
 - First the lucene queryparser tokenizes user query by space so it split 
 multi-word term into two terms before feeding to synonym filter, so synonym 
 filter can't recognized multi-word term to do expansion
 - Second, if synonym filter expand into multiple terms which contains 
 multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to 
 handle synonyms. But MultiPhraseQuery don't work with term have different 
 number of words.
 For the first one, we can extend quoted all multi-word synonym in user query 
 so that lucene queryparser don't split it. There are a jira task related to 
 this one https://issues.apache.org/jira/browse/LUCENE-2605.
 For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery 
 SHOULD which contains multiple PhraseQuery in case tokens stream have 
 multi-word synonym.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

2014-01-07 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864149#comment-13864149
 ] 

Ahmet Arslan commented on SOLR-5379:


Assume synonyms are {code}  usa, united states of america {code} What happens 
if I fire the following sloppy phrase query  *president usa~5*

 Query-time multi-word synonym expansion
 ---

 Key: SOLR-5379
 URL: https://issues.apache.org/jira/browse/SOLR-5379
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Reporter: Tien Nguyen Manh
  Labels: multi-word, queryparser, synonym
 Fix For: 4.7

 Attachments: quoted.patch, synonym-expander.patch


 While dealing with synonym at query time, solr failed to work with multi-word 
 synonyms due to some reasons:
 - First the lucene queryparser tokenizes user query by space so it split 
 multi-word term into two terms before feeding to synonym filter, so synonym 
 filter can't recognized multi-word term to do expansion
 - Second, if synonym filter expand into multiple terms which contains 
 multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to 
 handle synonyms. But MultiPhraseQuery don't work with term have different 
 number of words.
 For the first one, we can extend quoted all multi-word synonym in user query 
 so that lucene queryparser don't split it. There are a jira task related to 
 this one https://issues.apache.org/jira/browse/LUCENE-2605.
 For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery 
 SHOULD which contains multiple PhraseQuery in case tokens stream have 
 multi-word synonym.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

2014-01-07 Thread Nolan Lawson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864402#comment-13864402
 ] 

Nolan Lawson commented on SOLR-5379:


[~markus17]: They're boosted equally.  It was the subject of [a 
bug|https://github.com/healthonnet/hon-lucene-synonyms/issues/31].

[~iorixxx]: I just tested it out now.  I got:

{code}
(+(DisjunctionMaxQuery((text:president usa~5)) 
(((+DisjunctionMaxQuery((text:president united states of 
america~5)))/no_coord/no_coord // parsedQuery
+((text:president usa~5) ((+(text:president united states of america~5 
// parsedQuery.toString()
{code}

 Query-time multi-word synonym expansion
 ---

 Key: SOLR-5379
 URL: https://issues.apache.org/jira/browse/SOLR-5379
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Reporter: Tien Nguyen Manh
  Labels: multi-word, queryparser, synonym
 Fix For: 4.7

 Attachments: quoted.patch, synonym-expander.patch


 While dealing with synonym at query time, solr failed to work with multi-word 
 synonyms due to some reasons:
 - First the lucene queryparser tokenizes user query by space so it split 
 multi-word term into two terms before feeding to synonym filter, so synonym 
 filter can't recognized multi-word term to do expansion
 - Second, if synonym filter expand into multiple terms which contains 
 multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to 
 handle synonyms. But MultiPhraseQuery don't work with term have different 
 number of words.
 For the first one, we can extend quoted all multi-word synonym in user query 
 so that lucene queryparser don't split it. There are a jira task related to 
 this one https://issues.apache.org/jira/browse/LUCENE-2605.
 For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery 
 SHOULD which contains multiple PhraseQuery in case tokens stream have 
 multi-word synonym.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

2014-01-06 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13863481#comment-13863481
 ] 

Otis Gospodnetic commented on SOLR-5379:


Our customers have been using this in production for about half a year now 
without issues.

 Query-time multi-word synonym expansion
 ---

 Key: SOLR-5379
 URL: https://issues.apache.org/jira/browse/SOLR-5379
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Reporter: Tien Nguyen Manh
  Labels: multi-word, queryparser, synonym
 Fix For: 4.7

 Attachments: quoted.patch, synonym-expander.patch


 While dealing with synonym at query time, solr failed to work with multi-word 
 synonyms due to some reasons:
 - First the lucene queryparser tokenizes user query by space so it split 
 multi-word term into two terms before feeding to synonym filter, so synonym 
 filter can't recognized multi-word term to do expansion
 - Second, if synonym filter expand into multiple terms which contains 
 multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to 
 handle synonyms. But MultiPhraseQuery don't work with term have different 
 number of words.
 For the first one, we can extend quoted all multi-word synonym in user query 
 so that lucene queryparser don't split it. There are a jira task related to 
 this one https://issues.apache.org/jira/browse/LUCENE-2605.
 For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery 
 SHOULD which contains multiple PhraseQuery in case tokens stream have 
 multi-word synonym.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

2014-01-05 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13862536#comment-13862536
 ] 

Jan Høydahl commented on SOLR-5379:
---

So what is the next step with this one? Anyone who have tested it and have 
comments?

 Query-time multi-word synonym expansion
 ---

 Key: SOLR-5379
 URL: https://issues.apache.org/jira/browse/SOLR-5379
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Reporter: Tien Nguyen Manh
  Labels: multi-word, queryparser, synonym
 Fix For: 4.6, 4.7

 Attachments: quoted.patch, synonym-expander.patch


 While dealing with synonym at query time, solr failed to work with multi-word 
 synonyms due to some reasons:
 - First the lucene queryparser tokenizes user query by space so it split 
 multi-word term into two terms before feeding to synonym filter, so synonym 
 filter can't recognized multi-word term to do expansion
 - Second, if synonym filter expand into multiple terms which contains 
 multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to 
 handle synonyms. But MultiPhraseQuery don't work with term have different 
 number of words.
 For the first one, we can extend quoted all multi-word synonym in user query 
 so that lucene queryparser don't split it. There are a jira task related to 
 this one https://issues.apache.org/jira/browse/LUCENE-2605.
 For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery 
 SHOULD which contains multiple PhraseQuery in case tokens stream have 
 multi-word synonym.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

2013-11-08 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13817133#comment-13817133
 ] 

Markus Jelsma commented on SOLR-5379:
-

Oh i interpreted your comment as if you have tested it against the other 
patches linked to this one.

 Query-time multi-word synonym expansion
 ---

 Key: SOLR-5379
 URL: https://issues.apache.org/jira/browse/SOLR-5379
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Reporter: Nguyen Manh Tien
  Labels: multi-word, queryparser, synonym
 Fix For: 4.5.1, 4.6

 Attachments: quoted.patch, synonym-expander.patch


 While dealing with synonym at query time, solr failed to work with multi-word 
 synonyms due to some reasons:
 - First the lucene queryparser tokenizes user query by space so it split 
 multi-word term into two terms before feeding to synonym filter, so synonym 
 filter can't recognized multi-word term to do expansion
 - Second, if synonym filter expand into multiple terms which contains 
 multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to 
 handle synonyms. But MultiPhraseQuery don't work with term have different 
 number of words.
 For the first one, we can extend quoted all multi-word synonym in user query 
 so that lucene queryparser don't split it. There are a jira task related to 
 this one https://issues.apache.org/jira/browse/LUCENE-2605.
 For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery 
 SHOULD which contains multiple PhraseQuery in case tokens stream have 
 multi-word synonym.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

2013-11-08 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13817174#comment-13817174
 ] 

Otis Gospodnetic commented on SOLR-5379:


[~bsteele] - maybe you had your colleagues test other patches, like SOLR-4381?

 Query-time multi-word synonym expansion
 ---

 Key: SOLR-5379
 URL: https://issues.apache.org/jira/browse/SOLR-5379
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Reporter: Nguyen Manh Tien
  Labels: multi-word, queryparser, synonym
 Fix For: 4.5.1, 4.6

 Attachments: quoted.patch, synonym-expander.patch


 While dealing with synonym at query time, solr failed to work with multi-word 
 synonyms due to some reasons:
 - First the lucene queryparser tokenizes user query by space so it split 
 multi-word term into two terms before feeding to synonym filter, so synonym 
 filter can't recognized multi-word term to do expansion
 - Second, if synonym filter expand into multiple terms which contains 
 multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to 
 handle synonyms. But MultiPhraseQuery don't work with term have different 
 number of words.
 For the first one, we can extend quoted all multi-word synonym in user query 
 so that lucene queryparser don't split it. There are a jira task related to 
 this one https://issues.apache.org/jira/browse/LUCENE-2605.
 For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery 
 SHOULD which contains multiple PhraseQuery in case tokens stream have 
 multi-word synonym.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

2013-11-07 Thread Bill Steele (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816243#comment-13816243
 ] 

Bill Steele commented on SOLR-5379:
---

Hi Markus,

Which other multi-word synonym patches are you referring to?  We tested the 
Solr functionality built into Solr 4.2.1 which didn't support multiple word 
synonyms, and as far as I know that issue still exists in the latest.  This 
patch fixes that issue for us.  Not aware of other alternatives available.

 Query-time multi-word synonym expansion
 ---

 Key: SOLR-5379
 URL: https://issues.apache.org/jira/browse/SOLR-5379
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Reporter: Nguyen Manh Tien
  Labels: multi-word, queryparser, synonym
 Fix For: 4.5.1, 4.6

 Attachments: quoted.patch, synonym-expander.patch


 While dealing with synonym at query time, solr failed to work with multi-word 
 synonyms due to some reasons:
 - First the lucene queryparser tokenizes user query by space so it split 
 multi-word term into two terms before feeding to synonym filter, so synonym 
 filter can't recognized multi-word term to do expansion
 - Second, if synonym filter expand into multiple terms which contains 
 multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to 
 handle synonyms. But MultiPhraseQuery don't work with term have different 
 number of words.
 For the first one, we can extend quoted all multi-word synonym in user query 
 so that lucene queryparser don't split it. There are a jira task related to 
 this one https://issues.apache.org/jira/browse/LUCENE-2605.
 For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery 
 SHOULD which contains multiple PhraseQuery in case tokens stream have 
 multi-word synonym.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

2013-11-06 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814848#comment-13814848
 ] 

Markus Jelsma commented on SOLR-5379:
-

Bill - did you also test the other multiword-synonym patches?

 Query-time multi-word synonym expansion
 ---

 Key: SOLR-5379
 URL: https://issues.apache.org/jira/browse/SOLR-5379
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Reporter: Nguyen Manh Tien
  Labels: multi-word, queryparser, synonym
 Fix For: 4.5.1, 4.6

 Attachments: quoted.patch, synonym-expander.patch


 While dealing with synonym at query time, solr failed to work with multi-word 
 synonyms due to some reasons:
 - First the lucene queryparser tokenizes user query by space so it split 
 multi-word term into two terms before feeding to synonym filter, so synonym 
 filter can't recognized multi-word term to do expansion
 - Second, if synonym filter expand into multiple terms which contains 
 multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to 
 handle synonyms. But MultiPhraseQuery don't work with term have different 
 number of words.
 For the first one, we can extend quoted all multi-word synonym in user query 
 so that lucene queryparser don't split it. There are a jira task related to 
 this one https://issues.apache.org/jira/browse/LUCENE-2605.
 For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery 
 SHOULD which contains multiple PhraseQuery in case tokens stream have 
 multi-word synonym.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

2013-11-05 Thread Bill Steele (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814281#comment-13814281
 ] 

Bill Steele commented on SOLR-5379:
---

We found this to be much more useful code for multiword synonyms.  We ran some 
tests, and when having a synonym set such as:

seabiscuit, sea biscuit, sea biscit

Search on the following:

seabiscuit article

Returned matches with the following terms

Sea biscit article
Sea biscuit article
Seabiscuit article
Biscuit Sea article
Sea article
Biscit article

With this patch, the above search query just returned the terms:

Sea biscit article
Sea biscuit article
Seabiscuit article



 Query-time multi-word synonym expansion
 ---

 Key: SOLR-5379
 URL: https://issues.apache.org/jira/browse/SOLR-5379
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Reporter: Nguyen Manh Tien
  Labels: multi-word, queryparser, synonym
 Fix For: 4.5.1, 4.6

 Attachments: quoted.patch, synonym-expander.patch


 While dealing with synonym at query time, solr failed to work with multi-word 
 synonyms due to some reasons:
 - First the lucene queryparser tokenizes user query by space so it split 
 multi-word term into two terms before feeding to synonym filter, so synonym 
 filter can't recognized multi-word term to do expansion
 - Second, if synonym filter expand into multiple terms which contains 
 multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to 
 handle synonyms. But MultiPhraseQuery don't work with term have different 
 number of words.
 For the first one, we can extend quoted all multi-word synonym in user query 
 so that lucene queryparser don't split it. There are a jira task related to 
 this one https://issues.apache.org/jira/browse/LUCENE-2605.
 For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery 
 SHOULD which contains multiple PhraseQuery in case tokens stream have 
 multi-word synonym.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

2013-10-25 Thread Marco Wong (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805179#comment-13805179
 ] 

Marco Wong commented on SOLR-5379:
--

Excuse me, for the synonym-expander.patch, does the updated SolrQueryParserBase 
will emitting PhraseQuery(Term(a), Term(b)), when I have a ShingleFilter in 
query time analyzer which emits bigram like Term(a b), which makes my existing 
tokenization logic fail?

 Query-time multi-word synonym expansion
 ---

 Key: SOLR-5379
 URL: https://issues.apache.org/jira/browse/SOLR-5379
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Reporter: Nguyen Manh Tien
  Labels: multi-word, queryparser, synonym
 Fix For: 4.5.1, 4.6

 Attachments: quoted.patch, synonym-expander.patch


 While dealing with synonym at query time, solr failed to work with multi-word 
 synonyms due to some reasons:
 - First the lucene queryparser tokenizes user query by space so it split 
 multi-word term into two terms before feeding to synonym filter, so synonym 
 filter can't recognized multi-word term to do expansion
 - Second, if synonym filter expand into multiple terms which contains 
 multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to 
 handle synonyms. But MultiPhraseQuery don't work with term have different 
 number of words.
 For the first one, we can extend quoted all multi-word synonym in user query 
 so that lucene queryparser don't split it. There are a jira task related to 
 this one https://issues.apache.org/jira/browse/LUCENE-2605.
 For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery 
 SHOULD which contains multiple PhraseQuery in case tokens stream have 
 multi-word synonym.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

2013-10-25 Thread Nguyen Manh Tien (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805188#comment-13805188
 ] 

Nguyen Manh Tien commented on SOLR-5379:


Yes, it will emit PhraseQuery(Term(a), Term(b)).
We must additional check to only tokenize term when it is synonym.
I will change the patch.

 Query-time multi-word synonym expansion
 ---

 Key: SOLR-5379
 URL: https://issues.apache.org/jira/browse/SOLR-5379
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Reporter: Nguyen Manh Tien
  Labels: multi-word, queryparser, synonym
 Fix For: 4.5.1, 4.6

 Attachments: quoted.patch, synonym-expander.patch


 While dealing with synonym at query time, solr failed to work with multi-word 
 synonyms due to some reasons:
 - First the lucene queryparser tokenizes user query by space so it split 
 multi-word term into two terms before feeding to synonym filter, so synonym 
 filter can't recognized multi-word term to do expansion
 - Second, if synonym filter expand into multiple terms which contains 
 multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to 
 handle synonyms. But MultiPhraseQuery don't work with term have different 
 number of words.
 For the first one, we can extend quoted all multi-word synonym in user query 
 so that lucene queryparser don't split it. There are a jira task related to 
 this one https://issues.apache.org/jira/browse/LUCENE-2605.
 For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery 
 SHOULD which contains multiple PhraseQuery in case tokens stream have 
 multi-word synonym.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

2013-10-23 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13803086#comment-13803086
 ] 

Otis Gospodnetic commented on SOLR-5379:


[~tiennm] How does this diff from SOLR-4381?  Which cases does SOLR-4381 not 
handle that this patch handles?

 Query-time multi-word synonym expansion
 ---

 Key: SOLR-5379
 URL: https://issues.apache.org/jira/browse/SOLR-5379
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Reporter: Nguyen Manh Tien
  Labels: multi-word, queryparser, synonym
 Fix For: 4.5.1, 4.6

 Attachments: quoted.patch, synonym-expander.patch


 While dealing with synonym at query time, solr failed to work with multi-word 
 synonyms due to some reasons:
 - First the lucene queryparser tokenizes user query by space so it split 
 multi-word term into two terms before feeding to synonym filter, so synonym 
 filter can't recognized multi-word term to do expansion
 - Second, if synonym filter expand into multiple terms which contains 
 multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to 
 handle synonyms. But MultiPhraseQuery don't work with term have different 
 number of words.
 For the first one, we can extend quoted all multi-word synonym in user query 
 so that lucene queryparser don't split it. There are a jira task related to 
 this one https://issues.apache.org/jira/browse/LUCENE-2605.
 For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery 
 SHOULD which contains multiple PhraseQuery in case tokens stream have 
 multi-word synonym.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

2013-10-23 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13803199#comment-13803199
 ] 

Otis Gospodnetic commented on SOLR-5379:


My understanding of how this synonym expander (the synonym-expander.patch) 
works is:

Assume synonyms are:
{code}
Seabiscuit, Sea biscit, Biscit
{code}

For query Seabiscuit article, the regular edismax will construct a 
MultiPhraseQuery like (Seebiscuit|Sea|biscit, biscit, article).

Instead of that, this patch rewrites the query differently:
PhraseQuery(Seabiscit article) OR PhraseQuery(Sea biscit article) OR 
PhraseQuery(biscit article)


 Query-time multi-word synonym expansion
 ---

 Key: SOLR-5379
 URL: https://issues.apache.org/jira/browse/SOLR-5379
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Reporter: Nguyen Manh Tien
  Labels: multi-word, queryparser, synonym
 Fix For: 4.5.1, 4.6

 Attachments: quoted.patch, synonym-expander.patch


 While dealing with synonym at query time, solr failed to work with multi-word 
 synonyms due to some reasons:
 - First the lucene queryparser tokenizes user query by space so it split 
 multi-word term into two terms before feeding to synonym filter, so synonym 
 filter can't recognized multi-word term to do expansion
 - Second, if synonym filter expand into multiple terms which contains 
 multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to 
 handle synonyms. But MultiPhraseQuery don't work with term have different 
 number of words.
 For the first one, we can extend quoted all multi-word synonym in user query 
 so that lucene queryparser don't split it. There are a jira task related to 
 this one https://issues.apache.org/jira/browse/LUCENE-2605.
 For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery 
 SHOULD which contains multiple PhraseQuery in case tokens stream have 
 multi-word synonym.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

2013-10-23 Thread Nguyen Manh Tien (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13803652#comment-13803652
 ] 

Nguyen Manh Tien commented on SOLR-5379:


[~otis] The difference are
 SOLR-4381 is an extension of EDismax so it only work for that query parser, my 
patch is a patch to SolrQueryParserBase it work for any query parser
 SOLR-4381 rewrite query into lattice (all synonym combination) so it need to 
parse N modified query, my patch is applied when we read tokenstream to build 
Lucene Query, so it still parse query 1 time and
we can still optimize my work to make the result Lucene Query compacted by 
combine both MultiPhraseQuery and PhraseQuery, so the Lucene Query of my patch 
is smaller than SOLR-4381


 Query-time multi-word synonym expansion
 ---

 Key: SOLR-5379
 URL: https://issues.apache.org/jira/browse/SOLR-5379
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Reporter: Nguyen Manh Tien
  Labels: multi-word, queryparser, synonym
 Fix For: 4.5.1, 4.6

 Attachments: quoted.patch, synonym-expander.patch


 While dealing with synonym at query time, solr failed to work with multi-word 
 synonyms due to some reasons:
 - First the lucene queryparser tokenizes user query by space so it split 
 multi-word term into two terms before feeding to synonym filter, so synonym 
 filter can't recognized multi-word term to do expansion
 - Second, if synonym filter expand into multiple terms which contains 
 multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to 
 handle synonyms. But MultiPhraseQuery don't work with term have different 
 number of words.
 For the first one, we can extend quoted all multi-word synonym in user query 
 so that lucene queryparser don't split it. There are a jira task related to 
 this one https://issues.apache.org/jira/browse/LUCENE-2605.
 For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery 
 SHOULD which contains multiple PhraseQuery in case tokens stream have 
 multi-word synonym.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

2013-10-23 Thread Nguyen Manh Tien (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13803656#comment-13803656
 ] 

Nguyen Manh Tien commented on SOLR-5379:


That corrected [~otis]

 Query-time multi-word synonym expansion
 ---

 Key: SOLR-5379
 URL: https://issues.apache.org/jira/browse/SOLR-5379
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Reporter: Nguyen Manh Tien
  Labels: multi-word, queryparser, synonym
 Fix For: 4.5.1, 4.6

 Attachments: quoted.patch, synonym-expander.patch


 While dealing with synonym at query time, solr failed to work with multi-word 
 synonyms due to some reasons:
 - First the lucene queryparser tokenizes user query by space so it split 
 multi-word term into two terms before feeding to synonym filter, so synonym 
 filter can't recognized multi-word term to do expansion
 - Second, if synonym filter expand into multiple terms which contains 
 multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to 
 handle synonyms. But MultiPhraseQuery don't work with term have different 
 number of words.
 For the first one, we can extend quoted all multi-word synonym in user query 
 so that lucene queryparser don't split it. There are a jira task related to 
 this one https://issues.apache.org/jira/browse/LUCENE-2605.
 For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery 
 SHOULD which contains multiple PhraseQuery in case tokens stream have 
 multi-word synonym.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org