[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

2013-10-25 Thread Marco Wong (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805179#comment-13805179
 ] 

Marco Wong commented on SOLR-5379:
--

Excuse me, regarding synonym-expander.patch: will the updated 
SolrQueryParserBase emit PhraseQuery(Term(a), Term(b)) when I have a 
ShingleFilter in the query-time analyzer that emits bigrams like Term(a b), 
which would make my existing tokenization logic fail?

 Query-time multi-word synonym expansion
 ---

         Key: SOLR-5379
         URL: https://issues.apache.org/jira/browse/SOLR-5379
     Project: Solr
  Issue Type: Improvement
  Components: query parsers
    Reporter: Nguyen Manh Tien
      Labels: multi-word, queryparser, synonym
     Fix For: 4.5.1, 4.6

 Attachments: quoted.patch, synonym-expander.patch


 When handling synonyms at query time, Solr fails to work with multi-word 
 synonyms for two reasons:
 - First, the Lucene query parser tokenizes the user query on whitespace, so it 
 splits a multi-word term into separate terms before passing them to the 
 synonym filter. The synonym filter therefore cannot recognize the multi-word 
 term and expand it.
 - Second, when the synonym filter expands into multiple terms that include a 
 multi-word synonym, SolrQueryParserBase currently uses MultiPhraseQuery to 
 handle the synonyms, but MultiPhraseQuery does not work with terms that have 
 different numbers of words.
 For the first problem, we can quote every multi-word synonym in the user query 
 so that the Lucene query parser does not split it. A related JIRA issue is 
 https://issues.apache.org/jira/browse/LUCENE-2605.
 For the second, we can replace MultiPhraseQuery with an appropriate 
 BooleanQuery of SHOULD clauses, each a PhraseQuery, whenever the token stream 
 contains a multi-word synonym.
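
 The two ideas in the description can be illustrated with a minimal, 
 string-level sketch. This is not the code from quoted.patch or 
 synonym-expander.patch, and it does not use the Lucene API: the class 
 SynonymQuerySketch and the helpers quoteMultiWordSynonyms and toShouldGroup 
 are hypothetical names invented for this illustration. It assumes the 
 multi-word synonyms are known up front and renders the result in classic 
 query-parser syntax, where OR-joined clauses are optional (SHOULD) clauses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of Solr or of the attached patches.
public class SynonymQuerySketch {

    // Step 1: wrap each known multi-word synonym in double quotes so that a
    // whitespace-splitting query parser keeps it together as one phrase.
    static String quoteMultiWordSynonyms(String query, List<String> multiWordSynonyms) {
        // Try longer phrases first so "new york city" wins over "new york".
        List<String> phrases = new ArrayList<>(multiWordSynonyms);
        phrases.sort((a, b) -> Integer.compare(b.length(), a.length()));

        String result = query;
        for (String phrase : phrases) {
            // Match on word boundaries; skip matches directly adjacent to a
            // double quote, which are assumed to be quoted already.
            Pattern p = Pattern.compile(
                "(?<!\")\\b" + Pattern.quote(phrase) + "\\b(?!\")",
                Pattern.CASE_INSENSITIVE);
            result = p.matcher(result).replaceAll("\"$0\"");
        }
        return result;
    }

    // Step 2: render synonym alternatives of differing word counts as one
    // group of optional clauses. Multi-word alternatives become quoted
    // phrases; single words stay plain terms.
    static String toShouldGroup(List<String> alternatives) {
        List<String> clauses = new ArrayList<>();
        for (String alt : alternatives) {
            String trimmed = alt.trim();
            clauses.add(trimmed.contains(" ") ? "\"" + trimmed + "\"" : trimmed);
        }
        return "(" + String.join(" OR ", clauses) + ")";
    }
}
```

 With the synonyms "new york" and "big apple", the query 
 buy new york pizza would be rewritten by the first helper so that 
 new york is quoted, and the second helper would turn the alternatives 
 "new york", "ny", "big apple" into a single parenthesized OR group. A phrase 
 nested in the middle of an already quoted longer phrase is not handled by 
 this sketch.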



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-5379) Query-time multi-word synonym expansion

2013-10-25 Thread Marco Wong (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805179#comment-13805179
 ] 

Marco Wong edited comment on SOLR-5379 at 10/25/13 8:56 AM:


Excuse me, regarding synonym-expander.patch: when I have a ShingleFilter in 
the query-time analyzer that emits a bigram TermQuery such as Term(a b), will 
the updated SolrQueryParserBase emit PhraseQuery(Term(a), Term(b)) and thereby 
make my existing tokenization logic fail?


was (Author: marcowong):
Excuse me, regarding synonym-expander.patch: will the updated 
SolrQueryParserBase emit PhraseQuery(Term(a), Term(b)) when I have a 
ShingleFilter in the query-time analyzer that emits bigrams like Term(a b), 
and make my existing tokenization logic fail?




[jira] [Comment Edited] (SOLR-5379) Query-time multi-word synonym expansion

2013-10-25 Thread Marco Wong (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805179#comment-13805179
 ] 

Marco Wong edited comment on SOLR-5379 at 10/25/13 8:55 AM:


Excuse me, regarding synonym-expander.patch: will the updated 
SolrQueryParserBase emit PhraseQuery(Term(a), Term(b)) when I have a 
ShingleFilter in the query-time analyzer that emits bigrams like Term(a b), 
and make my existing tokenization logic fail?


was (Author: marcowong):
Excuse me, regarding synonym-expander.patch: will the updated 
SolrQueryParserBase emit PhraseQuery(Term(a), Term(b)) when I have a 
ShingleFilter in the query-time analyzer that emits bigrams like Term(a b), 
which would make my existing tokenization logic fail?
