[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion
[ https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507778#comment-15507778 ] Jan Høydahl commented on SOLR-5379: --- [~daitken] and [~atuljangra] and [~mjsminkey], I'm sorry there were on replies to your questions about updating the patch. What it probably means is that noone has had the capacity or need to spend time on this. It will probably take some effort to lift the patch from 4.x to 6.x, and then get it ready for committing either as part of edismax or as a subclass. bq. What can we do to get this made official?? If you can contribute development work yourself (or your company) that would be the best. Else hire someone who can help you and/or just keep nagging here until it is done :-) > Query-time multi-word synonym expansion > --- > > Key: SOLR-5379 > URL: https://issues.apache.org/jira/browse/SOLR-5379 > Project: Solr > Issue Type: Improvement > Components: query parsers >Reporter: Tien Nguyen Manh > Labels: multi-word, queryparser, synonym > Fix For: 4.9, 6.0 > > Attachments: conf-test-files-4_8_1.patch, quoted-4_8_1.patch, > quoted.patch, solr-5379-version-4.10.3.patch, synonym-expander-4_8_1.patch, > synonym-expander.patch > > > While dealing with synonym at query time, solr failed to work with multi-word > synonyms due to some reasons: > - First the lucene queryparser tokenizes user query by space so it split > multi-word term into two terms before feeding to synonym filter, so synonym > filter can't recognized multi-word term to do expansion > - Second, if synonym filter expand into multiple terms which contains > multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to > handle synonyms. But MultiPhraseQuery don't work with term have different > number of words. > For the first one, we can extend quoted all multi-word synonym in user query > so that lucene queryparser don't split it. There are a jira task related to > this one https://issues.apache.org/jira/browse/LUCENE-2605. > For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery > SHOULD which contains multiple PhraseQuery in case tokens stream have > multi-word synonym. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion
[ https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15181984#comment-15181984 ] Atul Jangra commented on SOLR-5379: --- Any update on how soon this work will be included in Solr. This is very common occurrence for anyone working on search. I'm using Solr 5.5.0 and this is still a big problem. I even tried some external parser, but nothing seem to work. > Query-time multi-word synonym expansion > --- > > Key: SOLR-5379 > URL: https://issues.apache.org/jira/browse/SOLR-5379 > Project: Solr > Issue Type: Improvement > Components: query parsers >Reporter: Tien Nguyen Manh > Labels: multi-word, queryparser, synonym > Fix For: 4.9, master > > Attachments: conf-test-files-4_8_1.patch, quoted-4_8_1.patch, > quoted.patch, solr-5379-version-4.10.3.patch, synonym-expander-4_8_1.patch, > synonym-expander.patch > > > While dealing with synonym at query time, solr failed to work with multi-word > synonyms due to some reasons: > - First the lucene queryparser tokenizes user query by space so it split > multi-word term into two terms before feeding to synonym filter, so synonym > filter can't recognized multi-word term to do expansion > - Second, if synonym filter expand into multiple terms which contains > multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to > handle synonyms. But MultiPhraseQuery don't work with term have different > number of words. > For the first one, we can extend quoted all multi-word synonym in user query > so that lucene queryparser don't split it. There are a jira task related to > this one https://issues.apache.org/jira/browse/LUCENE-2605. > For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery > SHOULD which contains multiple PhraseQuery in case tokens stream have > multi-word synonym. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion
[ https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15163757#comment-15163757 ] Daniel Aitken commented on SOLR-5379: - Doesn't look like it from the patch, no; it's still using the extended synonym quoted query parser. We have a client who requires matching on multi-word synonyms, so I've I've compiled Solr 4.10.3 with solr-5379-version-4.10.3.patch applied and have it up and running; just testing it now, with a kind of merged config between the unit test files provided in the patch and my regular Solr 4.x configuration. Behaviour on quoted multi-word synonyms appears to work as expected across the testing I performed; this works well and would be fantastic to have available. I'm not too too concerned about drifting from edismax; unless I'm misunderstanding, wouldn't this solution maintain features from edismax by virtue of it being extended from it? It would be nice, however, not to maintain a custom query parser. So, all in all, works well for us, but I am concerned about having to essentially maintain a fork of 4.10.3 just to support this one use case. Is there a possibility of this making it into a release? Is there anything else that needs to be done with it? > Query-time multi-word synonym expansion > --- > > Key: SOLR-5379 > URL: https://issues.apache.org/jira/browse/SOLR-5379 > Project: Solr > Issue Type: Improvement > Components: query parsers >Reporter: Tien Nguyen Manh > Labels: multi-word, queryparser, synonym > Fix For: 4.9, master > > Attachments: conf-test-files-4_8_1.patch, quoted-4_8_1.patch, > quoted.patch, solr-5379-version-4.10.3.patch, synonym-expander-4_8_1.patch, > synonym-expander.patch > > > While dealing with synonym at query time, solr failed to work with multi-word > synonyms due to some reasons: > - First the lucene queryparser tokenizes user query by space so it split > multi-word term into two terms before feeding to synonym filter, so synonym > filter can't recognized multi-word term to do expansion > - Second, if synonym filter expand into multiple terms which contains > multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to > handle synonyms. But MultiPhraseQuery don't work with term have different > number of words. > For the first one, we can extend quoted all multi-word synonym in user query > so that lucene queryparser don't split it. There are a jira task related to > this one https://issues.apache.org/jira/browse/LUCENE-2605. > For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery > SHOULD which contains multiple PhraseQuery in case tokens stream have > multi-word synonym. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion
[ https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15152678#comment-15152678 ] Mary Jo Sminkey commented on SOLR-5379: --- Also sounds like the code to incorporate it into edismax was never done? > Query-time multi-word synonym expansion > --- > > Key: SOLR-5379 > URL: https://issues.apache.org/jira/browse/SOLR-5379 > Project: Solr > Issue Type: Improvement > Components: query parsers >Reporter: Tien Nguyen Manh > Labels: multi-word, queryparser, synonym > Fix For: 4.9, master > > Attachments: conf-test-files-4_8_1.patch, quoted-4_8_1.patch, > quoted.patch, solr-5379-version-4.10.3.patch, synonym-expander-4_8_1.patch, > synonym-expander.patch > > > While dealing with synonym at query time, solr failed to work with multi-word > synonyms due to some reasons: > - First the lucene queryparser tokenizes user query by space so it split > multi-word term into two terms before feeding to synonym filter, so synonym > filter can't recognized multi-word term to do expansion > - Second, if synonym filter expand into multiple terms which contains > multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to > handle synonyms. But MultiPhraseQuery don't work with term have different > number of words. > For the first one, we can extend quoted all multi-word synonym in user query > so that lucene queryparser don't split it. There are a jira task related to > this one https://issues.apache.org/jira/browse/LUCENE-2605. > For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery > SHOULD which contains multiple PhraseQuery in case tokens stream have > multi-word synonym. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion
[ https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15152649#comment-15152649 ] Mary Jo Sminkey commented on SOLR-5379: --- This *still* isn't available in Solr 5?? What can we do to get this made official?? > Query-time multi-word synonym expansion > --- > > Key: SOLR-5379 > URL: https://issues.apache.org/jira/browse/SOLR-5379 > Project: Solr > Issue Type: Improvement > Components: query parsers >Reporter: Tien Nguyen Manh > Labels: multi-word, queryparser, synonym > Fix For: 4.9, master > > Attachments: conf-test-files-4_8_1.patch, quoted-4_8_1.patch, > quoted.patch, solr-5379-version-4.10.3.patch, synonym-expander-4_8_1.patch, > synonym-expander.patch > > > While dealing with synonym at query time, solr failed to work with multi-word > synonyms due to some reasons: > - First the lucene queryparser tokenizes user query by space so it split > multi-word term into two terms before feeding to synonym filter, so synonym > filter can't recognized multi-word term to do expansion > - Second, if synonym filter expand into multiple terms which contains > multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to > handle synonyms. But MultiPhraseQuery don't work with term have different > number of words. > For the first one, we can extend quoted all multi-word synonym in user query > so that lucene queryparser don't split it. There are a jira task related to > this one https://issues.apache.org/jira/browse/LUCENE-2605. > For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery > SHOULD which contains multiple PhraseQuery in case tokens stream have > multi-word synonym. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion
[ https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14726326#comment-14726326 ] Otis Gospodnetic commented on SOLR-5379: There is a patch for 4.10.3, but it was not committed, so this is still not available in Solr AFAIK. Would be great to get this into 5.x. > Query-time multi-word synonym expansion > --- > > Key: SOLR-5379 > URL: https://issues.apache.org/jira/browse/SOLR-5379 > Project: Solr > Issue Type: Improvement > Components: query parsers >Reporter: Tien Nguyen Manh > Labels: multi-word, queryparser, synonym > Fix For: 4.9, Trunk > > Attachments: conf-test-files-4_8_1.patch, quoted-4_8_1.patch, > quoted.patch, solr-5379-version-4.10.3.patch, synonym-expander-4_8_1.patch, > synonym-expander.patch > > > While dealing with synonym at query time, solr failed to work with multi-word > synonyms due to some reasons: > - First the lucene queryparser tokenizes user query by space so it split > multi-word term into two terms before feeding to synonym filter, so synonym > filter can't recognized multi-word term to do expansion > - Second, if synonym filter expand into multiple terms which contains > multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to > handle synonyms. But MultiPhraseQuery don't work with term have different > number of words. > For the first one, we can extend quoted all multi-word synonym in user query > so that lucene queryparser don't split it. There are a jira task related to > this one https://issues.apache.org/jira/browse/LUCENE-2605. > For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery > SHOULD which contains multiple PhraseQuery in case tokens stream have > multi-word synonym. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion
[ https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635899#comment-14635899 ] Timothy Garafola commented on SOLR-5379: Is there a status on this issue? Did it get moved forward to 5.x? Is it available in 4.10? Query-time multi-word synonym expansion --- Key: SOLR-5379 URL: https://issues.apache.org/jira/browse/SOLR-5379 Project: Solr Issue Type: Improvement Components: query parsers Reporter: Tien Nguyen Manh Labels: multi-word, queryparser, synonym Fix For: 4.9, Trunk Attachments: conf-test-files-4_8_1.patch, quoted-4_8_1.patch, quoted.patch, solr-5379-version-4.10.3.patch, synonym-expander-4_8_1.patch, synonym-expander.patch While dealing with synonym at query time, solr failed to work with multi-word synonyms due to some reasons: - First the lucene queryparser tokenizes user query by space so it split multi-word term into two terms before feeding to synonym filter, so synonym filter can't recognized multi-word term to do expansion - Second, if synonym filter expand into multiple terms which contains multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to handle synonyms. But MultiPhraseQuery don't work with term have different number of words. For the first one, we can extend quoted all multi-word synonym in user query so that lucene queryparser don't split it. There are a jira task related to this one https://issues.apache.org/jira/browse/LUCENE-2605. For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery SHOULD which contains multiple PhraseQuery in case tokens stream have multi-word synonym. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion
[ https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14318083#comment-14318083 ] Rafał Kuć commented on SOLR-5379: - Sure, for now I don't see a reason why we shouldn't get that into edismax :) Query-time multi-word synonym expansion --- Key: SOLR-5379 URL: https://issues.apache.org/jira/browse/SOLR-5379 Project: Solr Issue Type: Improvement Components: query parsers Reporter: Tien Nguyen Manh Labels: multi-word, queryparser, synonym Fix For: 4.9, Trunk Attachments: conf-test-files-4_8_1.patch, quoted-4_8_1.patch, quoted.patch, solr-5379-version-4.10.3.patch, synonym-expander-4_8_1.patch, synonym-expander.patch While dealing with synonym at query time, solr failed to work with multi-word synonyms due to some reasons: - First the lucene queryparser tokenizes user query by space so it split multi-word term into two terms before feeding to synonym filter, so synonym filter can't recognized multi-word term to do expansion - Second, if synonym filter expand into multiple terms which contains multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to handle synonyms. But MultiPhraseQuery don't work with term have different number of words. For the first one, we can extend quoted all multi-word synonym in user query so that lucene queryparser don't split it. There are a jira task related to this one https://issues.apache.org/jira/browse/LUCENE-2605. For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery SHOULD which contains multiple PhraseQuery in case tokens stream have multi-word synonym. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion
[ https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14318052#comment-14318052 ] Jan Høydahl commented on SOLR-5379: --- [~gro] Do you agree to fold this into edismax? I hate to fragment into another parser, which over time will diverge in features. Query-time multi-word synonym expansion --- Key: SOLR-5379 URL: https://issues.apache.org/jira/browse/SOLR-5379 Project: Solr Issue Type: Improvement Components: query parsers Reporter: Tien Nguyen Manh Labels: multi-word, queryparser, synonym Fix For: 4.9, Trunk Attachments: conf-test-files-4_8_1.patch, quoted-4_8_1.patch, quoted.patch, solr-5379-version-4.10.3.patch, synonym-expander-4_8_1.patch, synonym-expander.patch While dealing with synonym at query time, solr failed to work with multi-word synonyms due to some reasons: - First the lucene queryparser tokenizes user query by space so it split multi-word term into two terms before feeding to synonym filter, so synonym filter can't recognized multi-word term to do expansion - Second, if synonym filter expand into multiple terms which contains multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to handle synonyms. But MultiPhraseQuery don't work with term have different number of words. For the first one, we can extend quoted all multi-word synonym in user query so that lucene queryparser don't split it. There are a jira task related to this one https://issues.apache.org/jira/browse/LUCENE-2605. For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery SHOULD which contains multiple PhraseQuery in case tokens stream have multi-word synonym. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion
[ https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14312536#comment-14312536 ] Rafał Kuć commented on SOLR-5379: - I have the code updated to Solr 4.10.3 and I'm running tests now. I see a few issues with the code right now (i.e. some static, magic string objects, because some classes were moved outside of Lucene core). I'll attach the updated patch tomorrow, but I'm not sure if there will be another release from 4.x branch. So I guess the easiest way would be to get the code polished for 5.x branch and try committing there. What do you think? Query-time multi-word synonym expansion --- Key: SOLR-5379 URL: https://issues.apache.org/jira/browse/SOLR-5379 Project: Solr Issue Type: Improvement Components: query parsers Reporter: Tien Nguyen Manh Labels: multi-word, queryparser, synonym Fix For: 4.9, Trunk Attachments: conf-test-files-4_8_1.patch, quoted-4_8_1.patch, quoted.patch, synonym-expander-4_8_1.patch, synonym-expander.patch While dealing with synonym at query time, solr failed to work with multi-word synonyms due to some reasons: - First the lucene queryparser tokenizes user query by space so it split multi-word term into two terms before feeding to synonym filter, so synonym filter can't recognized multi-word term to do expansion - Second, if synonym filter expand into multiple terms which contains multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to handle synonyms. But MultiPhraseQuery don't work with term have different number of words. For the first one, we can extend quoted all multi-word synonym in user query so that lucene queryparser don't split it. There are a jira task related to this one https://issues.apache.org/jira/browse/LUCENE-2605. For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery SHOULD which contains multiple PhraseQuery in case tokens stream have multi-word synonym. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion
[ https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14304161#comment-14304161 ] Otis Gospodnetic commented on SOLR-5379: bq. I am sure there is, but there are no working patches for 4.10 or 5.x thus far. Right. What I was trying to ask is whether any of the active Solr committers wants to commit this. If there is no will to commit, I'd rather keep things simple on our end ignore this issue. But there is a will to commit, I'd love to see this in Solr, as would 30+ other watchers, I imagine. Query-time multi-word synonym expansion --- Key: SOLR-5379 URL: https://issues.apache.org/jira/browse/SOLR-5379 Project: Solr Issue Type: Improvement Components: query parsers Reporter: Tien Nguyen Manh Labels: multi-word, queryparser, synonym Fix For: 4.9, Trunk Attachments: conf-test-files-4_8_1.patch, quoted-4_8_1.patch, quoted.patch, synonym-expander-4_8_1.patch, synonym-expander.patch While dealing with synonym at query time, solr failed to work with multi-word synonyms due to some reasons: - First the lucene queryparser tokenizes user query by space so it split multi-word term into two terms before feeding to synonym filter, so synonym filter can't recognized multi-word term to do expansion - Second, if synonym filter expand into multiple terms which contains multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to handle synonyms. But MultiPhraseQuery don't work with term have different number of words. For the first one, we can extend quoted all multi-word synonym in user query so that lucene queryparser don't split it. There are a jira task related to this one https://issues.apache.org/jira/browse/LUCENE-2605. For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery SHOULD which contains multiple PhraseQuery in case tokens stream have multi-word synonym. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion
[ https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14303081#comment-14303081 ] Markus Jelsma commented on SOLR-5379: - I am sure there is, but there are no working patches for 4.10 or 5.x thus far. Query-time multi-word synonym expansion --- Key: SOLR-5379 URL: https://issues.apache.org/jira/browse/SOLR-5379 Project: Solr Issue Type: Improvement Components: query parsers Reporter: Tien Nguyen Manh Labels: multi-word, queryparser, synonym Fix For: 4.9, Trunk Attachments: conf-test-files-4_8_1.patch, quoted-4_8_1.patch, quoted.patch, synonym-expander-4_8_1.patch, synonym-expander.patch While dealing with synonym at query time, solr failed to work with multi-word synonyms due to some reasons: - First the lucene queryparser tokenizes user query by space so it split multi-word term into two terms before feeding to synonym filter, so synonym filter can't recognized multi-word term to do expansion - Second, if synonym filter expand into multiple terms which contains multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to handle synonyms. But MultiPhraseQuery don't work with term have different number of words. For the first one, we can extend quoted all multi-word synonym in user query so that lucene queryparser don't split it. There are a jira task related to this one https://issues.apache.org/jira/browse/LUCENE-2605. For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery SHOULD which contains multiple PhraseQuery in case tokens stream have multi-word synonym. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion
[ https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14304336#comment-14304336 ] Jan Høydahl commented on SOLR-5379: --- +1 to get some of this in. I have the desire but not the cycles right now. Perhaps your happy customer could help drive this? Just looked briefly at the patches.. *disclaimer: I did not apply and test this yet* * I would expect a ton of new unit tests for {{synonym-expander.patch}} but cannot find? * Why create another subclass of ExtendedDismax for this? If going into core, fold the features into edismax? The patch will be smaller too. * I cannot see a test for configuring custom {{synonymAnalyzers}}. Also, it should refer to schema fieldTypes instead of adding to qparser config - in the same way e.g. Suggesters do Probably the work could be split up - first add more test coverage to the {{synonym-expander}} part and commit it. Then fold the quoting-stuff into standard edismax and commit (this part is less risky since it is back-compat if you don't use the new params). Is [~tiennm] still around? Other users ot the patch who are willing to step in and improve it? Query-time multi-word synonym expansion --- Key: SOLR-5379 URL: https://issues.apache.org/jira/browse/SOLR-5379 Project: Solr Issue Type: Improvement Components: query parsers Reporter: Tien Nguyen Manh Labels: multi-word, queryparser, synonym Fix For: 4.9, Trunk Attachments: conf-test-files-4_8_1.patch, quoted-4_8_1.patch, quoted.patch, synonym-expander-4_8_1.patch, synonym-expander.patch While dealing with synonym at query time, solr failed to work with multi-word synonyms due to some reasons: - First the lucene queryparser tokenizes user query by space so it split multi-word term into two terms before feeding to synonym filter, so synonym filter can't recognized multi-word term to do expansion - Second, if synonym filter expand into multiple terms which contains multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to handle synonyms. But MultiPhraseQuery don't work with term have different number of words. For the first one, we can extend quoted all multi-word synonym in user query so that lucene queryparser don't split it. There are a jira task related to this one https://issues.apache.org/jira/browse/LUCENE-2605. For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery SHOULD which contains multiple PhraseQuery in case tokens stream have multi-word synonym. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion
[ https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14301433#comment-14301433 ] Otis Gospodnetic commented on SOLR-5379: Is there any interest in committing this to 4.x or 5.x? We have a client at Sematext who needs query-time synonym support for their Solr 4.x setup. So we can make sure this patch works for 4.x. If any of the Solr developers wants to commit this to 5.x, please leave a comment here. Query-time multi-word synonym expansion --- Key: SOLR-5379 URL: https://issues.apache.org/jira/browse/SOLR-5379 Project: Solr Issue Type: Improvement Components: query parsers Reporter: Tien Nguyen Manh Labels: multi-word, queryparser, synonym Fix For: 4.9, Trunk Attachments: conf-test-files-4_8_1.patch, quoted-4_8_1.patch, quoted.patch, synonym-expander-4_8_1.patch, synonym-expander.patch While dealing with synonym at query time, solr failed to work with multi-word synonyms due to some reasons: - First the lucene queryparser tokenizes user query by space so it split multi-word term into two terms before feeding to synonym filter, so synonym filter can't recognized multi-word term to do expansion - Second, if synonym filter expand into multiple terms which contains multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to handle synonyms. But MultiPhraseQuery don't work with term have different number of words. For the first one, we can extend quoted all multi-word synonym in user query so that lucene queryparser don't split it. There are a jira task related to this one https://issues.apache.org/jira/browse/LUCENE-2605. For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery SHOULD which contains multiple PhraseQuery in case tokens stream have multi-word synonym. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion
[ https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14191678#comment-14191678 ] Jan Høydahl commented on SOLR-5379: --- Interest in picking this up again? [~rpialum], I have not looked at your patches yet, but am interested in facilitating / reviewing Query-time multi-word synonym expansion --- Key: SOLR-5379 URL: https://issues.apache.org/jira/browse/SOLR-5379 Project: Solr Issue Type: Improvement Components: query parsers Reporter: Tien Nguyen Manh Labels: multi-word, queryparser, synonym Fix For: 4.9, Trunk Attachments: conf-test-files-4_8_1.patch, quoted-4_8_1.patch, quoted.patch, synonym-expander-4_8_1.patch, synonym-expander.patch While dealing with synonym at query time, solr failed to work with multi-word synonyms due to some reasons: - First the lucene queryparser tokenizes user query by space so it split multi-word term into two terms before feeding to synonym filter, so synonym filter can't recognized multi-word term to do expansion - Second, if synonym filter expand into multiple terms which contains multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to handle synonyms. But MultiPhraseQuery don't work with term have different number of words. For the first one, we can extend quoted all multi-word synonym in user query so that lucene queryparser don't split it. There are a jira task related to this one https://issues.apache.org/jira/browse/LUCENE-2605. For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery SHOULD which contains multiple PhraseQuery in case tokens stream have multi-word synonym. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion
[ https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14191824#comment-14191824 ] Jeremy Anderson commented on SOLR-5379: --- Unfortunately, I'm swamped with other stuff on my plate. Thinking back, I think I abandoned this approach and instead took Nolan Lawson's route (see SOLR-4381). I don't recall how mature and stable I had gotten my patches before switching paths. Query-time multi-word synonym expansion --- Key: SOLR-5379 URL: https://issues.apache.org/jira/browse/SOLR-5379 Project: Solr Issue Type: Improvement Components: query parsers Reporter: Tien Nguyen Manh Labels: multi-word, queryparser, synonym Fix For: 4.9, Trunk Attachments: conf-test-files-4_8_1.patch, quoted-4_8_1.patch, quoted.patch, synonym-expander-4_8_1.patch, synonym-expander.patch While dealing with synonym at query time, solr failed to work with multi-word synonyms due to some reasons: - First the lucene queryparser tokenizes user query by space so it split multi-word term into two terms before feeding to synonym filter, so synonym filter can't recognized multi-word term to do expansion - Second, if synonym filter expand into multiple terms which contains multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to handle synonyms. But MultiPhraseQuery don't work with term have different number of words. For the first one, we can extend quoted all multi-word synonym in user query so that lucene queryparser don't split it. There are a jira task related to this one https://issues.apache.org/jira/browse/LUCENE-2605. For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery SHOULD which contains multiple PhraseQuery in case tokens stream have multi-word synonym. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion
[ https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14020221#comment-14020221 ] Jeremy Anderson commented on SOLR-5379: --- I'm in the process of trying to get this logic ported into the 4.8.1 Released Tag. I believe I've gotten the code ported over, but am having problems getting the unit test to run to confirm the correctness of the port. The main reason is the differences in the conf/solrconfig.xml and conf/schema.xml files that exist in the root and I'm guessing those used by Tien when the 4.5.0 patch was created. I'm still a SOLR novice so I'm not quite sure how to properly replicate the schema and configuration settings to get the unit test to run. I'm going to attach patch files shortly for the 4.8.1 code base along with the current stubbed out configuration files. Any help anyone can provide would be greatly appreciated. My end goal is to hopefully be able to get the multi-term synonym expansion logic to work with a 4.8.1 deployment where we're using an extended version of the SolrQueryParser. (I'm not sure if the multi-term synonym logic is only usable with this patch by the new SynonymQuotedDismaxQParser or existing DismaxQarsers). Notes on 4.8.1 port: * There is now 2 parsers usable by the FSTSynonymFilterFactory: SolrSynonymParser WordnetSynonymParser. The later of which I'm not sure if any additional logic needs to be implemented for proper usage of the tokenize parameter. * All of the logic implemented in SolrQueryParserBase from 4.5.0 has now been moved into the utility QueryBuilder class. Query-time multi-word synonym expansion --- Key: SOLR-5379 URL: https://issues.apache.org/jira/browse/SOLR-5379 Project: Solr Issue Type: Improvement Components: query parsers Reporter: Tien Nguyen Manh Labels: multi-word, queryparser, synonym Fix For: 4.9, 5.0 Attachments: quoted.patch, synonym-expander.patch While dealing with synonym at query time, solr failed to work with multi-word synonyms due to some reasons: - First the lucene queryparser tokenizes user query by space so it split multi-word term into two terms before feeding to synonym filter, so synonym filter can't recognized multi-word term to do expansion - Second, if synonym filter expand into multiple terms which contains multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to handle synonyms. But MultiPhraseQuery don't work with term have different number of words. For the first one, we can extend quoted all multi-word synonym in user query so that lucene queryparser don't split it. There are a jira task related to this one https://issues.apache.org/jira/browse/LUCENE-2605. For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery SHOULD which contains multiple PhraseQuery in case tokens stream have multi-word synonym. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion
[ https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13893368#comment-13893368 ] Markus Jelsma commented on SOLR-5379: - Ok, it seems i had some bad jars laying around messsing things up if a specific token filter was in use. Anyway, this patch works fine from single word to multi word but not the other way around. I have a 4.5.0 check out here with just this patch. Using the example schema and data and the usual [seabiscuit,sea biscit,biscit] syns: http://localhost:8983/solr/select?defType=edismaxqf=namerows=0debugQuery=trueq= {code} q=biscit = (+DisjunctionMaxQuery(((name:seabiscuit name:sea biscit name:biscit/no_coord q=seabiscuit = (+DisjunctionMaxQuery(((name:seabiscuit name:sea biscit name:biscit/no_coord q=sea biscit = (+(DisjunctionMaxQuery((name:sea)) DisjunctionMaxQuery(((name:seabiscuit name:sea biscit name:biscit)/no_coord {code} This is all very nice but, if we change the syns from [seabiscuit,sea biscit,biscit] to [seabiscuit,sea biscit] it no longer works for {code} q=sea biscit = (+(DisjunctionMaxQuery((name:sea)) DisjunctionMaxQuery((name:biscit/no_coord {code} [~tiennm] So i assume this is clearly not the desired behaviour right? Query-time multi-word synonym expansion --- Key: SOLR-5379 URL: https://issues.apache.org/jira/browse/SOLR-5379 Project: Solr Issue Type: Improvement Components: query parsers Reporter: Tien Nguyen Manh Labels: multi-word, queryparser, synonym Fix For: 4.7 Attachments: quoted.patch, synonym-expander.patch While dealing with synonym at query time, solr failed to work with multi-word synonyms due to some reasons: - First the lucene queryparser tokenizes user query by space so it split multi-word term into two terms before feeding to synonym filter, so synonym filter can't recognized multi-word term to do expansion - Second, if synonym filter expand into multiple terms which contains multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to handle synonyms. But MultiPhraseQuery don't work with term have different number of words. For the first one, we can extend quoted all multi-word synonym in user query so that lucene queryparser don't split it. There are a jira task related to this one https://issues.apache.org/jira/browse/LUCENE-2605. For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery SHOULD which contains multiple PhraseQuery in case tokens stream have multi-word synonym. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion
[ https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13893393#comment-13893393 ] Markus Jelsma commented on SOLR-5379: - By the way: using the SynonymQuotedDismaxQParser doesn't change anything. Query-time multi-word synonym expansion --- Key: SOLR-5379 URL: https://issues.apache.org/jira/browse/SOLR-5379 Project: Solr Issue Type: Improvement Components: query parsers Reporter: Tien Nguyen Manh Labels: multi-word, queryparser, synonym Fix For: 4.7 Attachments: quoted.patch, synonym-expander.patch While dealing with synonym at query time, solr failed to work with multi-word synonyms due to some reasons: - First the lucene queryparser tokenizes user query by space so it split multi-word term into two terms before feeding to synonym filter, so synonym filter can't recognized multi-word term to do expansion - Second, if synonym filter expand into multiple terms which contains multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to handle synonyms. But MultiPhraseQuery don't work with term have different number of words. For the first one, we can extend quoted all multi-word synonym in user query so that lucene queryparser don't split it. There are a jira task related to this one https://issues.apache.org/jira/browse/LUCENE-2605. For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery SHOULD which contains multiple PhraseQuery in case tokens stream have multi-word synonym. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion
[ https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13894248#comment-13894248 ] Tien Nguyen Manh commented on SOLR-5379: [~markus17] It is not the desired behavious!. your result above in first example with sync [seabiscuit,sea biscit,biscit] q=sea biscit = (+(DisjunctionMaxQuery((name:sea)) DisjunctionMaxQuery(((name:seabiscuit name:sea biscit name:biscit)/no_coord seem the default behaviour (without the patch). After appling the patch, it should be the same result for all three queries q=biscit, q=seabiscuit, q=sea biscit Query-time multi-word synonym expansion --- Key: SOLR-5379 URL: https://issues.apache.org/jira/browse/SOLR-5379 Project: Solr Issue Type: Improvement Components: query parsers Reporter: Tien Nguyen Manh Labels: multi-word, queryparser, synonym Fix For: 4.7 Attachments: quoted.patch, synonym-expander.patch While dealing with synonym at query time, solr failed to work with multi-word synonyms due to some reasons: - First the lucene queryparser tokenizes user query by space so it split multi-word term into two terms before feeding to synonym filter, so synonym filter can't recognized multi-word term to do expansion - Second, if synonym filter expand into multiple terms which contains multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to handle synonyms. But MultiPhraseQuery don't work with term have different number of words. For the first one, we can extend quoted all multi-word synonym in user query so that lucene queryparser don't split it. There are a jira task related to this one https://issues.apache.org/jira/browse/LUCENE-2605. For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery SHOULD which contains multiple PhraseQuery in case tokens stream have multi-word synonym. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion
[ https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13892010#comment-13892010 ] Eric Bus commented on SOLR-5379: Has anyone modified this patch to work on 4.6.1? I tried to do a manual merge for the second patch. But a lot has changed in the SolrQueryParserBase.java file. Query-time multi-word synonym expansion --- Key: SOLR-5379 URL: https://issues.apache.org/jira/browse/SOLR-5379 Project: Solr Issue Type: Improvement Components: query parsers Reporter: Tien Nguyen Manh Labels: multi-word, queryparser, synonym Fix For: 4.7 Attachments: quoted.patch, synonym-expander.patch While dealing with synonym at query time, solr failed to work with multi-word synonyms due to some reasons: - First the lucene queryparser tokenizes user query by space so it split multi-word term into two terms before feeding to synonym filter, so synonym filter can't recognized multi-word term to do expansion - Second, if synonym filter expand into multiple terms which contains multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to handle synonyms. But MultiPhraseQuery don't work with term have different number of words. For the first one, we can extend quoted all multi-word synonym in user query so that lucene queryparser don't split it. There are a jira task related to this one https://issues.apache.org/jira/browse/LUCENE-2605. For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery SHOULD which contains multiple PhraseQuery in case tokens stream have multi-word synonym. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion
[ https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13892203#comment-13892203 ] Markus Jelsma commented on SOLR-5379: - [~nolanlawson] is the outcome you describe desired behaviour? I don't really believe it is. For synonyms [a b,x y] and q=a b you get PhraseQuery(content:x y a b). While phrase a b and x y would ordinarily match some documents, x y a b will never match. Or is this supposed to expand syns at index time too? Query-time multi-word synonym expansion --- Key: SOLR-5379 URL: https://issues.apache.org/jira/browse/SOLR-5379 Project: Solr Issue Type: Improvement Components: query parsers Reporter: Tien Nguyen Manh Labels: multi-word, queryparser, synonym Fix For: 4.7 Attachments: quoted.patch, synonym-expander.patch While dealing with synonym at query time, solr failed to work with multi-word synonyms due to some reasons: - First the lucene queryparser tokenizes user query by space so it split multi-word term into two terms before feeding to synonym filter, so synonym filter can't recognized multi-word term to do expansion - Second, if synonym filter expand into multiple terms which contains multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to handle synonyms. But MultiPhraseQuery don't work with term have different number of words. For the first one, we can extend quoted all multi-word synonym in user query so that lucene queryparser don't split it. There are a jira task related to this one https://issues.apache.org/jira/browse/LUCENE-2605. For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery SHOULD which contains multiple PhraseQuery in case tokens stream have multi-word synonym. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion
[ https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866729#comment-13866729 ] Markus Jelsma commented on SOLR-5379: - Yes +1 Query-time multi-word synonym expansion --- Key: SOLR-5379 URL: https://issues.apache.org/jira/browse/SOLR-5379 Project: Solr Issue Type: Improvement Components: query parsers Reporter: Tien Nguyen Manh Labels: multi-word, queryparser, synonym Fix For: 4.7 Attachments: quoted.patch, synonym-expander.patch While dealing with synonym at query time, solr failed to work with multi-word synonyms due to some reasons: - First the lucene queryparser tokenizes user query by space so it split multi-word term into two terms before feeding to synonym filter, so synonym filter can't recognized multi-word term to do expansion - Second, if synonym filter expand into multiple terms which contains multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to handle synonyms. But MultiPhraseQuery don't work with term have different number of words. For the first one, we can extend quoted all multi-word synonym in user query so that lucene queryparser don't split it. There are a jira task related to this one https://issues.apache.org/jira/browse/LUCENE-2605. For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery SHOULD which contains multiple PhraseQuery in case tokens stream have multi-word synonym. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion
[ https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867202#comment-13867202 ] Nolan Lawson commented on SOLR-5379: +1 as well. Tien's patch also seems to be a better candidate seeing as they included Java tests, whereas my tests are in Python 'cuz I was lazy. :) Query-time multi-word synonym expansion --- Key: SOLR-5379 URL: https://issues.apache.org/jira/browse/SOLR-5379 Project: Solr Issue Type: Improvement Components: query parsers Reporter: Tien Nguyen Manh Labels: multi-word, queryparser, synonym Fix For: 4.7 Attachments: quoted.patch, synonym-expander.patch While dealing with synonym at query time, solr failed to work with multi-word synonyms due to some reasons: - First the lucene queryparser tokenizes user query by space so it split multi-word term into two terms before feeding to synonym filter, so synonym filter can't recognized multi-word term to do expansion - Second, if synonym filter expand into multiple terms which contains multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to handle synonyms. But MultiPhraseQuery don't work with term have different number of words. For the first one, we can extend quoted all multi-word synonym in user query so that lucene queryparser don't split it. There are a jira task related to this one https://issues.apache.org/jira/browse/LUCENE-2605. For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery SHOULD which contains multiple PhraseQuery in case tokens stream have multi-word synonym. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion
[ https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865335#comment-13865335 ] Markus Jelsma commented on SOLR-5379: - Nolan, Jan, both of you have extensive knowledge about the one you worked on hosted on github. How do you compare features? I've checked your issue list and there are no new issues coming in and a lot have been resolved already, looks like that one is much more mature and flexible/configurable. Query-time multi-word synonym expansion --- Key: SOLR-5379 URL: https://issues.apache.org/jira/browse/SOLR-5379 Project: Solr Issue Type: Improvement Components: query parsers Reporter: Tien Nguyen Manh Labels: multi-word, queryparser, synonym Fix For: 4.7 Attachments: quoted.patch, synonym-expander.patch While dealing with synonym at query time, solr failed to work with multi-word synonyms due to some reasons: - First the lucene queryparser tokenizes user query by space so it split multi-word term into two terms before feeding to synonym filter, so synonym filter can't recognized multi-word term to do expansion - Second, if synonym filter expand into multiple terms which contains multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to handle synonyms. But MultiPhraseQuery don't work with term have different number of words. For the first one, we can extend quoted all multi-word synonym in user query so that lucene queryparser don't split it. There are a jira task related to this one https://issues.apache.org/jira/browse/LUCENE-2605. For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery SHOULD which contains multiple PhraseQuery in case tokens stream have multi-word synonym. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion
[ https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866095#comment-13866095 ] Jan Høydahl commented on SOLR-5379: --- What I like about Nolan's solution is that you control synonyms outside of analysis - for all fields,although there are pros and cons with this. Also that you can deboost the synonyms and easily turn them on/off as you like. What I like about Tien's patch is that it solves exactly the problem at hand without introducing need for completely new configurations or concepts, and that it can work with other Qparsers as well. In that respect it is perhaps better suited for early inclusion in Solr, then we can look at bringing in be best from 4381 later? Query-time multi-word synonym expansion --- Key: SOLR-5379 URL: https://issues.apache.org/jira/browse/SOLR-5379 Project: Solr Issue Type: Improvement Components: query parsers Reporter: Tien Nguyen Manh Labels: multi-word, queryparser, synonym Fix For: 4.7 Attachments: quoted.patch, synonym-expander.patch While dealing with synonym at query time, solr failed to work with multi-word synonyms due to some reasons: - First the lucene queryparser tokenizes user query by space so it split multi-word term into two terms before feeding to synonym filter, so synonym filter can't recognized multi-word term to do expansion - Second, if synonym filter expand into multiple terms which contains multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to handle synonyms. But MultiPhraseQuery don't work with term have different number of words. For the first one, we can extend quoted all multi-word synonym in user query so that lucene queryparser don't split it. There are a jira task related to this one https://issues.apache.org/jira/browse/LUCENE-2605. For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery SHOULD which contains multiple PhraseQuery in case tokens stream have multi-word synonym. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion
[ https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866221#comment-13866221 ] Otis Gospodnetic commented on SOLR-5379: Jan: +1 Query-time multi-word synonym expansion --- Key: SOLR-5379 URL: https://issues.apache.org/jira/browse/SOLR-5379 Project: Solr Issue Type: Improvement Components: query parsers Reporter: Tien Nguyen Manh Labels: multi-word, queryparser, synonym Fix For: 4.7 Attachments: quoted.patch, synonym-expander.patch While dealing with synonym at query time, solr failed to work with multi-word synonyms due to some reasons: - First the lucene queryparser tokenizes user query by space so it split multi-word term into two terms before feeding to synonym filter, so synonym filter can't recognized multi-word term to do expansion - Second, if synonym filter expand into multiple terms which contains multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to handle synonyms. But MultiPhraseQuery don't work with term have different number of words. For the first one, we can extend quoted all multi-word synonym in user query so that lucene queryparser don't split it. There are a jira task related to this one https://issues.apache.org/jira/browse/LUCENE-2605. For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery SHOULD which contains multiple PhraseQuery in case tokens stream have multi-word synonym. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion
[ https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864147#comment-13864147 ] Markus Jelsma commented on SOLR-5379: - How does this patch handle boosts? Are the synonym and the original keywords boosted equally? Query-time multi-word synonym expansion --- Key: SOLR-5379 URL: https://issues.apache.org/jira/browse/SOLR-5379 Project: Solr Issue Type: Improvement Components: query parsers Reporter: Tien Nguyen Manh Labels: multi-word, queryparser, synonym Fix For: 4.7 Attachments: quoted.patch, synonym-expander.patch While dealing with synonym at query time, solr failed to work with multi-word synonyms due to some reasons: - First the lucene queryparser tokenizes user query by space so it split multi-word term into two terms before feeding to synonym filter, so synonym filter can't recognized multi-word term to do expansion - Second, if synonym filter expand into multiple terms which contains multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to handle synonyms. But MultiPhraseQuery don't work with term have different number of words. For the first one, we can extend quoted all multi-word synonym in user query so that lucene queryparser don't split it. There are a jira task related to this one https://issues.apache.org/jira/browse/LUCENE-2605. For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery SHOULD which contains multiple PhraseQuery in case tokens stream have multi-word synonym. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion
[ https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864149#comment-13864149 ] Ahmet Arslan commented on SOLR-5379: Assume synonyms are {code} usa, united states of america {code} What happens if I fire the following sloppy phrase query *president usa~5* Query-time multi-word synonym expansion --- Key: SOLR-5379 URL: https://issues.apache.org/jira/browse/SOLR-5379 Project: Solr Issue Type: Improvement Components: query parsers Reporter: Tien Nguyen Manh Labels: multi-word, queryparser, synonym Fix For: 4.7 Attachments: quoted.patch, synonym-expander.patch While dealing with synonym at query time, solr failed to work with multi-word synonyms due to some reasons: - First the lucene queryparser tokenizes user query by space so it split multi-word term into two terms before feeding to synonym filter, so synonym filter can't recognized multi-word term to do expansion - Second, if synonym filter expand into multiple terms which contains multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to handle synonyms. But MultiPhraseQuery don't work with term have different number of words. For the first one, we can extend quoted all multi-word synonym in user query so that lucene queryparser don't split it. There are a jira task related to this one https://issues.apache.org/jira/browse/LUCENE-2605. For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery SHOULD which contains multiple PhraseQuery in case tokens stream have multi-word synonym. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion
[ https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864402#comment-13864402 ] Nolan Lawson commented on SOLR-5379: [~markus17]: They're boosted equally. It was the subject of [a bug|https://github.com/healthonnet/hon-lucene-synonyms/issues/31]. [~iorixxx]: I just tested it out now. I got: {code} (+(DisjunctionMaxQuery((text:president usa~5)) (((+DisjunctionMaxQuery((text:president united states of america~5)))/no_coord/no_coord // parsedQuery +((text:president usa~5) ((+(text:president united states of america~5 // parsedQuery.toString() {code} Query-time multi-word synonym expansion --- Key: SOLR-5379 URL: https://issues.apache.org/jira/browse/SOLR-5379 Project: Solr Issue Type: Improvement Components: query parsers Reporter: Tien Nguyen Manh Labels: multi-word, queryparser, synonym Fix For: 4.7 Attachments: quoted.patch, synonym-expander.patch While dealing with synonym at query time, solr failed to work with multi-word synonyms due to some reasons: - First the lucene queryparser tokenizes user query by space so it split multi-word term into two terms before feeding to synonym filter, so synonym filter can't recognized multi-word term to do expansion - Second, if synonym filter expand into multiple terms which contains multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to handle synonyms. But MultiPhraseQuery don't work with term have different number of words. For the first one, we can extend quoted all multi-word synonym in user query so that lucene queryparser don't split it. There are a jira task related to this one https://issues.apache.org/jira/browse/LUCENE-2605. For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery SHOULD which contains multiple PhraseQuery in case tokens stream have multi-word synonym. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion
[ https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13863481#comment-13863481 ] Otis Gospodnetic commented on SOLR-5379: Our customers have been using this in production for about half a year now without issues. Query-time multi-word synonym expansion --- Key: SOLR-5379 URL: https://issues.apache.org/jira/browse/SOLR-5379 Project: Solr Issue Type: Improvement Components: query parsers Reporter: Tien Nguyen Manh Labels: multi-word, queryparser, synonym Fix For: 4.7 Attachments: quoted.patch, synonym-expander.patch While dealing with synonym at query time, solr failed to work with multi-word synonyms due to some reasons: - First the lucene queryparser tokenizes user query by space so it split multi-word term into two terms before feeding to synonym filter, so synonym filter can't recognized multi-word term to do expansion - Second, if synonym filter expand into multiple terms which contains multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to handle synonyms. But MultiPhraseQuery don't work with term have different number of words. For the first one, we can extend quoted all multi-word synonym in user query so that lucene queryparser don't split it. There are a jira task related to this one https://issues.apache.org/jira/browse/LUCENE-2605. For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery SHOULD which contains multiple PhraseQuery in case tokens stream have multi-word synonym. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion
[ https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13862536#comment-13862536 ] Jan Høydahl commented on SOLR-5379: --- So what is the next step with this one? Anyone who have tested it and have comments? Query-time multi-word synonym expansion --- Key: SOLR-5379 URL: https://issues.apache.org/jira/browse/SOLR-5379 Project: Solr Issue Type: Improvement Components: query parsers Reporter: Tien Nguyen Manh Labels: multi-word, queryparser, synonym Fix For: 4.6, 4.7 Attachments: quoted.patch, synonym-expander.patch While dealing with synonym at query time, solr failed to work with multi-word synonyms due to some reasons: - First the lucene queryparser tokenizes user query by space so it split multi-word term into two terms before feeding to synonym filter, so synonym filter can't recognized multi-word term to do expansion - Second, if synonym filter expand into multiple terms which contains multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to handle synonyms. But MultiPhraseQuery don't work with term have different number of words. For the first one, we can extend quoted all multi-word synonym in user query so that lucene queryparser don't split it. There are a jira task related to this one https://issues.apache.org/jira/browse/LUCENE-2605. For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery SHOULD which contains multiple PhraseQuery in case tokens stream have multi-word synonym. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion
[ https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13817133#comment-13817133 ] Markus Jelsma commented on SOLR-5379: - Oh i interpreted your comment as if you have tested it against the other patches linked to this one. Query-time multi-word synonym expansion --- Key: SOLR-5379 URL: https://issues.apache.org/jira/browse/SOLR-5379 Project: Solr Issue Type: Improvement Components: query parsers Reporter: Nguyen Manh Tien Labels: multi-word, queryparser, synonym Fix For: 4.5.1, 4.6 Attachments: quoted.patch, synonym-expander.patch While dealing with synonym at query time, solr failed to work with multi-word synonyms due to some reasons: - First the lucene queryparser tokenizes user query by space so it split multi-word term into two terms before feeding to synonym filter, so synonym filter can't recognized multi-word term to do expansion - Second, if synonym filter expand into multiple terms which contains multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to handle synonyms. But MultiPhraseQuery don't work with term have different number of words. For the first one, we can extend quoted all multi-word synonym in user query so that lucene queryparser don't split it. There are a jira task related to this one https://issues.apache.org/jira/browse/LUCENE-2605. For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery SHOULD which contains multiple PhraseQuery in case tokens stream have multi-word synonym. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion
[ https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13817174#comment-13817174 ] Otis Gospodnetic commented on SOLR-5379: [~bsteele] - maybe you had your colleagues test other patches, like SOLR-4381? Query-time multi-word synonym expansion --- Key: SOLR-5379 URL: https://issues.apache.org/jira/browse/SOLR-5379 Project: Solr Issue Type: Improvement Components: query parsers Reporter: Nguyen Manh Tien Labels: multi-word, queryparser, synonym Fix For: 4.5.1, 4.6 Attachments: quoted.patch, synonym-expander.patch While dealing with synonym at query time, solr failed to work with multi-word synonyms due to some reasons: - First the lucene queryparser tokenizes user query by space so it split multi-word term into two terms before feeding to synonym filter, so synonym filter can't recognized multi-word term to do expansion - Second, if synonym filter expand into multiple terms which contains multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to handle synonyms. But MultiPhraseQuery don't work with term have different number of words. For the first one, we can extend quoted all multi-word synonym in user query so that lucene queryparser don't split it. There are a jira task related to this one https://issues.apache.org/jira/browse/LUCENE-2605. For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery SHOULD which contains multiple PhraseQuery in case tokens stream have multi-word synonym. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion
[ https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816243#comment-13816243 ] Bill Steele commented on SOLR-5379: --- Hi Markus, Which other multi-word synonym patches are you referring to? We tested the Solr functionality built into Solr 4.2.1 which didn't support multiple word synonyms, and as far as I know that issue still exists in the latest. This patch fixes that issue for us. Not aware of other alternatives available. Query-time multi-word synonym expansion --- Key: SOLR-5379 URL: https://issues.apache.org/jira/browse/SOLR-5379 Project: Solr Issue Type: Improvement Components: query parsers Reporter: Nguyen Manh Tien Labels: multi-word, queryparser, synonym Fix For: 4.5.1, 4.6 Attachments: quoted.patch, synonym-expander.patch While dealing with synonym at query time, solr failed to work with multi-word synonyms due to some reasons: - First the lucene queryparser tokenizes user query by space so it split multi-word term into two terms before feeding to synonym filter, so synonym filter can't recognized multi-word term to do expansion - Second, if synonym filter expand into multiple terms which contains multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to handle synonyms. But MultiPhraseQuery don't work with term have different number of words. For the first one, we can extend quoted all multi-word synonym in user query so that lucene queryparser don't split it. There are a jira task related to this one https://issues.apache.org/jira/browse/LUCENE-2605. For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery SHOULD which contains multiple PhraseQuery in case tokens stream have multi-word synonym. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion
[ https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814848#comment-13814848 ] Markus Jelsma commented on SOLR-5379: - Bill - did you also test the other multiword-synonym patches? Query-time multi-word synonym expansion --- Key: SOLR-5379 URL: https://issues.apache.org/jira/browse/SOLR-5379 Project: Solr Issue Type: Improvement Components: query parsers Reporter: Nguyen Manh Tien Labels: multi-word, queryparser, synonym Fix For: 4.5.1, 4.6 Attachments: quoted.patch, synonym-expander.patch While dealing with synonym at query time, solr failed to work with multi-word synonyms due to some reasons: - First the lucene queryparser tokenizes user query by space so it split multi-word term into two terms before feeding to synonym filter, so synonym filter can't recognized multi-word term to do expansion - Second, if synonym filter expand into multiple terms which contains multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to handle synonyms. But MultiPhraseQuery don't work with term have different number of words. For the first one, we can extend quoted all multi-word synonym in user query so that lucene queryparser don't split it. There are a jira task related to this one https://issues.apache.org/jira/browse/LUCENE-2605. For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery SHOULD which contains multiple PhraseQuery in case tokens stream have multi-word synonym. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion
[ https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814281#comment-13814281 ] Bill Steele commented on SOLR-5379: --- We found this to be much more useful code for multiword synonyms. We ran some tests, and when having a synonym set such as: seabiscuit, sea biscuit, sea biscit Search on the following: seabiscuit article Returned matches with the following terms Sea biscit article Sea biscuit article Seabiscuit article Biscuit Sea article Sea article Biscit article With this patch, the above search query just returned the terms: Sea biscit article Sea biscuit article Seabiscuit article Query-time multi-word synonym expansion --- Key: SOLR-5379 URL: https://issues.apache.org/jira/browse/SOLR-5379 Project: Solr Issue Type: Improvement Components: query parsers Reporter: Nguyen Manh Tien Labels: multi-word, queryparser, synonym Fix For: 4.5.1, 4.6 Attachments: quoted.patch, synonym-expander.patch While dealing with synonym at query time, solr failed to work with multi-word synonyms due to some reasons: - First the lucene queryparser tokenizes user query by space so it split multi-word term into two terms before feeding to synonym filter, so synonym filter can't recognized multi-word term to do expansion - Second, if synonym filter expand into multiple terms which contains multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to handle synonyms. But MultiPhraseQuery don't work with term have different number of words. For the first one, we can extend quoted all multi-word synonym in user query so that lucene queryparser don't split it. There are a jira task related to this one https://issues.apache.org/jira/browse/LUCENE-2605. For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery SHOULD which contains multiple PhraseQuery in case tokens stream have multi-word synonym. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion
[ https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805179#comment-13805179 ] Marco Wong commented on SOLR-5379: -- Excuse me, for the synonym-expander.patch, does the updated SolrQueryParserBase will emitting PhraseQuery(Term(a), Term(b)), when I have a ShingleFilter in query time analyzer which emits bigram like Term(a b), which makes my existing tokenization logic fail? Query-time multi-word synonym expansion --- Key: SOLR-5379 URL: https://issues.apache.org/jira/browse/SOLR-5379 Project: Solr Issue Type: Improvement Components: query parsers Reporter: Nguyen Manh Tien Labels: multi-word, queryparser, synonym Fix For: 4.5.1, 4.6 Attachments: quoted.patch, synonym-expander.patch While dealing with synonym at query time, solr failed to work with multi-word synonyms due to some reasons: - First the lucene queryparser tokenizes user query by space so it split multi-word term into two terms before feeding to synonym filter, so synonym filter can't recognized multi-word term to do expansion - Second, if synonym filter expand into multiple terms which contains multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to handle synonyms. But MultiPhraseQuery don't work with term have different number of words. For the first one, we can extend quoted all multi-word synonym in user query so that lucene queryparser don't split it. There are a jira task related to this one https://issues.apache.org/jira/browse/LUCENE-2605. For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery SHOULD which contains multiple PhraseQuery in case tokens stream have multi-word synonym. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion
[ https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805188#comment-13805188 ] Nguyen Manh Tien commented on SOLR-5379: Yes, it will emit PhraseQuery(Term(a), Term(b)). We must additional check to only tokenize term when it is synonym. I will change the patch. Query-time multi-word synonym expansion --- Key: SOLR-5379 URL: https://issues.apache.org/jira/browse/SOLR-5379 Project: Solr Issue Type: Improvement Components: query parsers Reporter: Nguyen Manh Tien Labels: multi-word, queryparser, synonym Fix For: 4.5.1, 4.6 Attachments: quoted.patch, synonym-expander.patch While dealing with synonym at query time, solr failed to work with multi-word synonyms due to some reasons: - First the lucene queryparser tokenizes user query by space so it split multi-word term into two terms before feeding to synonym filter, so synonym filter can't recognized multi-word term to do expansion - Second, if synonym filter expand into multiple terms which contains multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to handle synonyms. But MultiPhraseQuery don't work with term have different number of words. For the first one, we can extend quoted all multi-word synonym in user query so that lucene queryparser don't split it. There are a jira task related to this one https://issues.apache.org/jira/browse/LUCENE-2605. For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery SHOULD which contains multiple PhraseQuery in case tokens stream have multi-word synonym. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion
[ https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13803086#comment-13803086 ] Otis Gospodnetic commented on SOLR-5379: [~tiennm] How does this diff from SOLR-4381? Which cases does SOLR-4381 not handle that this patch handles? Query-time multi-word synonym expansion --- Key: SOLR-5379 URL: https://issues.apache.org/jira/browse/SOLR-5379 Project: Solr Issue Type: Improvement Components: query parsers Reporter: Nguyen Manh Tien Labels: multi-word, queryparser, synonym Fix For: 4.5.1, 4.6 Attachments: quoted.patch, synonym-expander.patch While dealing with synonym at query time, solr failed to work with multi-word synonyms due to some reasons: - First the lucene queryparser tokenizes user query by space so it split multi-word term into two terms before feeding to synonym filter, so synonym filter can't recognized multi-word term to do expansion - Second, if synonym filter expand into multiple terms which contains multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to handle synonyms. But MultiPhraseQuery don't work with term have different number of words. For the first one, we can extend quoted all multi-word synonym in user query so that lucene queryparser don't split it. There are a jira task related to this one https://issues.apache.org/jira/browse/LUCENE-2605. For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery SHOULD which contains multiple PhraseQuery in case tokens stream have multi-word synonym. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion
[ https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13803199#comment-13803199 ] Otis Gospodnetic commented on SOLR-5379: My understanding of how this synonym expander (the synonym-expander.patch) works is: Assume synonyms are: {code} Seabiscuit, Sea biscit, Biscit {code} For query Seabiscuit article, the regular edismax will construct a MultiPhraseQuery like (Seebiscuit|Sea|biscit, biscit, article). Instead of that, this patch rewrites the query differently: PhraseQuery(Seabiscit article) OR PhraseQuery(Sea biscit article) OR PhraseQuery(biscit article) Query-time multi-word synonym expansion --- Key: SOLR-5379 URL: https://issues.apache.org/jira/browse/SOLR-5379 Project: Solr Issue Type: Improvement Components: query parsers Reporter: Nguyen Manh Tien Labels: multi-word, queryparser, synonym Fix For: 4.5.1, 4.6 Attachments: quoted.patch, synonym-expander.patch While dealing with synonym at query time, solr failed to work with multi-word synonyms due to some reasons: - First the lucene queryparser tokenizes user query by space so it split multi-word term into two terms before feeding to synonym filter, so synonym filter can't recognized multi-word term to do expansion - Second, if synonym filter expand into multiple terms which contains multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to handle synonyms. But MultiPhraseQuery don't work with term have different number of words. For the first one, we can extend quoted all multi-word synonym in user query so that lucene queryparser don't split it. There are a jira task related to this one https://issues.apache.org/jira/browse/LUCENE-2605. For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery SHOULD which contains multiple PhraseQuery in case tokens stream have multi-word synonym. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion
[ https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13803652#comment-13803652 ] Nguyen Manh Tien commented on SOLR-5379: [~otis] The difference are SOLR-4381 is an extension of EDismax so it only work for that query parser, my patch is a patch to SolrQueryParserBase it work for any query parser SOLR-4381 rewrite query into lattice (all synonym combination) so it need to parse N modified query, my patch is applied when we read tokenstream to build Lucene Query, so it still parse query 1 time and we can still optimize my work to make the result Lucene Query compacted by combine both MultiPhraseQuery and PhraseQuery, so the Lucene Query of my patch is smaller than SOLR-4381 Query-time multi-word synonym expansion --- Key: SOLR-5379 URL: https://issues.apache.org/jira/browse/SOLR-5379 Project: Solr Issue Type: Improvement Components: query parsers Reporter: Nguyen Manh Tien Labels: multi-word, queryparser, synonym Fix For: 4.5.1, 4.6 Attachments: quoted.patch, synonym-expander.patch While dealing with synonym at query time, solr failed to work with multi-word synonyms due to some reasons: - First the lucene queryparser tokenizes user query by space so it split multi-word term into two terms before feeding to synonym filter, so synonym filter can't recognized multi-word term to do expansion - Second, if synonym filter expand into multiple terms which contains multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to handle synonyms. But MultiPhraseQuery don't work with term have different number of words. For the first one, we can extend quoted all multi-word synonym in user query so that lucene queryparser don't split it. There are a jira task related to this one https://issues.apache.org/jira/browse/LUCENE-2605. For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery SHOULD which contains multiple PhraseQuery in case tokens stream have multi-word synonym. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion
[ https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13803656#comment-13803656 ] Nguyen Manh Tien commented on SOLR-5379: That corrected [~otis] Query-time multi-word synonym expansion --- Key: SOLR-5379 URL: https://issues.apache.org/jira/browse/SOLR-5379 Project: Solr Issue Type: Improvement Components: query parsers Reporter: Nguyen Manh Tien Labels: multi-word, queryparser, synonym Fix For: 4.5.1, 4.6 Attachments: quoted.patch, synonym-expander.patch While dealing with synonym at query time, solr failed to work with multi-word synonyms due to some reasons: - First the lucene queryparser tokenizes user query by space so it split multi-word term into two terms before feeding to synonym filter, so synonym filter can't recognized multi-word term to do expansion - Second, if synonym filter expand into multiple terms which contains multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to handle synonyms. But MultiPhraseQuery don't work with term have different number of words. For the first one, we can extend quoted all multi-word synonym in user query so that lucene queryparser don't split it. There are a jira task related to this one https://issues.apache.org/jira/browse/LUCENE-2605. For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery SHOULD which contains multiple PhraseQuery in case tokens stream have multi-word synonym. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org