[jira] [Commented] (LUCENE-8531) QueryBuilder hard-codes inOrder=true for generated sloppy span near queries
[ https://issues.apache.org/jira/browse/LUCENE-8531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662397#comment-16662397 ]

Michael Gibney commented on LUCENE-8531:

Thanks [~thetaphi], and I agree that it would be a separate issue (^"Would it be worth opening a new issue to consider introducing the ability to specifically request construction of {{SpanNearQuery}} and/or {{inOrder=true}} behavior?"). I've created LUCENE-8543, so discussion can move there if anyone's interested.

I also created LUCENE-8544, proposing the addition of support for {{(Multi)PhraseQuery}} phrase semantics in {{SpanNearQuery}}. I think it should be achievable, at least in the context of a proposed patch for LUCENE-7398.

> QueryBuilder hard-codes inOrder=true for generated sloppy span near queries
> ---
>
> Key: LUCENE-8531
> URL: https://issues.apache.org/jira/browse/LUCENE-8531
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/queryparser
> Reporter: Steve Rowe
> Assignee: Steve Rowe
> Priority: Major
> Fix For: 7.6, master (8.0)
>
> Attachments: LUCENE-8531.patch
>
>
> QueryBuilder.analyzeGraphPhrase() generates SpanNearQuery-s with passed-in
> phraseSlop, but hard-codes inOrder ctor param as true.
> Before multi-term synonym support and graph token streams introduced the
> possibility of generating SpanNearQuery-s, QueryBuilder generated
> (Multi)PhraseQuery-s, which always interpret slop as allowing reordering
> edits. Solr's eDismax query parser generates phrase queries when its
> pf/pf2/pf3 params are specified, and when multi-term synonyms are used with a
> graph-aware synonym filter, SpanNearQuery-s are generated that require
> clauses to be in order; unlike with (Multi)PhraseQuery-s, reordering edits
> are not allowed, so this is a kind of regression. See SOLR-12243 for edismax
> pf/pf2/pf3 context. (Note that the patch on SOLR-12243 also addresses
> another problem that blocks eDismax from generating queries *at all* under
> the above-described circumstances.)
> I propose adding a new analyzeGraphPhrase() method that allows configuration
> of inOrder, which would allow eDismax to specify inOrder=false. The existing
> analyzeGraphPhrase() method would remain with its hard-coded inOrder=true, so
> existing client behavior would remain unchanged.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8531) QueryBuilder hard-codes inOrder=true for generated sloppy span near queries
[ https://issues.apache.org/jira/browse/LUCENE-8531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662277#comment-16662277 ]

Uwe Schindler commented on LUCENE-8531:

Yeah, but that's a separate issue, IMHO. This issue just fixes the bug that existed for graph synonyms, which were not correct in comparison for queries without synonyms and were violating the documentation and behaviour of Lucene since version 1.0.
[jira] [Commented] (LUCENE-8531) QueryBuilder hard-codes inOrder=true for generated sloppy span near queries
[ https://issues.apache.org/jira/browse/LUCENE-8531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661087#comment-16661087 ]

Michael Gibney commented on LUCENE-8531:

> I think we should keep the default behavior as is. You can still override
> QueryBuilder#analyzeGraphPhrase to apply a different logic on your side if
> you want.

Certainly agreed the default behavior should be left as-is. I'm content with the flexibility to override, but my suggestion was based on a sense that the desire to support {{inOrder=true}} could be a pretty common use case.

The API does specify "phrase", but with a lower-case "p", does this necessarily imply that exclusively {{PhraseQuery}} semantics _should_ be supported? It's the de facto case that {{PhraseQuery}} semantics _have been_ supported, so it definitely makes sense for that to continue to be the default – but I don't think it'd be unreasonable to add configurable stock support for {{inOrder=true}}.

If such support were to be added, {{QueryBuilder}} would seem like a logical place to do it, and since the logic necessary to implement is already here (in {{analyzeGraphPhrase}}), it should be a trivial addition. I'm thinking something along the lines of splitting the {{SpanNearQuery}} part of {{analyzeGraphPhrase}} (everything after the "{{if (phraseSlop > 0)}}" short-circuit) into its own method. Even if split into a protected method, this would allow any override of {{analyzeGraphPhrase}} to more cleanly leverage the existing logic for building {{SpanNearQuery}}.

I'm just explaining my thinking here; I guess the decision ultimately depends on how general a use case folks consider {{inOrder=true}} to be.
[jira] [Commented] (LUCENE-8531) QueryBuilder hard-codes inOrder=true for generated sloppy span near queries
[ https://issues.apache.org/jira/browse/LUCENE-8531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661054#comment-16661054 ]

Steve Rowe commented on LUCENE-8531:

Thanks for the explanation [~jim.ferenczi].
[jira] [Commented] (LUCENE-8531) QueryBuilder hard-codes inOrder=true for generated sloppy span near queries
[ https://issues.apache.org/jira/browse/LUCENE-8531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661015#comment-16661015 ]

Jim Ferenczi commented on LUCENE-8531:

> Can you explain, or point to docs that explain what you mean?

I am referring to the javadoc of PhraseQuery#getSlop where it is explained how unordered terms could match:
{noformat}
 * The slop is an edit distance between respective positions of terms as
 * defined in this {@link PhraseQuery} and the positions of terms in a
 * document.
 *
 * For instance, when searching for {@code "quick fox"}, it is expected that
 * the difference between the positions of {@code fox} and {@code quick} is 1.
 * So {@code "a quick brown fox"} would be at an edit distance of 1 since the
 * difference of the positions of {@code fox} and {@code quick} is 2.
 * Similarly, {@code "the fox is quick"} would be at an edit distance of 3
 * since the difference of the positions of {@code fox} and {@code quick} is -2.
 * The slop defines the maximum edit distance for a document to match.
 *
 * More exact matches are scored higher than sloppier matches, thus search
 * results are sorted by exactness.
 */
{noformat}
This is different than an unordered span near query which does not take the terms query order into account. This is also what is explained in the description of the issue:
{noformat}
unlike with (Multi)PhraseQuery-s, reordering edits are not allowed, so this is a kind of regression.
{noformat}

> That said, there surely are potential use cases for the {{inOrder=true}} behavior, which is supported by {{SpanNearQuery}} but not by {{(Multi)PhraseQuery}}. Would it be worth opening a new issue to consider introducing the ability to specifically request construction of {{SpanNearQuery}} and/or {{inOrder=true}} behavior?
> The work that went into building {{SpanNearQuery}} for phrases (commit [96e8f0a0afe|https://github.com/apache/lucene-solr/commit/96e8f0a0afeb68e2d07ec1dda362894f0b94333d]) is still useful and relevant, even if the result isn't backward-compatible for the case where {{slop > 0}}.

I think it's something specific that can be handled in a custom QueryBuilder. The API specifically mentions that it builds a phrase, so the default implementation should follow the semantics of a PhraseQuery. If we can optimize with a SpanNearQuery instead, we need to ensure that it matches the same documents as the multi-phrase-queries approach. That's not the case when slop is greater than 0, so I think we should keep the default behavior as is. You can still override QueryBuilder#analyzeGraphPhrase to apply a different logic on your side if you want.
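
Editorial sketch: the difference described above between sloppy phrase matching and span-near matching can be illustrated for the two-term example from the PhraseQuery#getSlop javadoc. This is a minimal, hypothetical model in plain Java, not Lucene's matching code. The class and method names are invented for illustration, each term is assumed to occur exactly once in the document, and the two span-near rules are a simplified reading of how SpanNearQuery counts the positions between clauses.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Lucene.
final class SlopSketch {

    // Signed position difference between the second and the first query term.
    // Assumes each term occurs exactly once in the document.
    static int delta(List<String> doc, String first, String second) {
        return doc.indexOf(second) - doc.indexOf(first);
    }

    // Two-term reading of the getSlop javadoc: the query expects a
    // difference of 1, so the edit distance is |delta - 1|.
    static int phraseEditDistance(List<String> doc, String first, String second) {
        return Math.abs(delta(doc, first, second) - 1);
    }

    // Simplified ordered span-near: the terms must appear in query order and
    // at most `slop` positions may lie between them.
    static boolean orderedNearMatches(List<String> doc, String first, String second, int slop) {
        int d = delta(doc, first, second);
        return d >= 1 && d - 1 <= slop;
    }

    // Simplified unordered span-near: order is ignored, only the number of
    // positions between the two terms counts.
    static boolean unorderedNearMatches(List<String> doc, String first, String second, int slop) {
        return Math.abs(delta(doc, first, second)) - 1 <= slop;
    }

    public static void main(String[] args) {
        List<String> inOrder = Arrays.asList("a", "quick", "brown", "fox");
        List<String> reversed = Arrays.asList("the", "fox", "is", "quick");
        for (List<String> doc : Arrays.asList(inOrder, reversed)) {
            System.out.println(doc
                + " phraseEditDistance=" + phraseEditDistance(doc, "quick", "fox")
                + " orderedNear(slop=3)=" + orderedNearMatches(doc, "quick", "fox", 3)
                + " unorderedNear(slop=1)=" + unorderedNearMatches(doc, "quick", "fox", 1));
        }
    }
}
```

Under this model, "the fox is quick" needs a phrase slop of 3, never matches the ordered variant, and already matches the unordered variant at slop 1, so neither span-near flavour reproduces the sloppy phrase semantics.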
[jira] [Commented] (LUCENE-8531) QueryBuilder hard-codes inOrder=true for generated sloppy span near queries
[ https://issues.apache.org/jira/browse/LUCENE-8531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16660913#comment-16660913 ]

Steve Rowe commented on LUCENE-8531:

+1, thanks [~jim.ferenczi].

bq. (Multi)PhraseQuery-s allows some reordering but the semantic is different from an unordered span near query.

Can you explain, or point to docs that explain what you mean?
[jira] [Commented] (LUCENE-8531) QueryBuilder hard-codes inOrder=true for generated sloppy span near queries
[ https://issues.apache.org/jira/browse/LUCENE-8531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16660880#comment-16660880 ]

Michael Gibney commented on LUCENE-8531:

I recognize that this was a bug (in that using {{SpanNearQuery}} with {{inOrder=true}} and {{slop > 0}} changed the behavior, rather than simply the implementation, of the built query).

That said, there surely are potential use cases for the {{inOrder=true}} behavior, which is supported by {{SpanNearQuery}} but not by {{(Multi)PhraseQuery}}. Would it be worth opening a new issue to consider introducing the ability to specifically request construction of {{SpanNearQuery}} and/or {{inOrder=true}} behavior? The work that went into building {{SpanNearQuery}} for phrases (commit [96e8f0a0afe|https://github.com/apache/lucene-solr/commit/96e8f0a0afeb68e2d07ec1dda362894f0b94333d]) is still useful and relevant, even if the result isn't backward-compatible for the case where {{slop > 0}}.
[jira] [Commented] (LUCENE-8531) QueryBuilder hard-codes inOrder=true for generated sloppy span near queries
[ https://issues.apache.org/jira/browse/LUCENE-8531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658732#comment-16658732 ]

ASF subversion and git services commented on LUCENE-8531:

Commit e1da5f953731b4e2990e054d09ec0bcb2e5146b8 in lucene-solr's branch refs/heads/jira/http2 from [~jim.ferenczi]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=e1da5f9 ]

LUCENE-8531: QueryBuilder#analyzeGraphPhrase now creates one phrase query per finite strings in the graph if the slop is greater than 0. Span queries cannot be used in this case because they don't handle slop the same way than phrase queries.
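
Editorial sketch: the commit message above describes building one phrase query per finite string of the token graph when the slop is greater than 0. The enumeration can be roughly sketched in plain Java with no Lucene dependency. The Edge type, the state numbering, and both method names are hypothetical, the graph is assumed to be acyclic, and the output is only query-syntax text rather than real PhraseQuery objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is not QueryBuilder's implementation.
final class GraphPhraseSketch {

    // One arc of a token graph: `term` leads from state `from` to state `to`.
    static final class Edge {
        final int from;
        final int to;
        final String term;

        Edge(int from, int to, String term) {
            this.from = from;
            this.to = to;
            this.term = term;
        }
    }

    // Enumerates every term sequence (finite string) from `start` to `end`.
    // Assumes the graph has no cycles.
    static List<List<String>> finiteStrings(List<Edge> edges, int start, int end) {
        List<List<String>> out = new ArrayList<>();
        walk(edges, start, end, new ArrayList<String>(), out);
        return out;
    }

    private static void walk(List<Edge> edges, int state, int end,
                             List<String> path, List<List<String>> out) {
        if (state == end) {
            out.add(new ArrayList<>(path)); // copy, since `path` is reused
            return;
        }
        for (Edge e : edges) {
            if (e.from == state) {
                path.add(e.term);
                walk(edges, e.to, end, path, out);
                path.remove(path.size() - 1); // backtrack
            }
        }
    }

    // Renders each finite string as a sloppy phrase in classic query syntax.
    static List<String> toSloppyPhrases(List<List<String>> paths, int slop) {
        List<String> phrases = new ArrayList<>();
        for (List<String> path : paths) {
            phrases.add("\"" + String.join(" ", path) + "\"~" + slop);
        }
        return phrases;
    }

    public static void main(String[] args) {
        // "fast wifi network" where "wifi" also expands to the two tokens "wi fi".
        List<Edge> graph = Arrays.asList(
            new Edge(0, 1, "fast"),
            new Edge(1, 3, "wifi"),
            new Edge(1, 2, "wi"),
            new Edge(2, 3, "fi"),
            new Edge(3, 4, "network"));
        for (String phrase : toSloppyPhrases(finiteStrings(graph, 0, 4), 2)) {
            System.out.println(phrase);
        }
    }
}
```

Each path becomes its own sloppy phrase, which keeps phrase slop semantics at the cost of one query per path; that matches the trade-off discussed in this thread (slower, but consistent).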
[jira] [Commented] (LUCENE-8531) QueryBuilder hard-codes inOrder=true for generated sloppy span near queries
[ https://issues.apache.org/jira/browse/LUCENE-8531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16657475#comment-16657475 ]

Uwe Schindler commented on LUCENE-8531:

Thanks Jim!
[jira] [Commented] (LUCENE-8531) QueryBuilder hard-codes inOrder=true for generated sloppy span near queries
[ https://issues.apache.org/jira/browse/LUCENE-8531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16657274#comment-16657274 ]

ASF subversion and git services commented on LUCENE-8531:

Commit 36ce83bc9add02a900e38b396b42c3c729846598 in lucene-solr's branch refs/heads/branch_7x from [~jim.ferenczi]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=36ce83b ]

LUCENE-8531: QueryBuilder#analyzeGraphPhrase now creates one phrase query per finite strings in the graph if the slop is greater than 0. Span queries cannot be used in this case because they don't handle slop the same way than phrase queries.
[jira] [Commented] (LUCENE-8531) QueryBuilder hard-codes inOrder=true for generated sloppy span near queries
[ https://issues.apache.org/jira/browse/LUCENE-8531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16657266#comment-16657266 ]

ASF subversion and git services commented on LUCENE-8531:

Commit e1da5f953731b4e2990e054d09ec0bcb2e5146b8 in lucene-solr's branch refs/heads/master from [~jim.ferenczi]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=e1da5f9 ]

LUCENE-8531: QueryBuilder#analyzeGraphPhrase now creates one phrase query per finite strings in the graph if the slop is greater than 0. Span queries cannot be used in this case because they don't handle slop the same way than phrase queries.
[jira] [Commented] (LUCENE-8531) QueryBuilder hard-codes inOrder=true for generated sloppy span near queries
[ https://issues.apache.org/jira/browse/LUCENE-8531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16657089#comment-16657089 ]

Uwe Schindler commented on LUCENE-8531:

+1, please do this. I will then take care of the Solr issue.

This is not fully related, but the Solr code depends on the structure of Lucene queries produced and then reorders them with lots of instanceof checks. Which is bad spaghetti-code, but that's how it is. I'd like to get a Lucene class that allows you to generate edismax-like queries that parses some text, creates bigram and trigram shingles out of it to allow a "match" query to assign a higher score for hits when you have terms in order and close to each other (put a higher precedence if bigrams or trigrams in your query string are close together in the document). A lot of people use this, but currently it only works with Solr's edismax and whenever you want to use this for other query parser or elasticsearch, you have to reimplement the shingling.
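
Editorial sketch: the bigram and trigram shingling mentioned in the comment above amounts to sliding a fixed-size window over the analyzed query terms. A minimal plain-Java illustration follows; the class and method names are hypothetical, the input is assumed to be already tokenized, and this is not how Solr's edismax is implemented. Each resulting shingle would then typically be turned into a phrase clause that boosts documents containing those terms close together.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not Solr or Lucene code.
final class ShingleSketch {

    // Returns every run of `size` consecutive tokens, joined by a space.
    // Yields an empty list when there are fewer than `size` tokens.
    static List<String> shingles(List<String> tokens, int size) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + size <= tokens.size(); i++) {
            out.add(String.join(" ", tokens.subList(i, i + size)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("quick", "brown", "fox", "jumps");
        System.out.println("bigrams:  " + shingles(tokens, 2));
        System.out.println("trigrams: " + shingles(tokens, 3));
    }
}
```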
[jira] [Commented] (LUCENE-8531) QueryBuilder hard-codes inOrder=true for generated sloppy span near queries
[ https://issues.apache.org/jira/browse/LUCENE-8531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16657043#comment-16657043 ] Jim Ferenczi commented on LUCENE-8531:
Since this is a bug, I am planning to commit the proposed patch soon unless there are objections. It will be a bit slower than the current version, as [~thetaphi] outlined, but I think consistency is more important here.
[jira] [Commented] (LUCENE-8531) QueryBuilder hard-codes inOrder=true for generated sloppy span near queries
[ https://issues.apache.org/jira/browse/LUCENE-8531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654901#comment-16654901 ] Jim Ferenczi commented on LUCENE-8531:
> It would be interesting how Elasticsearch handles this in its matchPhrase queries.
The match query in Elasticsearch uses a QueryBuilder under the hood, so we have the same bug.
> IMHO, no need for extra param, as this is/was a bug.
Agreed, we should adapt the query depending on the slop value to ensure consistency.
[jira] [Commented] (LUCENE-8531) QueryBuilder hard-codes inOrder=true for generated sloppy span near queries
[ https://issues.apache.org/jira/browse/LUCENE-8531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654896#comment-16654896 ] Uwe Schindler commented on LUCENE-8531:
IMHO, no need for extra param, as this is/was a bug.
[jira] [Commented] (LUCENE-8531) QueryBuilder hard-codes inOrder=true for generated sloppy span near queries
[ https://issues.apache.org/jira/browse/LUCENE-8531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654895#comment-16654895 ] Uwe Schindler commented on LUCENE-8531:
Thanks Jim for the idea. I was following the previous issue on the Solr side and I agree that the behaviour needs to be consistent. But as Jim says, there is no need for an additional "inOrder" parameter as Steve suggested. The analyzePhrase code should always behave identically to a pure phrase query, so the inOrder behaviour needs to be consistent with a pure phrase query: if slop is > 0 then inOrder should be false, with no optional parameter needed. This would also fix Solr's bug, and Lucene would be consistent. Jim's workaround to prevent using SpanQuery if slop is > 1 is perfectly fine, although it might be a bit slower. It would be interesting to see how Elasticsearch handles this in its matchPhrase queries.
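The ordering question discussed above can be illustrated with a toy model for a two-term phrase "a b". This is only a sketch under simplifying assumptions: each term occurs once, at distinct positions, and the two formulas are a simplified reading of how sloppy phrase matching and span-near matching measure distance, not Lucene's actual scorers. NearMatchSketch and its methods are hypothetical names.

```java
// Toy model only, not Lucene code. Assumes a two-term query "a b" where each
// term occurs once in the document, at distinct positions posA and posB.
final class NearMatchSketch {

    /**
     * Simplified sloppy-phrase check: shift each document position back by the
     * term's offset in the query (a = 0, b = 1) and compare the spread to slop.
     */
    static boolean sloppyPhrase(int posA, int posB, int slop) {
        int shiftedA = posA;     // query offset 0
        int shiftedB = posB - 1; // query offset 1
        return Math.abs(shiftedA - shiftedB) <= slop;
    }

    /**
     * Simplified span-near check: the number of positions between the two
     * terms must be at most slop; inOrder additionally requires a before b.
     */
    static boolean spanNear(int posA, int posB, int slop, boolean inOrder) {
        if (inOrder && posA > posB) {
            return false;
        }
        int gap = Math.abs(posA - posB) - 1;
        return gap <= slop;
    }

    public static void main(String[] args) {
        // Document text "b a": b at position 5, a at position 6.
        System.out.println(sloppyPhrase(6, 5, 1));     // false
        System.out.println(sloppyPhrase(6, 5, 2));     // true
        System.out.println(spanNear(6, 5, 2, true));   // false
        System.out.println(spanNear(6, 5, 0, false));  // true
    }
}
```

In this model the reversed pair matches the sloppy phrase once slop reaches 2, never matches the in-order span query, and matches the unordered span query even at slop 0, so neither inOrder setting reproduces the phrase behaviour exactly.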
[jira] [Commented] (LUCENE-8531) QueryBuilder hard-codes inOrder=true for generated sloppy span near queries
[ https://issues.apache.org/jira/browse/LUCENE-8531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16650813#comment-16650813 ] Jim Ferenczi commented on LUCENE-8531:
(Multi)PhraseQuery-s allow some reordering, but the semantics are different from an unordered span near query. I don't think we can respect the slop correctly if we continue to use span queries here. We switched to span queries to avoid searching duplicate terms in multiple phrase queries, but I agree that the behavior is not consistent when using a slop. Maybe we could switch back to the old method of building one phrase query per path if a slop is used? This way we could apply the slop to each phrase query independently. This is more costly than the span method, but it would be semantically correct.
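The "one phrase query per path" approach mentioned above can be sketched by enumerating every path through a small token graph; each resulting term sequence would then become its own phrase query carrying the requested slop. The graph representation, the class name PhrasePathsSketch, and the example synonym ("ny" for "new york") are assumptions made for illustration and are not the QueryBuilder implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy sketch only: enumerates all term sequences through an acyclic token
// graph. PhrasePathsSketch and Edge are hypothetical, not Lucene classes.
final class PhrasePathsSketch {

    /** A token leading from one graph node to node {@code to}. */
    static final class Edge {
        final String term;
        final int to;

        Edge(String term, int to) {
            this.term = term;
            this.to = to;
        }
    }

    /** Hypothetical graph for "ny hotels" with "ny" expanded to "new york". */
    static Map<Integer, List<Edge>> exampleGraph() {
        return Map.of(
            0, List.of(new Edge("ny", 2), new Edge("new", 1)),
            1, List.of(new Edge("york", 2)),
            2, List.of(new Edge("hotels", 3)));
    }

    /** Returns the term sequence of every path from {@code start} to {@code end}. */
    static List<List<String>> paths(Map<Integer, List<Edge>> graph, int start, int end) {
        List<List<String>> out = new ArrayList<>();
        walk(graph, start, end, new ArrayList<>(), out);
        return out;
    }

    // Depth-first walk; assumes the graph has no cycles.
    private static void walk(Map<Integer, List<Edge>> graph, int node, int end,
                             List<String> prefix, List<List<String>> out) {
        if (node == end) {
            out.add(new ArrayList<>(prefix));
            return;
        }
        for (Edge e : graph.getOrDefault(node, List.of())) {
            prefix.add(e.term);
            walk(graph, e.to, end, prefix, out);
            prefix.remove(prefix.size() - 1); // backtrack
        }
    }

    public static void main(String[] args) {
        System.out.println(paths(exampleGraph(), 0, 3)); // [[ny, hotels], [new, york, hotels]]
    }
}
```

The shared term "hotels" appears in both paths, which is the duplicated search work the span-based approach was introduced to avoid, and the reason this fallback is described as more costly.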
[jira] [Commented] (LUCENE-8531) QueryBuilder hard-codes inOrder=true for generated sloppy span near queries
[ https://issues.apache.org/jira/browse/LUCENE-8531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16650768#comment-16650768 ] Steve Rowe commented on LUCENE-8531:
CC [~thetaphi]