[jira] [Commented] (LUCENE-8810) Flattening of nested disjunctions does not take into account number of clause limitation of builder
[ https://issues.apache.org/jira/browse/LUCENE-8810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16885240#comment-16885240 ]

ASF subversion and git services commented on LUCENE-8810:

Commit 8739d5ebf7e6c5355d5727327bf7f2200d66c0a5 in lucene-solr's branch refs/heads/branch_8_2 from Atri Sharma
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=8739d5e ]

LUCENE-8810: Honor MaxClausesCount in BooleanQuery (#787)

During rewrite, BooleanQuery always tries to flatten nested clauses. However, this can cause the new query to violate the maximum number of clauses. This commit disables flattening in that specific case.

> Flattening of nested disjunctions does not take into account number of clause limitation of builder
> ---
>
>                 Key: LUCENE-8810
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8810
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 8.0
>            Reporter: Mickaël Sauvée
>            Priority: Minor
>             Fix For: 8.1.1
>
>         Attachments: LUCENE-8810.patch
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> In org.apache.lucene.search.BooleanQuery, at the end of the function rewrite(IndexReader reader), the query is rewritten to flatten nested disjunctions.
> This does not take into account the limit on the number of clauses in a builder (1024).
> In some circumstances, this limit can be exceeded, hence an exception is thrown.
> Here is a unit test that highlights this.
> {code:java}
>   public void testFlattenInnerDisjunctionsWithMoreThan1024Terms() throws IOException {
>     IndexSearcher searcher = newSearcher(new MultiReader());
>
>     // Inner disjunction with exactly 1024 SHOULD clauses (the builder limit).
>     BooleanQuery.Builder builder1024 = new BooleanQuery.Builder();
>     for (int i = 0; i < 1024; i++) {
>       builder1024.add(new TermQuery(new Term("foo", "bar-" + i)), Occur.SHOULD);
>     }
>     Query inner = builder1024.build();
>
>     // Outer disjunction: the inner query plus one more term, 1025 clauses once flattened.
>     Query query = new BooleanQuery.Builder()
>         .add(inner, Occur.SHOULD)
>         .add(new TermQuery(new Term("foo", "baz")), Occur.SHOULD)
>         .build();
>
>     // The rewrite flattens the nested disjunction and throws.
>     searcher.rewrite(query);
>   }
> {code}

--
This message was sent by Atlassian JIRA (v7.6.14#76016)

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
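The behavior described in the commit message (skip flattening when the flattened query would exceed the clause limit) can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: FlattenSketch, Node, disjunctionOfTerms and rewrite are hypothetical names, the tree only models pure disjunctions, and none of this is Lucene's actual BooleanQuery code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of a pure disjunction tree. This is NOT Lucene's BooleanQuery;
// every name in this class is hypothetical.
class FlattenSketch {
    // Mirrors the limit of 1024 clauses mentioned in the issue.
    static final int MAX_CLAUSES = 1024;

    // A node is either a leaf term (children == null) or a disjunction of children.
    static final class Node {
        final String term;
        final List<Node> children;

        Node(String term) { this.term = term; this.children = null; }
        Node(List<Node> children) { this.term = null; this.children = children; }

        boolean isLeaf() { return children == null; }
    }

    // Builds a flat disjunction of n distinct terms.
    static Node disjunctionOfTerms(int n) {
        List<Node> terms = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            terms.add(new Node("bar-" + i));
        }
        return new Node(terms);
    }

    // Inlines directly nested disjunctions (one level), but only when the
    // flattened query stays within MAX_CLAUSES; otherwise the query is
    // returned unchanged instead of throwing.
    static Node rewrite(Node query) {
        if (query.isLeaf()) {
            return query;
        }
        int flattenedSize = 0;
        boolean hasNested = false;
        for (Node child : query.children) {
            if (child.isLeaf()) {
                flattenedSize++;
            } else {
                flattenedSize += child.children.size();
                hasNested = true;
            }
        }
        if (!hasNested || flattenedSize > MAX_CLAUSES) {
            return query; // nothing to inline, or inlining would exceed the limit
        }
        List<Node> flat = new ArrayList<>();
        for (Node child : query.children) {
            if (child.isLeaf()) {
                flat.add(child);
            } else {
                flat.addAll(child.children);
            }
        }
        return new Node(flat);
    }

    public static void main(String[] args) {
        // 3 inner terms + 1 outer term: small enough to flatten.
        Node small = new Node(Arrays.asList(disjunctionOfTerms(3), new Node("baz")));
        System.out.println(rewrite(small).children.size()); // prints 4

        // Same shape as the reported test: 1024 inner terms + 1 outer term.
        Node big = new Node(Arrays.asList(disjunctionOfTerms(1024), new Node("baz")));
        System.out.println(rewrite(big) == big); // prints true (left nested)
    }
}
```

In this model the oversized query simply keeps its nesting, which matches the outcome the reporter asked for: no exception, and no flattening when flattening is not possible.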
[jira] [Commented] (LUCENE-8810) Flattening of nested disjunctions does not take into account number of clause limitation of builder
[ https://issues.apache.org/jira/browse/LUCENE-8810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16885235#comment-16885235 ]

ASF subversion and git services commented on LUCENE-8810:

Commit a94093102edabed5a6722b45afcbc6789426194a in lucene-solr's branch refs/heads/branch_8x from Atri Sharma
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=a940931 ]

LUCENE-8810: Honor MaxClausesCount in BooleanQuery (#787)

During rewrite, BooleanQuery always tries to flatten nested clauses. However, this can cause the new query to violate the maximum number of clauses. This commit disables flattening in that specific case.
[jira] [Commented] (LUCENE-8810) Flattening of nested disjunctions does not take into account number of clause limitation of builder
[ https://issues.apache.org/jira/browse/LUCENE-8810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16847352#comment-16847352 ]

Atri Sharma commented on LUCENE-8810:

++ agreed. Will post a patch on https://issues.apache.org/jira/browse/LUCENE-8811 with the IndexSearcher approach.
[jira] [Commented] (LUCENE-8810) Flattening of nested disjunctions does not take into account number of clause limitation of builder
[ https://issues.apache.org/jira/browse/LUCENE-8810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16847343#comment-16847343 ]

Adrien Grand commented on LUCENE-8810:

Doing instanceof checks feels too fragile to me: this won't work if the BooleanQuery is wrapped under a ConstantScoreQuery or a BoostQuery. We could consider using the visitor API instead, but this would make BooleanQuery construction run in quadratic time of the depth of the query (if you have a boolean query BQ1 that wraps BQ2, which itself wraps BQ3, the clause count of BQ2 will be checked by BQ1 and BQ2 and the clause count of BQ3 will be checked by BQ1, BQ2 and BQ3, etc.), which is why I thought of IndexSearcher, which is the place where we would have access to the top-level query.
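The quadratic cost mentioned in the comment above can be made concrete with a small counting model. This is a sketch under stated assumptions, not Lucene code: ClauseCheckCost, Bq and both counting methods are hypothetical, and each query in the chain wraps exactly one inner query. Checking the whole subtree every time a query is built visits 1 + 2 + ... + d queries for a chain of depth d, while a single check on the top-level query visits d.

```java
// Hypothetical model: a chain of nested queries, BQ1 wraps BQ2 wraps BQ3, ...
// This only counts visits; it does not use any Lucene API.
class ClauseCheckCost {
    static final class Bq {
        final Bq inner; // null for the innermost query

        Bq(Bq inner) { this.inner = inner; }
    }

    // Walks the whole chain under q and returns how many queries were visited.
    static long checkSubtree(Bq q) {
        long visited = 0;
        for (Bq cur = q; cur != null; cur = cur.inner) {
            visited++;
        }
        return visited;
    }

    // Runs a full subtree check each time a query is constructed,
    // building the chain from the innermost query outwards.
    static long visitsWhenCheckingAtEachBuild(int depth) {
        long visits = 0;
        Bq q = null;
        for (int i = 0; i < depth; i++) {
            q = new Bq(q);
            visits += checkSubtree(q);
        }
        return visits;
    }

    // Builds the whole chain first, then checks once from the top-level query.
    static long visitsWhenCheckingOnceAtTop(int depth) {
        Bq q = null;
        for (int i = 0; i < depth; i++) {
            q = new Bq(q);
        }
        return checkSubtree(q);
    }

    public static void main(String[] args) {
        // BQ1 -> BQ2 -> BQ3, as in the comment: 1 + 2 + 3 visits versus 3.
        System.out.println(visitsWhenCheckingAtEachBuild(3)); // prints 6
        System.out.println(visitsWhenCheckingOnceAtTop(3));   // prints 3
        System.out.println(visitsWhenCheckingAtEachBuild(100)); // prints 5050
        System.out.println(visitsWhenCheckingOnceAtTop(100));   // prints 100
    }
}
```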
[jira] [Commented] (LUCENE-8810) Flattening of nested disjunctions does not take into account number of clause limitation of builder
[ https://issues.apache.org/jira/browse/LUCENE-8810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16847332#comment-16847332 ]

Atri Sharma commented on LUCENE-8810:

[~msauvee] I understand the confusion. Ideally, we should raise the exception in the same place it would be raised if you built a single Builder (no nested clauses) and added more than 1024 clauses to it. I put in a WIP patch to handle this more gracefully. [~jpountz] would this suffice?

[^LUCENE-8810.patch]
[jira] [Commented] (LUCENE-8810) Flattening of nested disjunctions does not take into account number of clause limitation of builder
[ https://issues.apache.org/jira/browse/LUCENE-8810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16847320#comment-16847320 ]

Adrien Grand commented on LUCENE-8810:

I'm expecting that this would mostly be an issue for users who use inner boolean queries as a way to work around the maximum clause count. That said, I understand how this change can be surprising, and it should be easy enough to check the clause count in the rewrite rule, so I'd be ok with doing this. Would you like to work on a patch?

I opened LUCENE-8811 to maybe re-think the way that we check the number of clauses of queries in a more consistent way.

bq. I do not know what is "block-max WAND" (line 479 of BooleanQuery).

It is an optimized way to retrieve top hits of disjunctions (boolean queries with only SHOULD clauses) by decreasing score. It works by ignoring low-scoring clauses, and works better when disjunctions are inlined since this gives more information to the algorithm.
[jira] [Commented] (LUCENE-8810) Flattening of nested disjunctions does not take into account number of clause limitation of builder
[ https://issues.apache.org/jira/browse/LUCENE-8810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16847303#comment-16847303 ]

Mickaël Sauvée commented on LUCENE-8810:

[~atris] In my test, the inner builder does not raise an error (if I understand your comment correctly): it is the rewrite that raises an exception, because it tries to flatten the query into 1025 clauses.

Maybe this is the expected behavior, but in 7.5 the rewrite succeeded (without flattening, of course), so this is a change in behavior.

I would have expected the rewrite either to flatten up to a maximum of 1024 clauses or not to flatten at all when that is not possible, but not to raise an exception. I may be wrong, as I do not know what "block-max WAND" is (line 479 of BooleanQuery).
[jira] [Commented] (LUCENE-8810) Flattening of nested disjunctions does not take into account number of clause limitation of builder
[ https://issues.apache.org/jira/browse/LUCENE-8810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16847247#comment-16847247 ]

Atri Sharma commented on LUCENE-8810:

[~msauvee] I do not understand what the actual issue is that you are seeing here. Given that a single Builder cannot have more than 1024 clauses, your inner builder is erroring out, which is the expected behaviour.

I am curious about a parallel implication here, though. What are the expected semantics when a nested disjunction that will be flattened exceeds the total clause count once combined with the parent boolean query? For example, if the parent query has 601 clauses, one of which is a disjunction containing another 600 clauses, will this query be treated as valid (it is treated as valid today)? Is this the correct invariant?
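The arithmetic behind the 601/600 question in the comment above can be spelled out as follows. This is a sketch with hypothetical names (FlattenedCount, flattenedClauseCount); it only counts clauses and does not call Lucene. It assumes every nested disjunction has the same size and is fully inlined into the parent, so each nested disjunction stops counting as one parent clause and contributes its own clauses instead.

```java
// Hypothetical helper: counts clauses after inlining nested disjunctions.
class FlattenedCount {
    // Mirrors the limit of 1024 clauses mentioned in the issue.
    static final int MAX_CLAUSES = 1024;

    // parentClauses counts the nested disjunctions themselves as one clause each.
    static int flattenedClauseCount(int parentClauses, int nestedDisjunctions, int clausesPerNested) {
        return parentClauses - nestedDisjunctions + nestedDisjunctions * clausesPerNested;
    }

    public static void main(String[] args) {
        // Parent with 601 clauses, one of which is a disjunction of 600 clauses.
        int total = flattenedClauseCount(601, 1, 600);
        System.out.println(total);               // prints 1200
        System.out.println(total > MAX_CLAUSES); // prints true
    }
}
```

Each builder in that example stays under the limit on its own (601 and 600 clauses), yet the flattened query would hold 1200 clauses, which is the situation the question is about.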