Re: Can the BooleanQuery execution be optimized with same term queries?

2023-09-19 Thread Michael Sokolov
Another thing to check, beyond whether the correct documents are
matched, is whether the correct score is returned. I'm not actually sure
how it works, but I can imagine that a query for "red red wine" would
produce a higher score for documents containing "red red wine" than for
documents containing "red wine wine".
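A toy illustration of this concern follows. It is not Lucene's scoring (Lucene uses BM25 by default). It assumes, hypothetically, that a disjunction scores a document as the sum of its clause scores and that each term clause scores as weight times raw term frequency. The class and method names are made up for the example.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model (an assumption, not Lucene's BM25): a document's score for a
// disjunction is the sum, over term clauses, of weight * term frequency.
class DuplicateClauseScoring {

    // Count how often each term occurs in a whitespace-separated document.
    static Map<String, Integer> termFreqs(String doc) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : doc.split("\\s+")) {
            tf.merge(t, 1, Integer::sum);
        }
        return tf;
    }

    // One entry per clause; a duplicated term appears as separate clauses (weight 1 each).
    static int scoreClauses(List<String> clauses, String doc) {
        Map<String, Integer> tf = termFreqs(doc);
        int score = 0;
        for (String term : clauses) {
            score += tf.getOrDefault(term, 0);
        }
        return score;
    }

    // Deduplicated clauses, each carrying a weight (e.g. its occurrence count).
    static int scoreWeighted(Map<String, Integer> weightedClauses, String doc) {
        Map<String, Integer> tf = termFreqs(doc);
        int score = 0;
        for (Map.Entry<String, Integer> e : weightedClauses.entrySet()) {
            score += e.getValue() * tf.getOrDefault(e.getKey(), 0);
        }
        return score;
    }

    public static void main(String[] args) {
        String doc1 = "red red wine";
        String doc2 = "red wine wine";
        List<String> query = List.of("red", "red", "wine");        // duplicate clause kept
        List<String> naiveDedup = List.of("red", "wine");           // duplicate dropped
        Map<String, Integer> merged = Map.of("red", 2, "wine", 1);  // duplicate merged, weight summed

        System.out.println(scoreClauses(query, doc1) + " vs " + scoreClauses(query, doc2));           // prints "5 vs 4"
        System.out.println(scoreClauses(naiveDedup, doc1) + " vs " + scoreClauses(naiveDedup, doc2)); // prints "3 vs 3"
        System.out.println(scoreWeighted(merged, doc1) + " vs " + scoreWeighted(merged, doc2));       // prints "5 vs 4"
    }
}
```

Under these assumptions, keeping the duplicate "red" clause ranks "red red wine" above "red wine wine", simply dropping the duplicate ties them, and merging it into one clause with weight 2 restores the original ranking. So any merging of repeated term executions would have to preserve how many times each term contributes to the score.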

On Tue, Sep 19, 2023 at 2:37 AM YouPeng Yang  wrote:
> So, is there any possibility, or any need, for Lucene to merge these
> executions to improve query performance, even though I know the
> optimization may be difficult?




Can the BooleanQuery execution be optimized with same term queries?

2023-09-19 Thread YouPeng Yang
Hi All

During my unemployment, the happiest thing has been diving into the
Lucene source code. Thanks for all the work!

I have a question about how BooleanQuery is executed. Although
BooleanQuery#rewrite already does some work to remove duplicate FILTER and
SHOULD clauses, the same term query can still be executed several times.
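As a rough sketch of single-level clause deduplication, the code below uses a hypothetical, simplified clause model (ClauseDedup, Clause and Occur here are made up and are not Lucene's classes). Duplicate FILTER clauses are dropped because they do not contribute to the score; duplicate SHOULD clauses are collapsed into one clause whose boost is the sum of the originals, which leaves a sum-of-scores disjunction unchanged as long as each clause's score scales linearly with its boost. Summing boosts is an assumed technique chosen for illustration, not a description of what BooleanQuery#rewrite does. The sketch also assumes no minimum-should-match, since merging SHOULD clauses changes how many clauses a document matches.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of one level of a boolean query; not Lucene's API.
class ClauseDedup {

    enum Occur { FILTER, SHOULD }

    static final class Clause {
        final String term;
        final Occur occur;
        final float boost;

        Clause(String term, Occur occur, float boost) {
            this.term = term;
            this.occur = occur;
            this.boost = boost;
        }

        @Override
        public String toString() {
            return occur + ":" + term + "^" + boost;
        }
    }

    // Assumes minimumNumberShouldMatch == 0: merging SHOULD clauses changes how
    // many clauses a document matches, which matters once a minimum is set.
    static List<Clause> dedup(List<Clause> clauses) {
        Set<String> filterTerms = new LinkedHashSet<>();          // FILTER: keep one copy
        Map<String, Float> shouldBoosts = new LinkedHashMap<>();  // SHOULD: sum the boosts
        for (Clause c : clauses) {
            if (c.occur == Occur.FILTER) {
                filterTerms.add(c.term);
            } else {
                shouldBoosts.merge(c.term, c.boost, Float::sum);
            }
        }
        List<Clause> out = new ArrayList<>();
        for (String t : filterTerms) {
            out.add(new Clause(t, Occur.FILTER, 1f));
        }
        shouldBoosts.forEach((t, b) -> out.add(new Clause(t, Occur.SHOULD, b)));
        return out;
    }

    public static void main(String[] args) {
        List<Clause> clauses = List.of(
                new Clause("b", Occur.FILTER, 1f),
                new Clause("b", Occur.FILTER, 1f),
                new Clause("a", Occur.SHOULD, 1f),
                new Clause("a", Occur.SHOULD, 1f),
                new Clause("d", Occur.SHOULD, 1f));
        System.out.println(dedup(clauses)); // prints "[FILTER:b^1.0, SHOULD:a^2.0, SHOULD:d^1.0]"
    }
}
```

Because a pass like this only compares sibling clauses, it cannot notice that a nested boolean query repeats the same terms as its parent.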

I copied the test code below from TestBooleanQuery to verify my assumption.

The unit test code is as follows:



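import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

// Top-level query: FILTER b, SHOULD a, SHOULD d
BooleanQuery.Builder qBuilder = new BooleanQuery.Builder();
qBuilder.add(new TermQuery(new Term("field", "b")), Occur.FILTER);
qBuilder.add(new TermQuery(new Term("field", "a")), Occur.SHOULD);
qBuilder.add(new TermQuery(new Term("field", "d")), Occur.SHOULD);

// Nested query with exactly the same three clauses
BooleanQuery.Builder nestQuery = new BooleanQuery.Builder();
nestQuery.add(new TermQuery(new Term("field", "b")), Occur.FILTER);
nestQuery.add(new TermQuery(new Term("field", "a")), Occur.SHOULD);
nestQuery.add(new TermQuery(new Term("field", "d")), Occur.SHOULD);

// The nested query becomes the fourth (SHOULD) clause of the top-level query
qBuilder.add(nestQuery.build(), Occur.SHOULD);
qBuilder.setMinimumNumberShouldMatch(1);
BooleanQuery q = qBuilder.build();

// searcher (an IndexSearcher) and assertSameScoresWithoutFilters() are
// provided by the surrounding TestBooleanQuery test class
assertSameScoresWithoutFilters(searcher, q);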


In this test, the top-level boolean query (qBuilder) contains 4 clauses:
3 simple term queries and 1 nested boolean query that contains the same 3
term queries.

When the query runs, all 6 term queries are executed separately (see
TermQuery.TermWeight#getTermsEnum()).
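To make the count concrete, here is a small sketch that walks a hypothetical query tree with the same shape as the test and compares the number of term leaves with the number of distinct terms. LeafCount, TermNode and BoolNode are illustrative stand-ins, not Lucene types, and clause occurrence (FILTER/SHOULD) is ignored because it does not affect how many term lookups happen.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical query-tree model (not Lucene's classes): a node is either a
// single term or a boolean node with child nodes.
class LeafCount {

    interface Node {}

    static final class TermNode implements Node {
        final String field;
        final String text;

        TermNode(String field, String text) {
            this.field = field;
            this.text = text;
        }
    }

    static final class BoolNode implements Node {
        final List<Node> children;

        BoolNode(List<Node> children) {
            this.children = children;
        }
    }

    // Depth-first walk that records one entry per term leaf, duplicates included.
    static void collectLeaves(Node node, List<String> out) {
        if (node instanceof TermNode) {
            TermNode t = (TermNode) node;
            out.add(t.field + ":" + t.text);
        } else {
            for (Node child : ((BoolNode) node).children) {
                collectLeaves(child, out);
            }
        }
    }

    static List<String> leaves(Node root) {
        List<String> out = new ArrayList<>();
        collectLeaves(root, out);
        return out;
    }

    public static void main(String[] args) {
        Node nested = new BoolNode(List.of(
                new TermNode("field", "b"), new TermNode("field", "a"), new TermNode("field", "d")));
        Node top = new BoolNode(List.of(
                new TermNode("field", "b"), new TermNode("field", "a"), new TermNode("field", "d"), nested));

        List<String> all = leaves(top);
        Set<String> distinct = new LinkedHashSet<>(all);
        System.out.println(all.size() + " leaf term queries, " + distinct.size() + " distinct terms");
        // prints "6 leaf term queries, 3 distinct terms"
    }
}
```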

In theory, these executions could be merged to reduce the execution time,
right?
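One way to picture such a merge is to memoize the per-term work for a single search, keyed by field and term. The sketch below is hypothetical: TermLookupCache is a made-up class, and the "postings" are a fake in-memory map rather than Lucene's TermsEnum or PostingsEnum. It only shows six requests collapsing to three lookups; it does not address how two independently advancing scorers could share one postings iterator, which is likely the harder part in a real engine.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: cache the expensive per-term lookup so repeated term
// queries within one search reuse the first result. Not Lucene's API.
class TermLookupCache {

    private final Function<String, int[]> lookup;     // the expensive per-term work
    private final Map<String, int[]> cache = new HashMap<>();
    int lookups = 0;                                   // how many times lookup actually ran

    TermLookupCache(Function<String, int[]> lookup) {
        this.lookup = lookup;
    }

    // Returns the (fake) matching doc ids, running the lookup only on a cache miss.
    int[] docsFor(String fieldAndTerm) {
        return cache.computeIfAbsent(fieldAndTerm, key -> {
            lookups++;
            return lookup.apply(key);
        });
    }

    public static void main(String[] args) {
        Map<String, int[]> index = Map.of(
                "field:a", new int[] {1, 4},
                "field:b", new int[] {1, 2, 4},
                "field:d", new int[] {2, 3});
        TermLookupCache cache = new TermLookupCache(k -> index.getOrDefault(k, new int[0]));

        // The six leaf term queries from the test: three top-level, three nested.
        List<String> leaves = List.of("field:b", "field:a", "field:d", "field:b", "field:a", "field:d");
        for (String leaf : leaves) {
            cache.docsFor(leaf);
        }
        System.out.println(leaves.size() + " requests, " + cache.lookups + " lookups");
        // prints "6 requests, 3 lookups"
    }
}
```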


So, is there any possibility, or any need, for Lucene to merge these
executions to improve query performance, even though I know the
optimization may be difficult?