Re: null Query from MultiFieldQueryParser.getFieldQuery

2016-10-04 Steve Rowe
Great, thanks for reporting back, Oliver!

I’ll go push the change now, including to branch_6_2, so that if we release a 
6.2.2 version, it will be included there.

--
Steve
www.lucidworks.com

> On Oct 4, 2016, at 9:34 AM, Oliver Kaleske wrote:
> 
> Hi Steve,
> 
> thanks for the fix.
> 
> I applied the patch locally to branch_6_2 (because that is closest to my
> current 6.2.1 dependency) and built Lucene from there.
> With the resulting build, the problem I had observed in my application is fixed.
> 
> Best regards,
> Oliver


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: null Query from MultiFieldQueryParser.getFieldQuery

2016-10-04 Oliver Kaleske
Hi Steve,

thanks for the fix.

I applied the patch locally to branch_6_2 (because that is closest to my
current 6.2.1 dependency) and built Lucene from there.
With the resulting build, the problem I had observed in my application is fixed.

Best regards,
Oliver

-----Original Message-----
From: Steve Rowe [mailto:sar...@gmail.com]
Sent: Friday, 30 September 2016 21:48
To: java-user@lucene.apache.org
Cc: Oliver Kaleske
Subject: Re: null Query from MultiFieldQueryParser.getFieldQuery

Hi Oliver,

Thanks for reporting this and for the analysis; this is a bug.

See <https://issues.apache.org/jira/browse/LUCENE-7472>, where I’ve put up a 
patch with a fix that treats all non-BooleanQuery queries opaquely (like 
TermQuery), and adds a test for the SynonymQuery case that fails without the 
patch and succeeds with it.

If you could test the patch, that would be great.

--
Steve
www.lucidworks.com




Re: null Query from MultiFieldQueryParser.getFieldQuery

2016-09-30 Steve Rowe
Hi Oliver,

Thanks for reporting this and for the analysis; this is a bug.

See <https://issues.apache.org/jira/browse/LUCENE-7472>, where I’ve put up a
patch with a fix that treats all non-BooleanQuery queries opaquely (like 
TermQuery), and adds a test for the SynonymQuery case that fails without the 
patch and succeeds with it.
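The rule change described above can be illustrated with a minimal, self-contained Java model that compares only the maxTerms computation before and after. It is a sketch under stated assumptions, not the LUCENE-7472 patch: `MaxTermsModel` and its nested stand-in types (`Q`, `TermQ`, `SynonymQ`, `BoolQ`) are hypothetical, and they encode only the rule as described in this thread (previously only TermQuery and BooleanQuery raised maxTerms; with the fix, every non-BooleanQuery is counted as one opaque term, like a TermQuery).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the maxTerms rule before and after the fix; not the actual patch.
final class MaxTermsModel {

    interface Q {}                                // stand-in for a Lucene Query
    static final class TermQ implements Q {}
    static final class SynonymQ implements Q {}   // neither TermQ nor BoolQ
    static final class BoolQ implements Q {
        final int clauseCount;
        BoolQ(int clauseCount) { this.clauseCount = clauseCount; }
    }

    // Before: only TermQ and BoolQ raise maxTerms; other types leave it at 0.
    static int maxTermsBefore(List<Q> perFieldQueries) {
        int maxTerms = 0;
        for (Q q : perFieldQueries) {
            if (q instanceof TermQ) {
                maxTerms = Math.max(1, maxTerms);
            } else if (q instanceof BoolQ) {
                maxTerms = Math.max(maxTerms, ((BoolQ) q).clauseCount);
            }
        }
        return maxTerms;
    }

    // After: BoolQ is still split per clause; any other non-null query counts as one term.
    static int maxTermsAfter(List<Q> perFieldQueries) {
        int maxTerms = 0;
        for (Q q : perFieldQueries) {
            if (q instanceof BoolQ) {
                maxTerms = Math.max(maxTerms, ((BoolQ) q).clauseCount);
            } else if (q != null) {
                maxTerms = Math.max(1, maxTerms);
            }
        }
        return maxTerms;
    }

    public static void main(String[] args) {
        List<Q> synonymsInTwoFields = Arrays.asList(new SynonymQ(), new SynonymQ());
        System.out.println(maxTermsBefore(synonymsInTwoFields)); // 0
        System.out.println(maxTermsAfter(synonymsInTwoFields));  // 1
    }
}
```

In the SynonymQuery case, the per-field results are neither TermQuery nor BooleanQuery, so under the old rule maxTerms stays 0, which, per the original report in this thread, leads to an empty clause list and a null return.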

If you could test the patch, that would be great.

--
Steve
www.lucidworks.com

> On Sep 29, 2016, at 11:24 AM, Adrien Grand wrote:
> 
> I'm not very familiar with this part of the code base, so I could easily
> overlook something. Could you open a JIRA issue and attach a minimal test
> case that reproduces the issue?



Re: null Query from MultiFieldQueryParser.getFieldQuery

2016-09-29 Adrien Grand
I'm not very familiar with this part of the code base, so I could easily
overlook something. Could you open a JIRA issue and attach a minimal test
case that reproduces the issue?

On Mon, 19 Sep 2016 at 13:48, Oliver Kaleske wrote:



null Query from MultiFieldQueryParser.getFieldQuery

2016-09-19 Oliver Kaleske
Hi,

When updating Lucene from 6.1.0 to 6.2.0, I came across the following:

We have a subclass of MultiFieldQueryParser (MFQP) that creates a custom type
of Query and calls getFieldQuery() on its base class (MFQP).
For each of its search fields, that method obtains a Query by calling
getFieldQuery() on QueryParserBase.
Ultimately, we end up in QueryBuilder's createFieldQuery() method, which,
depending on the number of tokens (among other things), decides which type of
Query to return: a TermQuery, BooleanQuery, PhraseQuery, or MultiPhraseQuery.
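The type selection described above can be pictured with a deliberately simplified, self-contained Java model. Everything in it is hypothetical and written for illustration only: `QueryTypeModel`, `Token`, `QueryKind` and `chooseKind()` are not Lucene API, and the rules are only an approximation of the branches in Lucene 6.x's `QueryBuilder.createFieldQuery()`, which may differ in detail. What it illustrates is that the outcome depends on how many tokens the analyzer emits and at which positions, so a single input chunk can still yield a type other than TermQuery or BooleanQuery (for instance a SynonymQuery, the case tested in LUCENE-7472).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model, NOT Lucene's API: an approximation of how the analyzed
// tokens could determine the kind of query built for one field.
final class QueryTypeModel {

    enum QueryKind { NONE, TERM, SYNONYM, BOOLEAN, PHRASE, MULTI_PHRASE }

    // One analyzed token: its text and its position in the token stream.
    static final class Token {
        final String text;
        final int position;

        Token(String text, int position) {
            this.text = text;
            this.position = position;
        }
    }

    // Assumes tokens are ordered by position; stacked tokens (e.g. synonyms) share a position.
    static QueryKind chooseKind(List<Token> tokens, boolean quoted) {
        if (tokens.isEmpty()) {
            return QueryKind.NONE;          // nothing left after analysis, e.g. a stopword
        }
        if (tokens.size() == 1) {
            return QueryKind.TERM;
        }
        int positionCount = 0;
        boolean stacked = false;
        int lastPosition = Integer.MIN_VALUE;
        for (Token t : tokens) {
            if (t.position != lastPosition) {
                positionCount++;
                lastPosition = t.position;
            } else {
                stacked = true;             // same position as the previous token
            }
        }
        if (quoted && positionCount > 1) {
            return stacked ? QueryKind.MULTI_PHRASE : QueryKind.PHRASE;
        }
        // Several tokens at a single position vs. tokens at several positions.
        return positionCount == 1 ? QueryKind.SYNONYM : QueryKind.BOOLEAN;
    }

    public static void main(String[] args) {
        // One whitespace-free input whose analysis stacks a synonym on the same position.
        List<Token> tv = Arrays.asList(new Token("tv", 0), new Token("television", 0));
        System.out.println(chooseKind(tv, false)); // SYNONYM
    }
}
```

Under these assumed rules, even input that was split carefully before analysis can produce a SynonymQuery when the analyzer stacks synonyms, which bears on the tokenization question raised further down.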

Back in MFQP.getFieldQuery(), a variable maxTerms is determined depending on 
the type of Query returned: for a TermQuery or a BooleanQuery, its value will 
in general be nonzero, clauses are created, and a non-null Query is returned.
However, other Query subclasses result in maxTerms=0, an empty list of clauses, 
and finally null is returned.
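A simplified, self-contained Java model of the behavior just described shows how PhraseQuery-like results end up as null. It is a sketch under stated assumptions, not the actual MultiFieldQueryParser source: the nested types `Q`, `TermQ`, `BoolQ` and `PhraseQ` are hypothetical stand-ins for Lucene's query classes, `combine()` is a hypothetical name, and wrapping the collected clauses in a `BoolQ` approximates what the parser does with them. The maxTerms rule and the final empty-clauses check follow the report above; the clause-grouping loop in between is a simplified assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the pre-fix maxTerms logic in MFQP.getFieldQuery(); not Lucene code.
final class BuggyMfqpModel {

    interface Q {}                                   // stand-in for a Lucene Query

    static final class TermQ implements Q {
        final String field, text;
        TermQ(String field, String text) { this.field = field; this.text = text; }
    }

    static final class BoolQ implements Q {
        final List<Q> clauses;
        BoolQ(List<Q> clauses) { this.clauses = clauses; }
    }

    static final class PhraseQ implements Q {        // any "other" Query subclass
        final String field;
        final List<String> terms;
        PhraseQ(String field, List<String> terms) { this.field = field; this.terms = terms; }
    }

    // perFieldQueries holds one (possibly null) query per search field.
    static Q combine(List<Q> perFieldQueries) {
        int maxTerms = 0;
        for (Q q : perFieldQueries) {
            if (q instanceof TermQ) {
                maxTerms = Math.max(1, maxTerms);
            } else if (q instanceof BoolQ) {
                maxTerms = Math.max(maxTerms, ((BoolQ) q).clauses.size());
            }
            // Any other type (PhraseQ here) leaves maxTerms untouched.
        }
        List<Q> clauses = new ArrayList<>();
        for (int termNum = 0; termNum < maxTerms; termNum++) {
            List<Q> termClauses = new ArrayList<>();
            for (Q q : perFieldQueries) {
                if (q instanceof BoolQ) {
                    List<Q> nested = ((BoolQ) q).clauses;
                    if (termNum < nested.size()) {
                        termClauses.add(nested.get(termNum));
                    }
                } else if (q != null && termNum == 0) {
                    termClauses.add(q);
                }
            }
            if (maxTerms > 1) {
                if (!termClauses.isEmpty()) {
                    clauses.add(new BoolQ(termClauses)); // one group per term position
                }
            } else {
                clauses.addAll(termClauses);
            }
        }
        if (clauses.isEmpty()) {
            return null;                                 // "happens for stopwords"
        }
        return new BoolQ(clauses);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("foo", "bar");
        List<Q> phrases = Arrays.asList(new PhraseQ("title", words), new PhraseQ("body", words));
        System.out.println(combine(phrases) == null); // true
    }
}
```

In this model the null only appears when no field yields a TermQ or BoolQ: with maxTerms at 0 the clause loop never runs.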

To me, this looks like a bug, but I might well be missing something. However,
the comment "// happens for stopwords" on the return null statement suggests
that Query types other than TermQuery and BooleanQuery were not properly
considered here.
I should point out that our custom MFQP subclass currently does rather
unsophisticated tokenization before calling getFieldQuery() on each token, so
characters like '*' may still slip through. So perhaps, with proper
tokenization, only TermQuery and BooleanQuery are guaranteed to come out of the
chain of getFieldQuery() calls, and not handling (Multi)PhraseQuery in
MFQP.getFieldQuery() can never cause trouble?

The code in MFQP.getFieldQuery() dates back to the LUCENE-2605 change of
06 Jul 2016, which substantially expanded it:
  LUCENE-2605: Add classic QueryParser option setSplitOnWhitespace() to control
  whether to split on whitespace prior to text analysis. Default behavior
  remains unchanged: split-on-whitespace=true.

Best regards,
Oliver
