Re: autoGeneratePhraseQueries not working

2019-04-16 Thread Alexandre Rafalovitch
Ah, oops. I did not realize the original text was missing spaces. It looked
like so many questions that do have them that I did not recheck the search
query.

Go with Erick's explanation for this specific case, and keep mine in
mind for input with spaces.

Regards,
   Alex.

On Tue, 16 Apr 2019 at 17:48, Erick Erickson wrote:
>
> The issue isn’t SoW. What’s happening here is that the query _parser_ passes 
> my25word through as a single token, then WordDelimiterGraphFilterFactory 
> splits it up on number/letter changes after SoW is out of the picture. The 
> admin/analysis page will show you how this works.
>
> By adjusting the WordDelimiterGraphFilterFactory settings, catenateAll in
> particular, you can get close to automatic phrase queries. But the result is
> never quite the same thing as a real phrase query.
>
> Best,
> Erick


Re: autoGeneratePhraseQueries not working

2019-04-16 Thread Erick Erickson
The issue isn’t SoW. What’s happening here is that the query _parser_ passes 
my25word through as a single token, then WordDelimiterGraphFilterFactory splits 
it up on number/letter changes after SoW is out of the picture. The 
admin/analysis page will show you how this works.

By adjusting the WordDelimiterGraphFilterFactory settings, catenateAll in
particular, you can get close to automatic phrase queries. But the result is
never quite the same thing as a real phrase query.
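
As a rough illustration of the split described above, here is a toy Python sketch. It is not Lucene's implementation: the `split_on_type_change` helper and its `catenate_all` flag are hypothetical names that only imitate the letter/number boundary rule and the catenateAll setting.

```python
import re

def split_on_type_change(token, catenate_all=False):
    """Split a token wherever letters and digits meet.

    Toy imitation of the letter/number splitting described above;
    the real filter has many more options and emits a token graph.
    """
    parts = re.findall(r"[A-Za-z]+|[0-9]+", token)
    if catenate_all and len(parts) > 1:
        # Imitates catenateAll: also emit all parts joined together.
        parts.append("".join(parts))
    return parts

# The query parser hands over "my25word" as one token; the split happens later.
print(split_on_type_change("my25word"))
print(split_on_type_change("my25word", catenate_all=True))
```

The first call yields three separate parts, which matches the three term clauses in the reported parsedquery. The second also keeps the joined form, which is why catenateAll gets close to a phrase match without being one.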

Best,
Erick

> On Apr 16, 2019, at 4:31 AM, Leonardo Francalanci wrote:
>
> Thank you for the reply.
> I'm using eDisMax; does it use the same parser as the Standard Query Parser
> then?
> I think this behavior should be documented somewhere. It's very confusing,
> and to be honest I don't even remember how I arrived at the sow parameter,
> and I'm not sure what it means for all my other queries.



Re: autoGeneratePhraseQueries not working

2019-04-16 Thread Leonardo Francalanci
Thank you for the reply.
I'm using eDisMax; does it use the same parser as the Standard Query Parser
then?
I think this behavior should be documented somewhere. It's very confusing,
and to be honest I don't even remember how I arrived at the sow parameter,
and I'm not sure what it means for all my other queries.

On Tuesday, 16 April 2019, 13:09:26 CEST, Alexandre Rafalovitch wrote:

> The issue is that the Standard Query Parser pre-processes the query
> and splits it on whitespace beforehand (to deal with all the
> special syntax). So, if you don't use quoted phrases, then by the time
> the field-specific query analyzer chain kicks in, the text is already
> pre-split and the analyzer only sees one whitespace-separated token
> at a time. So autoGeneratePhraseQueries does not work in that case. If
> you use a different parser that sends the whole text in (e.g. FieldQParser),
> then, I think, it will work.
>
> Or, like you discovered, sow=true tells the Standard Query Parser to
> send it all together as well.
>
> It is a bit of a messy part of Solr, because the Admin Analysis page
> sends the text to the query analyzer without splitting (it does not
> use any Query Parser). So, that adds to the confusion.
>
> Regards,
>   Alex.


Re: autoGeneratePhraseQueries not working

2019-04-16 Thread Alexandre Rafalovitch
The issue is that the Standard Query Parser pre-processes the query
and splits it on whitespace beforehand (to deal with all the
special syntax). So, if you don't use quoted phrases, then by the time
the field-specific query analyzer chain kicks in, the text is already
pre-split and the analyzer only sees one whitespace-separated token
at a time. So autoGeneratePhraseQueries does not work in that case. If
you use a different parser that sends the whole text in (e.g. FieldQParser),
then, I think, it will work.

Or, like you discovered, sow=true tells the Standard Query Parser to
send it all together as well.

It is a bit of a messy part of Solr, because the Admin Analysis page
sends the text to the query analyzer without splitting (it does not
use any Query Parser). So, that adds to the confusion.
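
To make the pre-splitting argument concrete, here is a toy Python model. It is only a sketch of the explanation above, not Solr's parser: `analyze` and `build_query` are hypothetical helpers, and the whitespace-only analyzer is an assumption chosen to keep the example small.

```python
def analyze(text):
    """Stand-in for a field's query analyzer: lowercase and split on spaces."""
    return text.lower().split()

def build_query(field, chunks, auto_phrase=True):
    """Build clauses from the chunks handed to the analyzer.

    A chunk that analyzes to several tokens becomes a phrase when
    auto_phrase is set; a single token becomes a plain term clause.
    """
    clauses = []
    for chunk in chunks:
        tokens = analyze(chunk)
        if auto_phrase and len(tokens) > 1:
            clauses.append('%s:"%s"' % (field, " ".join(tokens)))
        else:
            clauses.extend("%s:%s" % (field, t) for t in tokens)
    return " ".join(clauses)

text = "my 25 word"
pre_split = build_query("TROUBLESHOOT", text.split())  # one chunk per word
whole_text = build_query("TROUBLESHOOT", [text])       # a single chunk
print(pre_split)
print(whole_text)
```

In this model the pre-split input can only ever produce single-term clauses, because each call to the analyzer sees one word. The whole-text input reaches the analyzer as one chunk, so a phrase can be formed.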

Regards,
   Alex.

On Tue, 16 Apr 2019 at 10:53, Leonardo Francalanci wrote:
>
> To add some information: using "sow=true" it seems to work. But I don't
> understand why it wouldn't work with "sow=false" (I can't find anything in
> the docs about how sow interacts with autoGeneratePhraseQueries), or what
> the implications of setting sow=true are.
> I've found this: [SOLR-9185] Solr's edismax and "Lucene"/standard query
> parsers should optionally not split on whitespace before sending terms to
> analysis - ASF JIRA
>
> But it's very low level, and I can't find any more "user friendly"
> documentation.
>
> On Tuesday, 16 April 2019, 09:00:08 CEST, Leonardo Francalanci wrote:
>
> Hi,
>
> I'm using Solr 8.0.0 and I can't get autoGeneratePhraseQueries to work (I
> also tried 7.7.1, with the same result):
>
> "debug":{
> "rawquerystring":"TROUBLESHOOT:my25word",
> "querystring":"TROUBLESHOOT:my25word",
> "parsedquery":"TROUBLESHOOT:my TROUBLESHOOT:25 TROUBLESHOOT:word",
> "parsedquery_toString":"TROUBLESHOOT:my TROUBLESHOOT:25 TROUBLESHOOT:word",
>
> I expected something like
>
> "parsedquery":"TROUBLESHOOT:\"my 25 word\""
>
> Why isn't autoGeneratePhraseQueries generating a quoted phrase when I query?
>
>
> This is my configuration:
>
>   indexed="true" stored="true"/>
>   positionIncrementGap="100" autoGeneratePhraseQueries="true">
>
>     ignoreCase="true" words="lang/stopwords_en.txt"/>
>     generateWordParts="1" generateNumberParts="1" catenateWords="1"
>       catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>     protected="protwords.txt"/>
>
>     synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>     ignoreCase="true" words="lang/stopwords_en.txt"/>
>     generateWordParts="1" generateNumberParts="1" catenateWords="0"
>       catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>     protected="protwords.txt"/>
>
>   stored="true" multiValued="true" omitNorms="true"/>


Re: autoGeneratePhraseQueries not working

2019-04-16 Thread Leonardo Francalanci
To add some information: using "sow=true" it seems to work. But I don't
understand why it wouldn't work with "sow=false" (I can't find anything in the
docs about how sow interacts with autoGeneratePhraseQueries), or what the
implications of setting sow=true are.
I've found this: [SOLR-9185] Solr's edismax and "Lucene"/standard query parsers
should optionally not split on whitespace before sending terms to analysis -
ASF JIRA

But it's very low level, and I can't find any more "user friendly"
documentation.
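
For anyone who wants to repeat the comparison, here is a small Python sketch that builds the two debug requests. The host, port and the collection name `mycollection` are assumptions, and `debug_url` is a hypothetical helper. `q`, `defType`, `sow` and `debugQuery` are the standard Solr request parameters.

```python
from urllib.parse import urlencode

def debug_url(base, query, sow):
    """Return a /select URL that asks Solr for its parsed-query debug output."""
    params = {
        "q": query,
        "defType": "edismax",
        "sow": "true" if sow else "false",
        "debugQuery": "true",
    }
    return base + "/select?" + urlencode(params)

# Hypothetical local Solr instance and collection name.
base = "http://localhost:8983/solr/mycollection"
for sow in (True, False):
    print(debug_url(base, "TROUBLESHOOT:my25word", sow))
```

Fetching both URLs and comparing the "parsedquery" entries in the responses shows what the sow setting changes for a given field type.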

On Tuesday, 16 April 2019, 09:00:08 CEST, Leonardo Francalanci wrote:
 
Hi,

I'm using Solr 8.0.0 and I can't get autoGeneratePhraseQueries to work (I also
tried 7.7.1, with the same result):

"debug":{
    "rawquerystring":"TROUBLESHOOT:my25word",
    "querystring":"TROUBLESHOOT:my25word",
    "parsedquery":"TROUBLESHOOT:my TROUBLESHOOT:25 TROUBLESHOOT:word",
    "parsedquery_toString":"TROUBLESHOOT:my TROUBLESHOOT:25 TROUBLESHOOT:word",

I expected something like

"parsedquery":"TROUBLESHOOT:\"my 25 word\""

Why isn't autoGeneratePhraseQueries generating a quoted phrase when I query?


This is my configuration:

autoGeneratePhraseQueries not working

2019-04-16 Thread Leonardo Francalanci
Hi,

I'm using Solr 8.0.0 and I can't get autoGeneratePhraseQueries to work (I also
tried 7.7.1, with the same result):

"debug":{
    "rawquerystring":"TROUBLESHOOT:my25word",
    "querystring":"TROUBLESHOOT:my25word",
    "parsedquery":"TROUBLESHOOT:my TROUBLESHOOT:25 TROUBLESHOOT:word",
    "parsedquery_toString":"TROUBLESHOOT:my TROUBLESHOOT:25 TROUBLESHOOT:word",

I expected something like

"parsedquery":"TROUBLESHOOT:\"my 25 word\""

Why isn't autoGeneratePhraseQueries generating a quoted phrase when I query?


This is my configuration: