RE: autoGeneratePhraseQueries sort of silently set to false

2012-02-23 Thread Burton-West, Tom
Thanks Erik,

The 3.1 changes document the ability to set this, with the default being
"true". However, it appears that between 3.4 and 3.5 the default was changed
to "false". Since this changes the behavior of any field where
autoGeneratePhraseQueries is not explicitly set, it could easily surprise users
who upgrade to 3.5.
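
For anyone auditing a schema before upgrading, a rough sketch along these lines could list the field types that rely on the default. It is an illustration only and not part of Solr: the class and method names are made up here, and it assumes field types are declared as fieldType (or fieldtype) elements whose class attribute ends in TextField.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper: finds text field types with no explicit autoGeneratePhraseQueries. */
final class ImplicitPhraseDefaultFinder {

    /** Returns the names of TextField-based field types that rely on the default. */
    static List<String> findImplicit(String schemaXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(schemaXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse schema XML", e);
        }
        List<String> names = new ArrayList<>();
        // Assumption: both spellings of the element name may appear in a schema.
        for (String tag : new String[] {"fieldType", "fieldtype"}) {
            NodeList nodes = doc.getElementsByTagName(tag);
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                boolean isTextField = e.getAttribute("class").endsWith("TextField");
                if (isTextField && !e.hasAttribute("autoGeneratePhraseQueries")) {
                    names.add(e.getAttribute("name"));
                }
            }
        }
        return names;
    }

    public static void main(String[] args) {
        String xml = "<schema name=\"demo\" version=\"1.4\"><types>"
                + "<fieldType name=\"text_a\" class=\"solr.TextField\"/>"
                + "<fieldType name=\"text_b\" class=\"solr.TextField\""
                + " autoGeneratePhraseQueries=\"true\"/>"
                + "</types></schema>";
        System.out.println(findImplicit(xml));
    }
}
```

Any field type it reports would pick up whatever default applies to the schema version in use.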

That's why I think the change in the default behavior (i.e. when not
explicitly set) should be called out explicitly in the changes.txt for 3.5.

True, everyone should read the notes in the example schema.xml, but I think it 
would help if the change was also noted in changes.txt.  

Is it possible to revise the changes.txt for 3.5?

Do you by any chance know where the change in the default behavior was 
discussed?  I know it has been a contentious issue.

Tom




Re: autoGeneratePhraseQueries sort of silently set to false

2012-02-23 Thread Erik Hatcher
there's this (for 3.1, but in the 3.x CHANGES.txt):

* SOLR-2015: Add a boolean attribute autoGeneratePhraseQueries to TextField.
  autoGeneratePhraseQueries="true" (the default) causes the query parser to
  generate phrase queries if multiple tokens are generated from a single
  non-quoted analysis string.  For example WordDelimiterFilter splitting
  text:pdp-11 will cause the parser to generate text:"pdp 11" rather than
  (text:PDP OR text:11).
  Note that autoGeneratePhraseQueries="true" tends to not work well for
  non whitespace delimited languages. (yonik)

with a ton of useful, though back and forth, commentary here:
<https://issues.apache.org/jira/browse/SOLR-2015>
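
To make the two query shapes in that CHANGES entry concrete, here is a toy sketch. It is not Solr's query parser: PhraseQuerySketch and buildQuery are hypothetical names, the token list stands in for whatever the analyzer produced from one non-quoted chunk, and the OR form assumes OR is the default operator.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Toy model of the two query shapes described in SOLR-2015; not Solr code. */
final class PhraseQuerySketch {

    /**
     * Builds a query string from the tokens that analysis produced for a single
     * non-quoted chunk of input (e.g. "pdp-11" split into "pdp" and "11").
     */
    static String buildQuery(String field, List<String> tokens,
                             boolean autoGeneratePhraseQueries) {
        if (tokens.isEmpty()) {
            return "";
        }
        if (tokens.size() == 1) {
            // A single token is a plain term query either way.
            return field + ":" + tokens.get(0);
        }
        if (autoGeneratePhraseQueries) {
            // Multiple tokens from one chunk become a phrase.
            return field + ":\"" + String.join(" ", tokens) + "\"";
        }
        // Otherwise each token becomes its own clause.
        return tokens.stream()
                .map(t -> field + ":" + t)
                .collect(Collectors.joining(" OR ", "(", ")"));
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("pdp", "11");
        System.out.println(buildQuery("text", tokens, true));
        System.out.println(buildQuery("text", tokens, false));
    }
}
```

The phrase form matches only documents where the tokens are adjacent, which is why switching the flag can change result counts for hyphenated terms.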


Note that the behavior, as Naomi pointed out so succinctly, is adjustable based
off the *schema* version setting (look at the version attribute on the <schema>
line in your schema.xml).
The code is simply this:

if (schema.getVersion() > 1.3f) {
  autoGeneratePhraseQueries = false;
} else {
  autoGeneratePhraseQueries = true;
}

on TextField.  Specifying autoGeneratePhraseQueries explicitly on a field type 
overrides whatever the default may be.
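
Put together with the explicit override, the effective value could be modeled roughly as follows. This is a sketch of the rule as described above, not the actual TextField source: the class name, method name, and the use of a nullable Boolean for "not set" are assumptions made for illustration.

```java
/** Hypothetical model of how the effective autoGeneratePhraseQueries value is chosen. */
final class AutoPhraseDefault {

    /**
     * @param schemaVersion   the version attribute from the schema element
     * @param explicitSetting the value set on the field type, or null if absent
     */
    static boolean resolve(float schemaVersion, Boolean explicitSetting) {
        if (explicitSetting != null) {
            // An explicit attribute on the field type always wins.
            return explicitSetting;
        }
        // Same comparison as the snippet above: false for versions above 1.3.
        return !(schemaVersion > 1.3f);
    }

    public static void main(String[] args) {
        System.out.println(resolve(1.4f, null));         // schema 1.4, nothing set
        System.out.println(resolve(1.3f, null));         // schema 1.3, nothing set
        System.out.println(resolve(1.4f, Boolean.TRUE)); // schema 1.4, explicit true
    }
}
```

Under this model, setting the attribute explicitly on each text field type keeps the behavior stable no matter which schema version is declared.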

Erik






RE: autoGeneratePhraseQueries sort of silently set to false

2012-02-23 Thread Burton-West, Tom
Seems like a change in default behavior like this should be included in the 
changes.txt for Solr 3.5.
Not sure how to do that.

Tom

-----Original Message-----
From: Naomi Dushay [mailto:ndus...@stanford.edu] 
Sent: Thursday, February 23, 2012 1:57 PM
To: solr-user@lucene.apache.org
Subject: autoGeneratePhraseQueries sort of silently set to false 

Another thing I noticed when upgrading from Solr 1.4 to Solr 3.5 had to do with 
results when there were hyphenated words:   aaa-bbb.   Erik Hatcher pointed me 
to the autoGeneratePhraseQueries attribute now available on fieldtype 
definitions in schema.xml.  This is a great feature, and everything is peachy 
if you start with Solr 3.4.   But many of us started earlier and are upgrading, 
and that's a different story.

It was surprising to me that

a.  the default for this new feature caused different search results than Solr 
1.4 

b.  it wasn't documented clearly, IMO

http://wiki.apache.org/solr/SchemaXml   makes no mention of it


In the schema.xml example, there is this at the top:



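And there was this in a couple of field definitions:

<fieldType ... positionIncrementGap="100" autoGeneratePhraseQueries="true">
<fieldType ... autoGeneratePhraseQueries="false">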

But that was it.