Hi, I have tracked this problem to:
https://issues.apache.org/jira/browse/SOLR-879

The executive summary is that there are errors relating to text fields in
both:

- src/java/org/apache/solr/search/SolrQueryParser.java
- example/solr/conf/schema.xml

It is fixed in 1.4.

Thank you Yonik Seeley for the original diagnosis and fix.

Cheers,

--
Phil

It may be that your sole purpose in life is simply to serve as a warning
to others.

Phil Chadwick wrote:
> Hi Jay
>
> Thank you for your response.
>
> The data relating to the string (s_title) defines *exactly* what was
> fed into the SOLR indexing. The string is not otherwise relevant to
> the question.
>
> The essence of my question is why the indexed text (t_title) cannot
> be phrase-matched by a query on the text field when the word "for" is
> present in the query.
>
> The following work (and I would expect them to work):
>
>   q=s_title:"FUTURE DIRECTIONS FOR INTEGRATED CATCHMENT"
>   q=t_title:"future directions"
>   q=t_title:"integrated catchment"
>
> The following does not work (and I would expect it to work):
>
>   q=t_title:"directions for integrated"
>
> The following does not work (I am not sure whether I expect it to work):
>
>   q=t_title:"directions integrated"
>
> My reading is that if "FOR" is removed when the text field is indexed,
> it should also be removed from the text query!
>
> I also added 'enablePositionIncrements="true"' to the text query analyzer
> to make it the same as the text index analyzer:
>
>   <filter class="solr.StopFilterFactory"
>           ignoreCase="true"
>           words="stopwords.txt"
>           enablePositionIncrements="true"/>
>
> There was no change in the outcome.
>
> The definitions for text and string were exactly as in the SOLR 1.3
> example schema. The section of that schema for "text" is shown below.
>
>   <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>
>     <analyzer type="index">
>       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>       <filter class="solr.StopFilterFactory"
>               ignoreCase="true"
>               words="stopwords.txt"
>               enablePositionIncrements="true"/>
>       <filter class="solr.WordDelimiterFilterFactory"
>               generateWordParts="1"
>               generateNumberParts="1"
>               catenateWords="1"
>               catenateNumbers="1"
>               catenateAll="0"
>               splitOnCaseChange="1"/>
>       <filter class="solr.LowerCaseFilterFactory"/>
>       <filter class="solr.EnglishPorterFilterFactory"
>               protected="protwords.txt"/>
>       <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>     </analyzer>
>
>     <analyzer type="query">
>       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>       <filter class="solr.SynonymFilterFactory"
>               synonyms="synonyms.txt"
>               ignoreCase="true"
>               expand="true"/>
>       <filter class="solr.StopFilterFactory"
>               ignoreCase="true"
>               words="stopwords.txt"
>               <!-- enablePositionIncrements="true" -->
>               />
>       <filter class="solr.WordDelimiterFilterFactory"
>               generateWordParts="1"
>               generateNumberParts="1"
>               catenateWords="0"
>               catenateNumbers="0"
>               catenateAll="0"
>               splitOnCaseChange="1"/>
>       <filter class="solr.LowerCaseFilterFactory"/>
>       <filter class="solr.EnglishPorterFilterFactory"
>               protected="protwords.txt"/>
>       <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>     </analyzer>
>
>   </fieldType>
>
>
> Cheers,
>
>
> --
> Phil
>
> The art of being wise is the art of knowing what to overlook.
>   -- William James
>
>
>
> Jay Hill wrote:
> >
> > The string fieldtype is not being tokenized, while the text fieldtype is
> > tokenized. So the stop word "for" is being removed by a stop word filter,
> > which doesn't happen with the string field type (no tokenizing).
> >
> > Have a look at the schema.xml in the example dir and look at the default
> > configuration for both the text and string fieldtypes.
> > The string fieldtype is not analyzed, whereas the text fieldtype has a
> > number of different filters that take action.
> >
> > On Wed, May 6, 2009 at 11:09 PM, Phil Chadwick
> > <p.chadw...@internode.on.net> wrote:
> >
> > > Hi,
> > >
> > > I'm trying to figure out why phrase matching on a text field only works
> > > some of the time.
> > >
> > > I have a SOLR index containing a document titled "FUTURE DIRECTIONS FOR
> > > INTEGRATED CATCHMENT". The "FOR" seems to be causing a problem...
> > >
> > > The title field is indexed as both s_title and t_title (string and text,
> > > as defined in the demo schema), thus:
> > >
> > >   <field name="title" type="string" indexed="false" stored="false"
> > >          multiValued="false" />
> > >   <field name="s_title" type="string" indexed="true" stored="true"
> > >          multiValued="false" />
> > >   <field name="t_title" type="text" indexed="true" stored="false"
> > >          multiValued="false" />
> > >   <copyField source="title" dest="s_title" />
> > >   <copyField source="title" dest="t_title" />
> > >
> > > I can match the document with an exact query on the string:
> > >
> > >   q=s_title:"FUTURE DIRECTIONS FOR INTEGRATED CATCHMENT"
> > >
> > > I can match the document with this phrase query on the text:
> > >
> > >   q=t_title:"future directions"
> > >
> > > which uses the parsedquery shown by "&debugQuery=true":
> > >
> > >   <str name="rawquerystring">t_title:"future directions"</str>
> > >   <str name="querystring">t_title:"future directions"</str>
> > >   <str name="parsedquery">PhraseQuery(t_title:"futur direct")</str>
> > >   <str name="parsedquery_toString">t_title:"futur direct"</str>
> > >
> > > Similarly, I can match the document with this query:
> > >
> > >   q=t_title:"integrated catchment"
> > >
> > > which uses the parsedquery shown by "&debugQuery=true":
> > >
> > >   <str name="rawquerystring">t_title:"integrated catchment"</str>
> > >   <str name="querystring">t_title:"integrated catchment"</str>
> > >   <str name="parsedquery">PhraseQuery(t_title:"integr catchment")</str>
> > >   <str name="parsedquery_toString">t_title:"integr catchment"</str>
> > >
> > > But I cannot match the document with the query:
> > >
> > >   q=t_title:"future directions for integrated catchment"
> > >
> > > which uses the phrase query shown by "&debugQuery=true":
> > >
> > >   <str name="rawquerystring">
> > >     t_title:"future directions for integrated catchment"</str>
> > >   <str name="querystring">
> > >     t_title:"future directions for integrated catchment"</str>
> > >   <str name="parsedquery">
> > >     PhraseQuery(t_title:"futur direct integr catchment")</str>
> > >   <str name="parsedquery_toString">
> > >     t_title:"futur direct integr catchment"</str>
> > >
> > > Any wisdom gratefully accepted.
> > >
> > > Cheers,
> > >
> > >
> > > --
> > > Phil
> > >
> > > 640K ought to be enough for anybody.
> > >   -- Bill Gates, in 1981
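The position mismatch behind this thread can be modelled in a few lines of Python. This is a minimal sketch, not Solr or Lucene code. It assumes whitespace tokenizing, a one-word stop list standing in for stopwords.txt, lowercasing, and an exact phrase match with zero slop. It applies no stemming, because EnglishPorterFilterFactory changes the terms but not their positions. The names `STOP_WORDS`, `analyze` and `phrase_matches` are hypothetical.

The index side leaves a position gap where "FOR" was dropped, as `enablePositionIncrements="true"` does in the index analyzer quoted above. The query side is run both with and without that gap.

```python
# Hypothetical model of stop-word position gaps and exact phrase matching.
# NOT Solr/Lucene code: whitespace split, one stop word, lowercasing,
# no stemming, slop 0.

STOP_WORDS = {"for"}  # hypothetical stand-in for stopwords.txt


def analyze(text, enable_position_increments):
    """Return (term, position) pairs after stop-word removal.

    With enable_position_increments=True, each removed stop word still
    advances the position counter, leaving a gap before the next term.
    """
    tokens = []
    position = -1
    skipped = 0
    for word in text.split():
        if word.lower() in STOP_WORDS:  # ignoreCase="true"
            skipped += 1
            continue
        increment = 1 + (skipped if enable_position_increments else 0)
        position += increment
        skipped = 0
        tokens.append((word.lower(), position))
    return tokens


def phrase_matches(doc_tokens, query_tokens):
    """Exact (slop 0) phrase match using relative term positions."""
    if not query_tokens:
        return False
    doc_positions = {}
    for term, pos in doc_tokens:
        doc_positions.setdefault(term, set()).add(pos)
    first_term, first_pos = query_tokens[0]
    # Try every place the first query term occurs in the document.
    for start in doc_positions.get(first_term, ()):
        offset = start - first_pos
        if all(pos + offset in doc_positions.get(term, ())
               for term, pos in query_tokens):
            return True
    return False


if __name__ == "__main__":
    doc = analyze("FUTURE DIRECTIONS FOR INTEGRATED CATCHMENT", True)
    print(doc)  # [('future', 0), ('directions', 1), ('integrated', 3), ('catchment', 4)]
    for phrase in ["future directions",
                   "integrated catchment",
                   "directions for integrated",
                   "future directions for integrated catchment",
                   "directions integrated"]:
        without_gap = phrase_matches(doc, analyze(phrase, False))
        with_gap = phrase_matches(doc, analyze(phrase, True))
        print(f"{phrase!r}: query gap off={without_gap}, on={with_gap}")
```

In this model, the phrases containing "for" match only when the query side leaves the same gap as the index. "directions integrated" fails either way, because the index really does record a gap between those two words.

Changing the query analyzer alone made no difference in Phil's test. That fits the summary that SolrQueryParser.java also needed fixing, presumably so that the parser respects the position increments the analyzer produces.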