Did you index with Solr 1.4 (or are you using Solr 1.4)? At a quick glance, it looks like it might be this: https://issues.apache.org/jira/browse/SOLR-1852, which was fixed in 1.4.1.
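A rough way to see why analysis.jsp reports a match: the Python sketch below models the token positions that the analysis page displays for the quoted schema, then runs a phrase check against them. It is a hypothetical simplification, not Solr's WordDelimiterFilter. `word_parts`, `analyze` and `phrase_matches` are made-up helpers, only case-change splitting and word catenation are modelled, the catenated token is placed on the last part's position because that is what analysis.jsp shows, and the synonym and Porter stemming steps are skipped (stemming leaves "tech" and "note" unchanged in this example).

```python
import re
from collections import defaultdict

# Hypothetical, simplified model of the analysis chains quoted below
# (whitespace tokenizer -> WordDelimiterFilter -> lowercase).
# NOT Solr's implementation: synonyms and the Porter stemmer are skipped.

def word_parts(token):
    """Drop non-alphanumerics, then split on case changes ("TechNote." -> Tech, Note)."""
    core = re.sub(r"[^A-Za-z0-9]", "", token)
    return re.findall(r"[A-Z]?[a-z0-9]+|[A-Z]+", core)

def analyze(text, catenate_words):
    """Return (position, term) pairs with positions starting at 1."""
    tokens = []
    pos = 0
    for raw in text.split():
        parts = word_parts(raw)
        for part in parts:
            pos += 1
            tokens.append((pos, part.lower()))
        if catenate_words and len(parts) > 1:
            # Assumption: catenated token stacked on the last part's position,
            # as the analysis.jsp output shows.
            tokens.append((pos, "".join(parts).lower()))
    return tokens

def phrase_matches(doc_tokens, phrase_terms):
    """True if phrase_terms occur at consecutive positions in doc_tokens."""
    positions = defaultdict(set)
    for pos, term in doc_tokens:
        positions[term].add(pos)
    return any(
        all(start + i in positions[t] for i, t in enumerate(phrase_terms))
        for start in positions[phrase_terms[0]]
    )

index_tokens = analyze("This is a TechNote.", catenate_words=True)
query_terms = [term for _, term in analyze("TechNote", catenate_words=False)]
print(index_tokens)
print(query_terms)
print(phrase_matches(index_tokens, query_terms))
```

Under these assumptions "tech" and "note" sit at positions 4 and 5, so the phrase "tech note" matches, just as analysis.jsp suggests. The phraseFreq=0.0 reported by explainOther therefore hints that the positions actually stored in the index differ from what the analysis page displays, which would be consistent with an indexing-side bug such as the one linked above.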
On Tue, Sep 14, 2010 at 5:40 AM, yandong yao <yydz...@gmail.com> wrote:
> Hi Guys,
>
> I ran into a problem when enabling WordDelimiterFilterFactory for both
> index and query (the relevant part of schema.xml is pasted at the bottom
> of this email).
>
> *1. Steps to reproduce:*
> 1.1 The indexed sample document contains only one sentence: "This is a
> TechNote."
> 1.2 Query is: q=TechNote
> 1.3 Result: no matches are returned, even though the sentence clearly
> contains the word 'TechNote'.
>
> *2. Output when enabling debugQuery*
> By turning on debugQuery:
>
> http://localhost:7111/solr/test/select?indent=on&version=2.2&q=TechNote&fq=&start=0&rows=0&fl=*%2Cscore&qt=standard&wt=standard&debugQuery=on&explainOther=id%3A001&hl.fl=
>
> I get the following information:
>
> <str name="rawquerystring">TechNote</str>
> <str name="querystring">TechNote</str>
> <str name="parsedquery">PhraseQuery(all:"tech note")</str>
> <str name="parsedquery_toString">all:"tech note"</str>
> <lst name="explain"/>
> <str name="otherQuery">id:001</str>
> <lst name="explainOther">
>   <str name="001">
> 0.0 = fieldWeight(all:"tech note" in 0), product of:
>   0.0 = tf(phraseFreq=0.0)
>   0.61370564 = idf(all: tech=1 note=1)
>   0.25 = fieldNorm(field=all, doc=0)
>   </str>
> </lst>
>
> It seems that the raw query string is converted to the phrase query
> "tech note", but its tf (phraseFreq) is 0, so nothing matches.
>
> *3. Result from admin/analysis.jsp page*
>
> According to analysis.jsp, the query 'TechNote' does match the input
> document; see the words marked in red below (shown here between asterisks).
>
> Index Analyzer
>
> org.apache.solr.analysis.WhitespaceTokenizerFactory {}
>   term position     1     2     3     4
>   term text         This  is    a     TechNote.
>   term type         word  word  word  word
>   source start,end  0,4   5,7   8,9   10,19
>
> org.apache.solr.analysis.SynonymFilterFactory {synonyms=synonyms.txt,
> expand=true, ignoreCase=true}
>   term position     1     2     3     4
>   term text         This  is    a     TechNote.
>   term type         word  word  word  word
>   source start,end  0,4   5,7   8,9   10,19
>
> org.apache.solr.analysis.WordDelimiterFilterFactory {splitOnCaseChange=1,
> generateNumberParts=1, catenateWords=1, generateWordParts=1, catenateAll=0,
> catenateNumbers=1}
>   term position     1     2     3     4       5
>   term text         This  is    a     Tech    Note
>                                               TechNote
>   term type         word  word  word  word    word
>                                               word
>   source start,end  0,4   5,7   8,9   10,14   14,18
>                                               10,18
>
> org.apache.solr.analysis.LowerCaseFilterFactory {}
>   term position     1     2     3     4       5
>   term text         this  is    a     tech    note
>                                               technote
>   term type         word  word  word  word    word
>                                               word
>   source start,end  0,4   5,7   8,9   10,14   14,18
>                                               10,18
>
> org.apache.solr.analysis.SnowballPorterFilterFactory
> {protected=protwords.txt, language=English}
>   term position     1     2     3     4       5
>   term text         this  is    a     *tech*  *note*
>                                               technot
>   term type         word  word  word  word    word
>                                               word
>   source start,end  0,4   5,7   8,9   10,14   14,18
>                                               10,18
>
> Query Analyzer
>
> org.apache.solr.analysis.WhitespaceTokenizerFactory {}
>   term position     1
>   term text         TechNote
>   term type         word
>   source start,end  0,8
>
> org.apache.solr.analysis.SynonymFilterFactory {synonyms=synonyms.txt,
> expand=true, ignoreCase=true}
>   term position     1
>   term text         TechNote
>   term type         word
>   source start,end  0,8
>
> org.apache.solr.analysis.WordDelimiterFilterFactory {splitOnCaseChange=1,
> generateNumberParts=1, catenateWords=0, generateWordParts=1, catenateAll=0,
> catenateNumbers=0}
>   term position     1     2
>   term text         Tech  Note
>   term type         word  word
>   source start,end  0,4   4,8
>
> org.apache.solr.analysis.LowerCaseFilterFactory {}
>   term position     1     2
>   term text         tech  note
>   term type         word  word
>   source start,end  0,4   4,8
>
> org.apache.solr.analysis.SnowballPorterFilterFactory
> {protected=protwords.txt, language=English}
>   term position     1     2
>   term text         tech  note
>   term type         word  word
>   source start,end  0,4   4,8
>
> *4. My questions are:*
> 4.1: Why do debugQuery and analysis.jsp give different results?
> 4.2: From my understanding, during indexing the word 'TechNote' will be
> converted to 1) 'technote' and 2) 'tech note' according to my config in
> schema.xml. And at query time, 'TechNote' will be converted to 'tech note',
> so it SHOULD match. Am I right?
> 4.3: Why is the phrase frequency of 'tech note' 0 in the debugQuery output
> (0.0 = tf(phraseFreq=0.0))?
>
> Any suggestions/comments are absolutely welcome!
>
> *5. fieldType definition in schema.xml*
>
> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>             ignoreCase="true" expand="true"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" generateNumberParts="1" catenateWords="1"
>             catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.SnowballPorterFilterFactory" language="English"
>             protected="protwords.txt"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>             ignoreCase="true" expand="true"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" generateNumberParts="1" catenateWords="0"
>             catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.SnowballPorterFilterFactory" language="English"
>             protected="protwords.txt"/>
>   </analyzer>
> </fieldType>
>
> Thanks very much!

-- 
Robert Muir
rcm...@gmail.com
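For reference, the numbers in the explainOther output quoted above can be reproduced by hand, assuming Lucene's DefaultSimilarity (the default in Solr 1.4): idf(t) = 1 + ln(numDocs / (docFreq + 1)), a phrase's idf is the sum of its terms' idfs, and tf(freq) = sqrt(freq). The sketch assumes the index holds only the one sample document (numDocs = 1, consistent with "tech=1 note=1"), and it copies the 0.25 fieldNorm from the output rather than deriving it.

```python
import math

# Assumption: Lucene DefaultSimilarity formulas (Solr 1.4 default).

def idf(doc_freq, num_docs):
    # DefaultSimilarity idf: 1 + ln(numDocs / (docFreq + 1))
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def tf(freq):
    # DefaultSimilarity tf: sqrt(freq)
    return math.sqrt(freq)

num_docs = 1  # assumption: only the sample document is indexed
phrase_idf = idf(1, num_docs) + idf(1, num_docs)  # "tech=1 note=1", summed for the phrase
field_norm = 0.25  # copied from the explain output, not recomputed
field_weight = tf(0.0) * phrase_idf * field_norm  # phraseFreq=0.0

print(round(phrase_idf, 8))
print(field_weight)
```

Each term's idf is 1 + ln(1/2), about 0.3069, and the two together give the reported 0.61370564. Because fieldWeight is a plain product, phraseFreq=0.0 on its own is enough to force the score to 0.0, so the real question is why the indexed positions do not contain the phrase.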