Did you index with Solr 1.4 (or are you using Solr 1.4)?

At a quick glance, it looks like it might be this:
https://issues.apache.org/jira/browse/SOLR-1852 , which was fixed in 1.4.1
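
To illustrate why positions matter here, a rough sketch in plain Python (not Solr code; the function names are made up for illustration, and synonyms and stemming are left out). A phrase query only matches when its terms sit at consecutive positions in the index, so if the positions actually written to the index differ from what analysis.jsp displays, phraseFreq comes out as 0. The non-adjacent positions in the second check are hypothetical and are not a description of what SOLR-1852 actually produced.

```python
import re

def split_on_case_change(token):
    # Toy stand-in for WordDelimiterFilter with splitOnCaseChange=1:
    # "TechNote." -> ["Tech", "Note"] (punctuation is dropped).
    return re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", token)

def index_positions(text, catenate_words):
    # Map each lowercased term to the set of positions it occupies.
    positions = {}
    pos = 0
    for token in text.split():
        parts = split_on_case_change(token)
        for part in parts:
            pos += 1
            positions.setdefault(part.lower(), set()).add(pos)
        if catenate_words and len(parts) > 1:
            # Stack the joined form on the last part's position,
            # as in the analysis.jsp output quoted below.
            positions.setdefault("".join(parts).lower(), set()).add(pos)
    return positions

def phrase_freq(positions, terms):
    # Count start positions where every term sits at consecutive positions.
    starts = positions.get(terms[0], set())
    return sum(
        all(p + i in positions.get(t, set()) for i, t in enumerate(terms))
        for p in starts
    )

index = index_positions("This is a TechNote.", catenate_words=True)
print(phrase_freq(index, ["tech", "note"]))         # 1: tech=4, note=5

hypothetical = {"tech": {4}, "note": {6}}           # assumed, non-adjacent
print(phrase_freq(hypothetical, ["tech", "note"]))  # 0
```

The first result is what analysis.jsp leads you to expect; the second is what a phraseFreq of 0.0 in the explain output corresponds to.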

On Tue, Sep 14, 2010 at 5:40 AM, yandong yao <yydz...@gmail.com> wrote:

> Hi Guys,
>
> I encountered a problem when enabling WordDelimiterFilterFactory for both
> index and query (the relevant part of schema.xml is pasted at the bottom of
> this email).
>
> *1. Steps to reproduce:*
>    1.1 The indexed sample document contains only one sentence: "This is a
> TechNote."
>    1.2 Query is: q=TechNote
>    1.3 Result: no matches are returned, even though the sentence above
> clearly contains the word 'TechNote'.
>
> *2. Output when enabling debugQuery*
> With debugQuery turned on:
>
> http://localhost:7111/solr/test/select?indent=on&version=2.2&q=TechNote&fq=&start=0&rows=0&fl=*%2Cscore&qt=standard&wt=standard&debugQuery=on&explainOther=id%3A001&hl.fl=
>
> I get the following information:
>
> <str name="rawquerystring">TechNote</str>
> <str name="querystring">TechNote</str>
> <str name="parsedquery">PhraseQuery(all:"tech note")</str>
> <str name="parsedquery_toString">all:"tech note"</str>
> <lst name="explain"/>
> <str name="otherQuery">id:001</str>
> <lst name="explainOther">
> <str name="001">
> 0.0 = fieldWeight(all:"tech note" in 0), product of:
>  0.0 = tf(phraseFreq=0.0)
>  0.61370564 = idf(all: tech=1 note=1)
>  0.25 = fieldNorm(field=all, doc=0)
> </str>
> </lst>
>
> It seems that the raw query string is converted to the phrase query
> "tech note", whose phrase frequency is 0, so nothing matches.
>
> *3. Result from admin/analysis.jsp page*
>
> According to analysis.jsp, the query 'TechNote' does match the input
> document; see the terms marked in red below (asterisks in this plain-text
> copy).
>
> Index Analyzer
>
> org.apache.solr.analysis.WhitespaceTokenizerFactory {}
>   term position     1      2      3      4
>   term text         This   is     a      TechNote.
>   term type         word   word   word   word
>   source start,end  0,4    5,7    8,9    10,19
>   payload
>
> org.apache.solr.analysis.SynonymFilterFactory {synonyms=synonyms.txt,
> expand=true, ignoreCase=true}
>   term position     1      2      3      4
>   term text         This   is     a      TechNote.
>   term type         word   word   word   word
>   source start,end  0,4    5,7    8,9    10,19
>   payload
>
> org.apache.solr.analysis.WordDelimiterFilterFactory {splitOnCaseChange=1,
> generateNumberParts=1, catenateWords=1, generateWordParts=1, catenateAll=0,
> catenateNumbers=1}
>   term position     1      2      3      4      5
>   term text         This   is     a      Tech   Note
>                                                 TechNote
>   term type         word   word   word   word   word
>                                                 word
>   source start,end  0,4    5,7    8,9    10,14  14,18
>                                                 10,18
>   payload
>
> org.apache.solr.analysis.LowerCaseFilterFactory {}
>   term position     1      2      3      4      5
>   term text         this   is     a      tech   note
>                                                 technote
>   term type         word   word   word   word   word
>                                                 word
>   source start,end  0,4    5,7    8,9    10,14  14,18
>                                                 10,18
>   payload
>
> org.apache.solr.analysis.SnowballPorterFilterFactory
> {protected=protwords.txt, language=English}
>   term position     1      2      3      4      5
>   term text         this   is     a      *tech* *note*
>                                                 technot
>   term type         word   word   word   word   word
>                                                 word
>   source start,end  0,4    5,7    8,9    10,14  14,18
>                                                 10,18
>   payload
>
> Query Analyzer
>
> org.apache.solr.analysis.WhitespaceTokenizerFactory {}
>   term position     1
>   term text         TechNote
>   term type         word
>   source start,end  0,8
>   payload
>
> org.apache.solr.analysis.SynonymFilterFactory {synonyms=synonyms.txt,
> expand=true, ignoreCase=true}
>   term position     1
>   term text         TechNote
>   term type         word
>   source start,end  0,8
>   payload
>
> org.apache.solr.analysis.WordDelimiterFilterFactory {splitOnCaseChange=1,
> generateNumberParts=1, catenateWords=0, generateWordParts=1, catenateAll=0,
> catenateNumbers=0}
>   term position     1      2
>   term text         Tech   Note
>   term type         word   word
>   source start,end  0,4    4,8
>   payload
>
> org.apache.solr.analysis.LowerCaseFilterFactory {}
>   term position     1      2
>   term text         tech   note
>   term type         word   word
>   source start,end  0,4    4,8
>   payload
>
> org.apache.solr.analysis.SnowballPorterFilterFactory
> {protected=protwords.txt, language=English}
>   term position     1      2
>   term text         tech   note
>   term type         word   word
>   source start,end  0,4    4,8
>   payload
>
>
> *4. My questions are:*
>    4.1: Why do debugQuery and analysis.jsp give different results?
>    4.2: From my understanding, during indexing, the word 'TechNote' will be
> converted to: 1) 'technote' and 2) 'tech note' according to my config in
> schema.xml. And at query time, 'TechNote' will be converted to 'tech note',
> thus it SHOULD match.  Am I right?
>    4.3: Why is the phrase frequency of 'tech note' 0 in the debugQuery
> output (0.0 = tf(phraseFreq=0.0))?
>
> Any suggestions or comments are welcome!
>
>
> *5. fieldType definition in schema.xml*
>
>    <fieldType name="text" class="solr.TextField"
> positionIncrementGap="100">
>      <analyzer type="index">
>        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> ignoreCase="true" expand="true"/>
>        <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>        <filter class="solr.LowerCaseFilterFactory"/>
>        <filter class="solr.SnowballPorterFilterFactory" language="English"
> protected="protwords.txt"/>
>      </analyzer>
>      <analyzer type="query">
>        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> ignoreCase="true" expand="true"/>
>        <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>        <filter class="solr.LowerCaseFilterFactory"/>
>        <filter class="solr.SnowballPorterFilterFactory" language="English"
> protected="protwords.txt"/>
>      </analyzer>
>    </fieldType>
>
>
> Thanks very much!
>



-- 
Robert Muir
rcm...@gmail.com
