Brendan, pull up your Solr Admin "Analysis" page and run your field
values and queries through it. The output shows exactly how each
tokenizer and filter changes your tokens on the index and query sides.
In my own quick test, WordDelimiterFilterFactory seems inclined to
break "2WD" into ("2","WD")
(using org.apache.solr.analysis.WordDelimiterFilterFactory
{catenateWords=1, catenateNumbers=1, catenateAll=0,
generateNumberParts=1, generateWordParts=1})
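To make that split concrete, here is a rough Python sketch of just the
letter/digit boundary rule. It is not Solr's code: split_on_type_change
is a made-up helper for illustration, and it ignores everything else the
filter does (case-change splits, the catenate* options, and so on).

```python
import re

def split_on_type_change(token):
    """Split a token wherever it switches between letters and digits.

    Hypothetical helper: it approximates only the letter/digit boundary
    rule and is not the WordDelimiterFilterFactory implementation.
    """
    # Each match is a maximal run of ASCII letters or of digits; any other
    # character (punctuation, whitespace) simply acts as a delimiter.
    return re.findall(r"[A-Za-z]+|[0-9]+", token)

print(split_on_type_change("2WD"))        # ['2', 'WD']
print(split_on_type_change("Silverado"))  # ['Silverado']
```

The Analysis page remains the authority here; the sketch only shows why
a term like "2WD" may never appear as a single token.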
--matt
On Dec 9, 2007, at 6:41 PM, Brendan Grainger wrote:
Hi,
I hope you can help me. I'm having an odd problem with Solr. I have
a field that represents a car. A car could have a name like
"Silverado" or something like "Silverado 2WD" to denote the
2-wheel-drive version of the car. Anyway, all is well when I search
the field for "Silverado", but when I try searching for
"2WD" (case doesn't matter) nothing is returned. The same applies
to "Silverado 2WD", etc. I currently have the field defined as text,
i.e.:
<field name="car_name" type="text" indexed="true" stored="true" />
But I've also tried defining my own (simpler) field with no luck.
FYI my text field is defined like this:
<fieldType name="text" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <!-- This is supposed to remove HTML tags before indexing -->
    <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>
    <!--
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    -->
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
Any help?
Thanks!
Brendan
--
Matt Kangas / [EMAIL PROTECTED]