Hi Everyone,

I think the subject line said it all.  Here is the schema I'm using:

<fieldType name="my_text" class="solr.TextField" positionIncrementGap="100"
           autoGeneratePhraseQueries="true">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="lang/stopwords_en.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="1" splitOnCaseChange="0" splitOnNumerics="1"
            stemEnglishPossessive="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

I'm guessing this is due to how solr.WhitespaceTokenizerFactory works, and
that the characters it is not indexing are dropped because they are treated
as whitespace?  If so, how can I add %, &, etc. to that non-indexed list?
I would rather have none of these characters indexed than have some indexed
and some not, which confuses my users.
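
To make the question concrete: one untested possibility would be a char
filter that replaces those characters with a space before the tokenizer
runs. The sketch below assumes solr.PatternReplaceCharFilterFactory is
available in the Solr version in use, and the character class [%&] is only
illustrative, not a tested list of what should be dropped.

```xml
<analyzer>
  <!-- Assumption: replace % and & with a space before tokenizing,
       so they never reach the index. "&" must be written as &amp; in XML. -->
  <charFilter class="solr.PatternReplaceCharFilterFactory"
              pattern="[%&amp;]" replacement=" "/>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <!-- existing filters from the field type above would follow unchanged -->
</analyzer>
```

Since the same analyzer is used at index and query time here, the characters
would be stripped from queries as well, which should keep the two sides
consistent.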

Thanks

Steve
