Hi there,

I have a document whose title is "20111213_solr_apache conference report".

When I use the analysis web interface to see exactly which tokens Solr
produces, this is the result:

term text   20111213_solr   apache       conference   report
term type   <NUM>           <ALPHANUM>   <ALPHANUM>   <ALPHANUM>


Why is "20111213_solr" tokenized as a single <NUM> token, and why isn't the
"_" character removed? (I've added "_" as a stop word in stopwords.txt.)
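For context, a stop-word filter works on whole tokens: it drops a token only when the entire token matches a stop word, and it does not strip characters inside a token. So an entry of "_" would only remove a token consisting of a lone underscore. The sketch below is a simplified model under that assumption, not Solr or Lucene code; the function name `stop_filter` is hypothetical.

```python
# Simplified model of token-level stop-word filtering (not Solr/Lucene code).
# Assumption: a token is dropped only if the WHOLE token matches a stop word;
# characters inside a token are never edited.

def stop_filter(tokens, stopwords):
    """Return the tokens that do not exactly match any stop word."""
    return [t for t in tokens if t not in stopwords]

# Tokens as reported by the analysis page for the first title.
tokens = ["20111213_solr", "apache", "conference", "report"]
stopwords = {"_"}  # "_" listed in stopwords.txt

print(stop_filter(tokens, stopwords))
# ['20111213_solr', 'apache', 'conference', 'report']
```

Under this model, "20111213_solr" survives unchanged because it is not equal to "_".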

I did another test with "20111213_solr_apache conference_report".
The only difference is an underscore between "conference" and "report".
Analyzing this string gives:
term text   20111213_solr   apache       conference   report
term type   <NUM>           <ALPHANUM>   <ALPHANUM>   <ALPHANUM>
This time the underscore between "conference" and "report" is removed!

Why? How can I make Solr remove the underscore and behave consistently?
Please help.

Thanks in advance.

Floyd
