: I had to be brief as my facets are in the order of 100K over 800K documents
: and also if I give the complete schema.xml I was afraid nobody would read my
: long message :-) ..Hence I showed only relevant pieces of the result showing
: different fields having same problem

relevant is good, but you have to provide a consistent picture from start to finish ... you don't need to show 1,000 lines of facet field output, but you at least need to show the field names.

: <fieldType name="keywordText" class="solr.TextField"
: sortMissingLast="true" omitNorms="true" positionIncrementGap="100">
: <analyzer type="index">
: <tokenizer class="solr.KeywordTokenizerFactory"/>
: <filter class="solr.TrimFilterFactory" />
: <filter class="solr.StopFilterFactory" ignoreCase="true"
: words="stopwords.txt,entity-stopwords.txt" enablePositionIncrements="true"/>
:
: <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
: ignoreCase="true" expand="false" />
: <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
: </analyzer>

...have you used analysis.jsp to see what terms that analyzer produces based on the strings you are indexing for your documents? because combined with synonyms like this...

: New York, N.Y., NY => New York

...it doesn't surprise me that you're getting "New" as an indexed term.

By default SynonymFilter uses whitespace to delimit tokens in multi-token synonyms, so for some input like "NY" you should see it produce the tokens "New" and "York".

you can use the tokenizerFactory attribute on SynonymFilterFactory to specify a TokenizerFactory class to use when parsing synonyms.txt.

-Hoss
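
For reference, a rough sketch of what that could look like in the index analyzer quoted above. The choice of solr.KeywordTokenizerFactory as the value is an assumption here (it matches the tokenizer the field already uses, so each synonym entry such as "New York" should be parsed as a single token); check the result in analysis.jsp before relying on it.

```xml
<!-- Sketch only: the tokenizerFactory value below is an assumption, not taken from the original schema -->
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="false"
        tokenizerFactory="solr.KeywordTokenizerFactory"/>
```

Documents already indexed with the old analyzer would need to be reindexed for any change in the produced terms to show up in the facet counts.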