I'm setting up Solr to run on a web site I'm working on. With no synonym file, Solr works really well for finding text, and the Porter stemmer filter is great.
It also works with a small synonym file, like the one in the example, which defines Television,TV. But when I add a large synonym file (approximately 7000 synonyms), everything breaks down. Even queries for exact words don't return any results.

Could it be that something in the synonym file (a non-ASCII character, for example) is causing the synonym filter to do something weird, like not pass any tokens? Or could it be that the synonym filter is now expanding practically everything, so that no document is considered relevant enough? (I tried setting defaultOperator="OR"; it made no difference.)

My text field is defined in the schema as:

    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.HTMLStripStandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.HTMLStripStandardTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>

Thanks for any help,
Matt
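
One way to test the non-ASCII theory would be to scan the synonym file before Solr loads it. The following is only a minimal sketch in Python, assuming the file is plain text with one synonym rule per line. The function name find_non_ascii and the sample lines are made up for illustration and are not part of Solr.

```python
def find_non_ascii(lines):
    """Return (line_number, column, character) for each non-ASCII character.

    Line and column numbers are 1-based so they match what a text editor shows.
    """
    hits = []
    for line_no, line in enumerate(lines, start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:  # anything outside the 7-bit ASCII range
                hits.append((line_no, col, ch))
    return hits


# Hypothetical sample standing in for the contents of synonyms.txt.
sample = [
    "Television, TV",
    "caf\u00e9, coffee shop",
    "GB, gigabyte",
]

for line_no, col, ch in find_non_ascii(sample):
    print("line %d, column %d: %r" % (line_no, col, ch))
```

To check a real file, the open file handle could be passed in place of the sample list, for example find_non_ascii(open("synonyms.txt", encoding="utf-8")), assuming the file really is UTF-8. An empty result would rule out non-ASCII characters as the cause, though it says nothing about the other possibility of over-expansion.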