Are there any errors in your logs? Have you tried looking at the admin analysis page to see how text gets treated on that field?

Are you sure the large synonym file is formatted correctly?
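
If it helps, a quick sanity check over the file could narrow this down. The following is only a rough sketch in Python, not anything that ships with Solr. It assumes the usual synonyms.txt layout (comma-separated equivalents, optional "=>" mappings, "#" comments), and the function name check_synonym_lines is made up for illustration. It does not handle backslash-escaped commas, and a non-ASCII character is not necessarily an error, just something worth looking at.

```python
import sys


def check_synonym_lines(lines):
    """Return a list of (line_number, problem) tuples for suspicious lines."""
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        # Blank lines and '#' comments carry no synonyms, so skip them.
        if not line or line.startswith("#"):
            continue
        # Flag anything outside plain ASCII so it can be inspected by hand.
        if not line.isascii():
            problems.append((number, "non-ASCII character"))
        # A rule has at most one '=>' separating its two sides.
        sides = line.split("=>")
        if len(sides) > 2:
            problems.append((number, "more than one '=>'"))
            continue
        # Each side is a comma-separated list; empty entries are suspect.
        for side in sides:
            terms = [term.strip() for term in side.split(",")]
            if any(term == "" for term in terms):
                problems.append((number, "empty term"))
                break
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Undecodable bytes become U+FFFD, which is then reported as non-ASCII.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for number, problem in check_synonym_lines(handle):
            print(f"line {number}: {problem}")
```

Saved under any name and run with the path to synonyms.txt as its only argument, it prints one line per suspicious entry and prints nothing if every line passes these checks.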

-Grant

On Jul 11, 2008, at 7:23 AM, matt connolly wrote:


I'm setting up Solr to run on a web site I'm working on.

Basically, if I use no synonym file, Solr works really well for finding
text; the Porter stemmer filter is great.

It also works with a small synonym file, like the one in the example, which
defines Television,TV.

But when I add a large synonym file (approximately 7000 synonyms),
everything breaks down. Even queries for exact words don't return any
results.

Could it be that there is something in the synonym file (a non-ASCII
character, for example) that is causing the synonym filter to do something
weird, like not pass any tokens?

Could it be that the synonym filter is now expanding practically everything,
so that no document is considered relevant enough? (I tried setting
defaultOperator="OR", but it made no difference.)


My text field is defined in the schema as:

   <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
     <analyzer type="index">
       <tokenizer class="solr.HTMLStripStandardTokenizerFactory"/>
       <filter class="solr.StopFilterFactory" ignoreCase="true"
               words="stopwords.txt"/>
       <filter class="solr.WordDelimiterFilterFactory"
               generateWordParts="1" generateNumberParts="1" catenateWords="1"
               catenateNumbers="1" catenateAll="0"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.EnglishPorterFilterFactory"
               protected="protwords.txt"/>
       <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
     </analyzer>
     <analyzer type="query">
       <tokenizer class="solr.HTMLStripStandardTokenizerFactory"/>
       <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
               ignoreCase="true" expand="true"/>
       <filter class="solr.StopFilterFactory" ignoreCase="true"
               words="stopwords.txt"/>
       <filter class="solr.WordDelimiterFilterFactory"
               generateWordParts="1" generateNumberParts="1" catenateWords="0"
               catenateNumbers="0" catenateAll="0"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.EnglishPorterFilterFactory"
               protected="protwords.txt"/>
       <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
     </analyzer>
   </fieldType>


Thanks for any help,
Matt


--
View this message in context: 
http://www.nabble.com/Synonyms-list-breaks-solr-tp18401710p18401710.html
Sent from the Solr - User mailing list archive at Nabble.com.


--------------------------
Grant Ingersoll
http://www.lucidimagination.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ






