Re: Synonyms list breaks solr

Grant Ingersoll Fri, 11 Jul 2008 05:22:26 -0700

Are there any errors in your logs? Have you tried looking at the admin analysis page to see how text gets treated on that field?


Are you sure the large synonym file is formatted correctly?
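
One rough way to sanity-check the format outside Solr is a short script. The sketch below assumes the format used in the example synonyms.txt (comma-separated equivalents, `=>` for explicit mappings, `#` for comments, backslash-escaped commas). The names `check_synonym_lines` and `check_file` are made up for illustration, and the checks are heuristics, not Solr's own parser.

```python
import re


def check_synonym_lines(lines):
    """Return (line_number, message) pairs for lines that look malformed."""
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no synonyms
        sides = line.split("=>")
        if len(sides) > 2:
            problems.append((number, "more than one '=>'"))
            continue
        for side in sides:
            # Split on commas that are not escaped with a backslash.
            terms = [term.strip() for term in re.split(r"(?<!\\),", side)]
            if any(term == "" for term in terms):
                problems.append((number, "empty term"))
                break
    return problems


def check_file(path):
    """Run the heuristic checks on a file, assuming it is UTF-8 encoded."""
    with open(path, encoding="utf-8") as handle:
        return check_synonym_lines(handle)
```

Calling `check_file("synonyms.txt")` returns a list of suspicious line numbers; an empty list only means none of these particular checks fired, not that Solr will accept the file.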


-Grant

On Jul 11, 2008, at 7:23 AM, matt connolly wrote:


I'm setting up Solr to run on a web site I'm working on.

Basically, if I use no synonym file, then Solr is working really well for
finding text, the porter stemmer filter is great.

It also works with a small synonym file, like the one in the example, which
defines Television,TV.

But when I add a large synonym file (like approx 7000 synonyms), then
everything breaks down. Even queries for exact words don't return any
results.

Could it be that there is something in the synonym file (a non-ASCII
character, for example) that is causing the synonym filter to do something
weird, like not pass any tokens?
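
A quick way to test the non-ASCII theory is to list every such character in the file. This is a minimal sketch that assumes the file is read as UTF-8; `find_non_ascii` is a hypothetical helper name, not part of Solr.

```python
def find_non_ascii(lines):
    """Return (line_number, column, code_point) for each non-ASCII character."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for column, char in enumerate(line, start=1):
            if ord(char) > 127:
                hits.append((number, column, "U+%04X" % ord(char)))
    return hits


def scan_file(path):
    """Scan a file for non-ASCII characters, assuming UTF-8 encoding."""
    with open(path, encoding="utf-8") as handle:
        return find_non_ascii(handle)
```

Under these assumptions, a byte order mark at the start of the file would be reported as U+FEFF at line 1, column 1. If the file is not valid UTF-8, `scan_file` raises `UnicodeDecodeError`, which is itself a hint that the encoding is worth checking.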

Could it be that the synonym filter is now expanding practically everything
so that no document is considered relevant enough? (I tried setting
defaultOperator="OR"; it made no difference.)


My text field is defined in the schema as:

   <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
     <analyzer type="index">
       <tokenizer class="solr.HTMLStripStandardTokenizerFactory"/>
       <filter class="solr.StopFilterFactory" ignoreCase="true"
               words="stopwords.txt"/>
       <filter class="solr.WordDelimiterFilterFactory"
               generateWordParts="1" generateNumberParts="1" catenateWords="1"
               catenateNumbers="1" catenateAll="0"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.EnglishPorterFilterFactory"
               protected="protwords.txt"/>
       <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
     </analyzer>
     <analyzer type="query">
       <tokenizer class="solr.HTMLStripStandardTokenizerFactory"/>
       <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
               ignoreCase="true" expand="true"/>
       <filter class="solr.StopFilterFactory" ignoreCase="true"
               words="stopwords.txt"/>
       <filter class="solr.WordDelimiterFilterFactory"
               generateWordParts="1" generateNumberParts="1" catenateWords="0"
               catenateNumbers="0" catenateAll="0"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.EnglishPorterFilterFactory"
               protected="protwords.txt"/>
       <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
     </analyzer>
   </fieldType>


Thanks for any help,
Matt


--
View this message in context: 
http://www.nabble.com/Synonyms-list-breaks-solr-tp18401710p18401710.html
Sent from the Solr - User mailing list archive at Nabble.com.


--------------------------
Grant Ingersoll
http://www.lucidimagination.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
