Hi all -
I have a similar problem. Some of the synonyms for acetone are:

1090,b-Ketopropane,Dimethyl formaldehyde,2-Propanone,dimethylketone,Ketone, dimethyl-,methyl ketone,propan-2-one,propanone,β-Ketopropane,67-64-1

During indexing, the analyzer splits b-Ketopropane into b and Ketopropane, and Dimethyl formaldehyde into Dimethyl and formaldehyde. How should I format my synonyms to avoid this splitting?

My schema is as follows:

<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <!--<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />-->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt" />
  </analyzer>
</fieldType>

As always, thanks for the help.
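A rough way to see where the extra tokens come from is to model the index-time chain above in a few lines of Python. This is a simplified sketch, not Solr's code: the helper names `approximate_word_delimiter` and `approximate_index_chain` are hypothetical, and the model only covers the WhitespaceTokenizerFactory, the word-part splitting and catenateWords behaviour of WordDelimiterFilterFactory, and LowerCaseFilterFactory. The synonym, stop and stemming filters, splitOnCaseChange, catenateNumbers and token positions are all left out.

```python
import re


def approximate_word_delimiter(token):
    """Very rough stand-in for WordDelimiterFilterFactory with
    generateWordParts=1, generateNumberParts=1, catenateWords=1.

    Hypothetical helper for illustration only. Splits on any run of
    characters that are not letters or digits. Catenation is only modelled
    when every part is alphabetic; case-change splitting, catenateNumbers
    and token positions are ignored.
    """
    # Word/number parts: maximal runs of letters or digits ("_" counts as a delimiter).
    parts = [p for p in re.split(r"[\W_]+", token) if p]
    out = list(parts)
    # catenateWords=1: also emit the alphabetic parts glued together.
    if len(parts) > 1 and all(p.isalpha() for p in parts):
        out.append("".join(parts))
    return out


def approximate_index_chain(text):
    """Whitespace tokenizer -> word delimiter -> lowercase.

    The synonym, stop and Snowball filters from the schema are deliberately
    left out of this simplified model.
    """
    tokens = []
    for raw in text.split():  # WhitespaceTokenizerFactory splits on whitespace
        tokens.extend(t.lower() for t in approximate_word_delimiter(raw))
    return tokens


print(approximate_index_chain("b-Ketopropane"))          # ['b', 'ketopropane', 'bketopropane']
print(approximate_index_chain("Dimethyl formaldehyde"))  # ['dimethyl', 'formaldehyde']
```

Under this model, Dimethyl formaldehyde is split by the whitespace tokenizer and b-Ketopropane by the word delimiter filter, which matches the splitting described above.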
Regards,
Soumitra

-----Original Message-----
From: Bojan Miletic [mailto:extreme2...@gmail.com]
Sent: Saturday, December 17, 2011 6:32 AM
To: solr-user@lucene.apache.org
Subject: Problem with synonyms containing whitespace

Hi everyone,

I'm having a bit of a problem with synonyms. My synonyms.txt looks like this:

class\ 3\ (gvw\ 10001\ -\ 14000), light
class 4 (gvw 14001 - 16000), class 5 (gvw 16001 - 19500), class 6 (gvw 19501 - 26000), medium

When testing with the analyzer in the Solr admin, "light" is correctly recognised as one of the synonyms, but when searching for "class 3 (gvw 10001 - 14000)" the analyzer can't find any synonyms. As you can see, I tried escaping the whitespace with \, but that didn't help.

The configuration of the field used is:

<!-- lowercases the entire field value, keeping it as a single token. -->
<!-- used for working with synonyms -->
<fieldType name="lowercase_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>

Could you please help me?

Thanks
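The synonyms.txt lines above can be read with a small parser to show what the backslash escapes do. The sketch below assumes the Solr synonym file conventions that commas separate equivalent terms and a backslash escapes the following character; `parse_synonym_line` is a hypothetical helper that approximates, rather than reproduces, Solr's parser, and it ignores `=>` mappings and comment lines.

```python
def parse_synonym_line(line):
    """Split one comma-separated synonyms.txt line into its terms.

    Hypothetical, simplified model: a backslash escapes the next character
    (so "\\," keeps a literal comma inside a term), unescaped commas separate
    terms, the escape backslashes are dropped and each term is stripped.
    "=>" mappings and comments are not handled.
    """
    terms, current = [], []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line):
            # Keep the escaped character literally and drop the backslash.
            current.append(line[i + 1])
            i += 2
            continue
        if ch == ",":
            # Unescaped comma: close the current term.
            terms.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
        i += 1
    terms.append("".join(current).strip())
    # Drop empty entries produced by stray or trailing commas.
    return [t for t in terms if t]


escaped = r"class\ 3\ (gvw\ 10001\ -\ 14000), light"
plain = "class 3 (gvw 10001 - 14000), light"
print(parse_synonym_line(escaped))                                # ['class 3 (gvw 10001 - 14000)', 'light']
print(parse_synonym_line(escaped) == parse_synonym_line(plain))  # True
```

Under this model, `\ ` turns into an ordinary space once the escape is removed, so the escaped and unescaped versions of the class 3 line produce the same terms. The same model also shows that the unescaped comma in `Ketone, dimethyl-` from the acetone list splits that entry into `Ketone` and `dimethyl-`.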