Hi all -

I have a similar problem.

Some of the synonyms for acetone are as follows:

1090,b-Ketopropane,Dimethyl formaldehyde,2-Propanone,dimethylketone,Ketone, dimethyl-,methyl ketone,propan-2-one,propanone,β-Ketopropane,67-64-1

During indexing, the analyzer splits b-Ketopropane into b and b-Ketopropane, and Dimethyl formaldehyde into Dimethyl and formaldehyde.

How should I format my synonyms to avoid the splitting?

My schema is as follows:

<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <!--<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />-->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt" />
  </analyzer>
</fieldType>
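A possible adjustment for the hyphenated entries, offered as an untested sketch: WordDelimiterFilterFactory accepts a protected attribute naming a file of words it should pass through without splitting. The file name wdf_protwords.txt is hypothetical, and matching is assumed to be exact (case-sensitive) against the tokens as they reach this filter. This would not change how Dimethyl formaldehyde is handled, since the field is tokenized on whitespace and multi-word entries are indexed as separate tokens.

```xml
<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory" />
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
  <!-- protected: words listed in the (hypothetical) file wdf_protwords.txt are passed
       through unsplit. Entries are assumed to need the exact form the tokens have here,
       i.e. before LowerCaseFilterFactory runs. -->
  <filter class="solr.WordDelimiterFilterFactory" protected="wdf_protwords.txt"
          generateWordParts="1" generateNumberParts="1" catenateWords="1"
          catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" />
  <filter class="solr.LowerCaseFilterFactory" />
  <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt" />
</analyzer>
```

The query analyzer would presumably need the same protected attribute so that a search for b-Ketopropane is analyzed the same way on both sides; the admin analysis page is the easiest place to confirm what each filter actually produces.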

As always, thanks for the help.

Regards, Soumitra

-----Original Message-----
From: Bojan Miletic [mailto:extreme2...@gmail.com] 
Sent: Saturday, December 17, 2011 6:32 AM
To: solr-user@lucene.apache.org
Subject: Problem with synonyms containing whitespace

Hi everyone,

I'm having a bit of a problem with synonyms.

My synonyms.txt looks like this:

> class\ 3\ (gvw\ 10001\ -\ 14000), light
> class 4 (gvw 14001 - 16000), class 5 (gvw 16001 - 19500), class 6 (gvw 19501 - 26000), medium

When testing with the analyzer in the Solr admin, "light" is correctly recognised as one of the synonyms, but when I search for "class 3 (gvw 10001 - 14000)" the analyzer can't find any synonyms.

As you can see, I tried escaping the whitespace with \ but that didn't help.

The configuration of the field I'm using is:

> <!-- lowercases the entire field value, keeping it as a single token. -->
> <!-- used for working with synonyms -->
> <fieldType name="lowercase_syn" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
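One change that may help, sketched under the assumption of Solr 3.x behaviour and not tested here: unless told otherwise, SynonymFilterFactory splits each synonyms.txt entry on whitespace, so "class 3 (gvw 10001 - 14000)" becomes a sequence of tokens that can never match the single token KeywordTokenizerFactory produces for the whole field. Setting the filter's tokenizerFactory attribute to solr.KeywordTokenizerFactory should keep each comma-separated entry whole, which would also make the backslash escapes unnecessary.

```xml
<fieldType name="lowercase_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- tokenizerFactory: parse each synonyms.txt entry as a single token,
         so multi-word entries can match the single token produced above -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"
            tokenizerFactory="solr.KeywordTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

At query time, the standard query parser splits on whitespace and treats parentheses as syntax before the analyzer runs, so the search string will probably need to be sent as a quoted phrase to reach the KeywordTokenizer as one piece.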

Could you please help me?

Thanks
