Re: Synonyms problem

Plamen Mihaylov Fri, 29 Mar 2013 09:53:43 -0700

Guys,

The line with expand="false" is commented out. I moved the synonym filter
after the tokenizer, but the result is the same.
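
For anyone who wants to check the ordering mechanically, here is a minimal Python sketch. It assumes the fieldType has been copied out of schema.xml into a string; the function name misplaced_filters and the shortened SAMPLE config are made up for illustration and are not part of Solr.

```python
import xml.etree.ElementTree as ET


def misplaced_filters(field_type_xml):
    """Return (analyzer type, filter class) pairs for filters listed before the tokenizer."""
    root = ET.fromstring(field_type_xml)
    problems = []
    for analyzer in root.iter("analyzer"):
        seen_tokenizer = False
        for child in analyzer:
            if child.tag == "tokenizer":
                seen_tokenizer = True
            elif child.tag == "filter" and not seen_tokenizer:
                # A token filter that appears ahead of the tokenizer element.
                problems.append((analyzer.get("type", "both"), child.get("class")))
    return problems


# Hypothetical, shortened config mirroring the ordering in the first post.
SAMPLE = """
<fieldType name="text" class="solr.TextField">
  <analyzer type="index">
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
"""

print(misplaced_filters(SAMPLE))
```

An empty list means every filter comes after its tokenizer, which is what the configuration quoted in this message should now give.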


Current configuration:

        <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
            <analyzer type="index">
                <tokenizer class="solr.WhitespaceTokenizerFactory" />
                <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
                    ignoreCase="true" expand="true" />
                <filter class="solr.StopFilterFactory" ignoreCase="true"
                    words="stopwords.txt" enablePositionIncrements="true" />
                <filter class="solr.WordDelimiterFilterFactory"
                    generateWordParts="1" generateNumberParts="1" catenateWords="1"
                    catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" />
                <filter class="solr.LowerCaseFilterFactory" />
                <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone"
                    inject="true" />
                <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
                <filter class="solr.LengthFilterFactory" min="2" max="100" />
                <!-- <filter class="solr.SnowballPorterFilterFactory" language="English" /> -->
            </analyzer>
            <analyzer type="query">
                <tokenizer class="solr.WhitespaceTokenizerFactory" />
                <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
                    ignoreCase="true" expand="true" />
                <filter class="solr.StopFilterFactory" ignoreCase="true"
                    words="stopwords.txt" />
                <filter class="solr.WordDelimiterFilterFactory"
                    generateWordParts="1" generateNumberParts="1" catenateWords="0"
                    catenateNumbers="0" catenateAll="0" />
                <filter class="solr.LowerCaseFilterFactory" />
                <!-- <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/> -->
                <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
                <filter class="solr.StopFilterFactory" ignoreCase="true"
                    words="letterstops.txt" enablePositionIncrements="true" />
            </analyzer>
        </fieldType>
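
To make it easier to reason about what expand="true" does with a line like "NY, New York", here is a toy model in Python. It is only a sketch, not Solr's implementation: parse_synonyms and analyze are made-up names, and the model ignores token positions, stop words and every other filter in the chain above.

```python
def parse_synonyms(lines, expand=True):
    """Map each synonym (a tuple of lowercased words) to its replacement terms."""
    rules = {}
    for line in lines:
        terms = [tuple(t.lower().split()) for t in line.split(",") if t.strip()]
        # expand=True: every term maps to all terms; expand=False: all map to the first.
        targets = terms if expand else terms[:1]
        for term in terms:
            rules[term] = targets
    return rules


def analyze(text, rules):
    """Whitespace-tokenize, lowercase, and apply the longest matching synonym rule."""
    tokens = text.lower().split()
    longest = max((len(key) for key in rules), default=1)
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(longest, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in rules:
                for target in rules[key]:
                    out.extend(target)
                i += n
                break
        else:
            # No rule matched at this position; keep the token unchanged.
            out.append(tokens[i])
            i += 1
    return out


rules = parse_synonyms(["NY, New York"])
print(analyze("NY Yankees", rules))
print(analyze("New York Yankees", rules))
# Analyzing "New" and "York" separately never triggers the two-word rule.
print(analyze("New", rules) + analyze("York", rules))
```

In this model the first two calls produce the same tokens, so "NY" and "New York" look identical when the analyzer sees the whole text. The last line shows one possible source of a difference: if the query parser passes the words to the analyzer one at a time (an assumption that depends on the parser and Solr version), the two-word synonym cannot match at query time. Running both queries with debugQuery=true should show whether that is what happens here.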

2013/3/29 Walter Underwood <wun...@wunderwood.org>

> Also, all the filters need to be after the tokenizer. There are two
> synonym filters specified, one before the tokenizer and one after.
>
> I'm surprised that works at all. Shouldn't that be a fatal error when
> loading the config?
>
> wunder
>
> On Mar 29, 2013, at 9:33 AM, Thomas Krämer | ontopica wrote:
>
> > Hi Plamen
> >
> > You should set expand to true in the index analyzer:
> >
> > <analyzer type="index">
> > ....
> > <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
> >              ignoreCase="true" expand="true"/>
> >
> >
> > ...
> >
> > Greetings,
> >
> > Thomas
> >
> > On 29.03.2013 17:16, Plamen Mihaylov wrote:
> >> Hey guys,
> >>
> >> I have the following problem: I have a website about sports players
> >> and use Solr to index their data. I have defined synonyms such as:
> >> NY, New York. When I search for New York, 145 results are found, but
> >> when I search for NY, 142 results are found. Why is there a
> >> difference, and how can I fix it?
> >>
> >> Configuration snippets:
> >>
> >> synonyms.txt
> >>
> >> ...
> >> NY, New York
> >> ...
> >>
> >> ------
> >> schema.xml
> >>
> >> ...
> >>        <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
> >>            <analyzer type="index">
> >>                <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> >>                    ignoreCase="true" expand="true"/>
> >>                <tokenizer class="solr.WhitespaceTokenizerFactory" />
> >>                <!-- we will only use synonyms at query time <filter
> >>                    class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
> >>                    ignoreCase="true" expand="false"/> -->
> >>
> >>                <filter class="solr.StopFilterFactory" ignoreCase="true"
> >>                    words="stopwords.txt" enablePositionIncrements="true" />
> >>                <filter class="solr.WordDelimiterFilterFactory"
> >>                    generateWordParts="1" generateNumberParts="1" catenateWords="1"
> >>                    catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" />
> >>                <filter class="solr.LowerCaseFilterFactory" />
> >>                <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone"
> >>                    inject="true" />
> >>                <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
> >>                <filter class="solr.LengthFilterFactory" min="2" max="100" />
> >>                <!-- <filter class="solr.SnowballPorterFilterFactory" language="English" /> -->
> >>            </analyzer>
> >>            <analyzer type="query">
> >>                <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> >>                    ignoreCase="true" expand="true" />
> >>                <tokenizer class="solr.WhitespaceTokenizerFactory" />
> >>
> >>                <filter class="solr.StopFilterFactory" ignoreCase="true"
> >>                    words="stopwords.txt" />
> >>                <filter class="solr.WordDelimiterFilterFactory"
> >>                    generateWordParts="1" generateNumberParts="1" catenateWords="0"
> >>                    catenateNumbers="0" catenateAll="0" />
> >>                <filter class="solr.LowerCaseFilterFactory" />
> >>                <!-- <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/> -->
> >>                <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
> >>                <filter class="solr.StopFilterFactory" ignoreCase="true"
> >>                    words="letterstops.txt" enablePositionIncrements="true" />
> >>            </analyzer>
> >>        </fieldType>
> >>
> >>
> >> Thanks in advance.
> >> Plamen
> >>
> >
> >
> > --
> >
> > ontopica GmbH
> > Prinz-Albert-Str. 2b
> > 53113 Bonn
> > Germany
> > fon: +49-228-227229-22
> > fax: +49-228-227229-77
> > web: http://www.ontopica.de
> > ontopica GmbH
> > Registered office: Bonn
> >
> > Managing directors: Thomas Krämer, Christoph Okpue
> > Commercial register: Amtsgericht Bonn, HRB 17852
> >
> >
>
> --
> Walter Underwood
> wun...@wunderwood.org
>
>
>
>


-- 
Regards,
Plamen Mihaylov
