Re: Iso accents and wildcards

Nicolas Leconte Sun, 01 Nov 2009 23:26:25 -0800

Thanks for the explanation, now I can clearly understand why it doesn't work as I was expecting :)


jfmel...@free.fr wrote:

If the query contains any wildcard, the filters are not called:
no ISOLatin1AccentFilterFactory and no SnowballPorterFilterFactory!
"économie" is indexed as "econom".

So Solr does not find:
 - terms starting with "éco"     (éco*)
 - terms starting with "economi" (economi*)
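One common workaround is to normalize the wildcard term on the client before sending it to Solr. The sketch below is only an illustration: the function names are hypothetical, and the NFD-based folding approximates, but does not exactly reproduce, what ISOLatin1AccentFilterFactory does (ligatures such as "æ" are not expanded, for instance).

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Strip combining marks after NFD decomposition (e.g. 'é' -> 'e')."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize_wildcard_term(term: str) -> str:
    """Lower-case and accent-fold a wildcard term.

    Hypothetical helper: the wildcard characters '*' and '?' pass through
    unchanged. No stemming is attempted, because a stemmer cannot be
    applied reliably to a partial word.
    """
    return fold_accents(term.lower())


if __name__ == "__main__":
    for raw in ("éco*", "Écon*", "economi*"):
        print(raw, "->", normalize_wildcard_term(raw))
```

This would only address the accent half of the problem: "éco*" becomes "eco*", which can match "econom", but "economi*" is left as it is and still misses the stemmed term.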

If you index manger, mangé and mangue, the indexed terms will be mang and mangu.

query    ->   results

manger   ->   manger, mangé
mangé    ->   manger, mangé
mang     ->   manger, mangé
mangu    ->   mangue
mang*    ->   manger, mangé, mangue
mang?    ->   mangue  (and not mangé)
mangé*   ->   nothing

Jean-François


----- "Nicolas Leconte" <nicolas.ai...@aidel.com> wrote:

| Hi all,
|
| I have a field that contains accented characters, and I want to be able to search it while ignoring accents.
| I have set up that field with:
| <analyzer>
| <tokenizer class="solr.StandardTokenizerFactory"/>
| <filter class="solr.StandardFilterFactory"/>
| <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
|   generateNumberParts="1" catenateWords="1" catenateNumbers="1"
|   catenateAll="0" splitOnCaseChange="1" />
| <filter class="solr.LowerCaseFilterFactory"/>
| <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
| <filter class="solr.SnowballPorterFilterFactory" language="French"/>
| <filter class="solr.LowerCaseFilterFactory"/>
| <filter class="solr.ISOLatin1AccentFilterFactory"/>
| <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
| </analyzer>
|
| In the index the word "économie" is translated to "econom": the accent is removed by the ISOLatin1AccentFilterFactory and the end of the word is removed by the SnowballPorterFilterFactory.
|
| When I query with title:econ* I get the correct results, but if I query with title:écon* I get no results.
| If I query with title:économ (the exact word in the index) it works, so there might be something wrong with the wildcard.
| As far as I understand, the analyzer should be applied in exactly the same way at both index and query time.
|
| I have tried changing the order of the filters (putting the ISOLatin1AccentFilterFactory on top) without any result.
|
| Could anybody help me with this and point out what may be wrong with my schema?
