Thanks for the explanation; now I clearly understand why it doesn't work
as I was expecting :)
jfmel...@free.fr wrote:
If the query contains a wildcard, the analysis filters are not applied to it:
neither ISOLatin1AccentFilterFactory nor SnowballPorterFilterFactory runs!
"économie" is indexed as "econom", so Solr does not find it for:
- a term starting with "éco" (éco*)
- a term starting with "economi" (economi*)
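
One possible workaround for the accent half of the problem is to fold accents and case on the client before sending a wildcard query. The sketch below is only an illustration: `fold_accents` and `normalize_wildcard_term` are hypothetical helper names, and Unicode decomposition only approximates what ISOLatin1AccentFilterFactory does (ligatures such as "æ" are not handled here).

```python
import unicodedata


def fold_accents(text):
    """Drop combining marks after canonical decomposition (é -> e)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize_wildcard_term(term):
    """Approximate the accent and lowercase filters for a wildcard term.

    The wildcard characters * and ? pass through unchanged.
    """
    return fold_accents(term).lower()


print(normalize_wildcard_term("éco*"))
print(normalize_wildcard_term("Écon*"))
```

This does nothing about stemming: a prefix longer than the indexed stem, such as economi*, still matches nothing after folding.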
If you index manger, mangé and mangue, the indexed terms will be mang and mangu.
query   -> results
manger  -> manger, mangé
mangé   -> manger, mangé
mang    -> manger, mangé
mangu   -> mangue
mang*   -> manger, mangé, mangue
mang?   -> mangue (and not mangé)
mangé*  -> nothing
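
The behaviour in the list above can be reproduced with a toy model. This is a sketch, not Solr code: the `ANALYZED` mapping hard-codes the stems quoted in this message in place of the real filter chain, `search` is a hypothetical function, and Python's `fnmatch` stands in for Lucene wildcard matching.

```python
from fnmatch import fnmatchcase

# Toy index: original word -> indexed term, as quoted in this message.
ANALYZED = {"manger": "mang", "mangé": "mang", "mangue": "mangu"}


def search(query):
    """Return the original words whose indexed term matches the query."""
    if "*" in query or "?" in query:
        # Wildcard query: the raw pattern is compared with the indexed
        # terms, with no accent folding and no stemming.
        return [w for w, term in ANALYZED.items() if fnmatchcase(term, query)]
    # Plain query: analyzed like the indexed text (raw query if unknown).
    analyzed = ANALYZED.get(query, query)
    return [w for w, term in ANALYZED.items() if term == analyzed]


for q in ["manger", "mangé", "mang", "mangu", "mang*", "mang?", "mangé*"]:
    print(q, "->", search(q))
```

The results are listed as the original words, so mang? returns only mangue because "mangu" is the only five-letter indexed term.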
Jean-François
----- "Nicolas Leconte" <nicolas.ai...@aidel.com> wrote:
| Hi all,
|
| I have a field that contains accented characters, and I want to be
| able to search it ignoring accents.
| I have set up that field with:
| <analyzer>
| <tokenizer class="solr.StandardTokenizerFactory"/>
| <filter class="solr.StandardFilterFactory"/>
| <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
| generateNumberParts="1" catenateWords="1" catenateNumbers="1"
| catenateAll="0" splitOnCaseChange="1" />
| <filter class="solr.LowerCaseFilterFactory"/>
| <filter class="solr.StopFilterFactory" ignoreCase="true"
| words="stopwords.txt" />
| <filter class="solr.SnowballPorterFilterFactory" language="French"/>
| <filter class="solr.LowerCaseFilterFactory"/>
| <filter class="solr.ISOLatin1AccentFilterFactory"/>
| <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
| </analyzer>
|
| In the index the word "économie" is stored as "econom": the accent
| is removed by the ISOLatin1AccentFilterFactory and the end of the
| word is removed by the SnowballPorterFilterFactory.
|
| When I query with title:econ* I get the correct results, but if
| I query with title:écon* I get no results.
| If I query with title:économ (the exact word of the index) it works,
| so there might be something wrong with the wildcard.
| As far as I understand, the analyzer should be applied in exactly the
| same way at both index and query time.
|
| I have tried changing the order of the filters (putting the
| ISOLatin1AccentFilterFactory first), without success.
|
| Could anybody help me with this and point out what may be wrong with
| my schema?