Hi,
I have a problem with a diacritic character in my query string:
q=intertestualità
which is encoded as
q=intertestualit%E0
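To make the encoding concrete, here is a small sketch of how the term is percent-encoded under two character sets. It shows that %E0 is the ISO-8859-1 form of "à", while the UTF-8 form would be %C3%A0. The last step is only an assumption on my part about how a lenient UTF-8 decoder might behave; I have not verified that anything in front of Solr actually does this.

```python
from urllib.parse import quote, unquote

term = "intertestualità"

# Percent-encode the same term under two different character encodings.
latin1_encoded = quote(term, encoding="iso-8859-1")
utf8_encoded = quote(term, encoding="utf-8")

print(latin1_encoded)  # intertestualit%E0
print(utf8_encoded)    # intertestualit%C3%A0

# Assumption for illustration only: a decoder that expects UTF-8 and
# silently skips invalid bytes. The lone byte 0xE0 is not a complete
# UTF-8 sequence, so it is dropped.
lenient = unquote(latin1_encoded, encoding="utf-8", errors="ignore")
print(lenient)  # intertestualit
```

The truncated value printed by the last step matches what I see echoed back in the response, which is why I suspect a charset mismatch somewhere before the query parser.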
What I don't understand are the following fragments of the query response:
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">23</int>
  <lst name="params">
    <str name="sort">score desc</str>
    <str name="fl">score,title</str>
    <str name="debugQuery">on</str>
    <str name="indent">on</str>
    <str name="start">0</str>
    <str name="q">intertestualit</str>
    <str name="version">2.2</str>
    <str name="rows">3</str>
  </lst>
and
<lst name="debug">
  <str name="rawquerystring">intertestualit</str>
  <str name="querystring">intertestualit</str>
I saw that my index contains the token "intertestualita" (with the 'à' replaced by 'a').
Indeed, if I query for "intertestualita" I find my results.
The queried field applies the same normalization filter in both its index and query analyzers:
<fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="schema.UnicodeNormalizationFilterFactory" version="icu4j"
            composed="false" remove_diacritics="true" remove_modifiers="true" fold="true" />
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="schema.UnicodeNormalizationFilterFactory" version="icu4j"
            composed="false" remove_diacritics="true" remove_modifiers="true" fold="true" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true"
            expand="true" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"
            enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
  </analyzer>
</fieldtype>
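For reference, this is roughly the folding I expect from the normalization filter above, given composed="false" and remove_diacritics="true". It is only a sketch of the general technique (canonical decomposition followed by removal of combining marks); the function name fold_diacritics is my own, and I am assuming, not asserting, that the custom schema.UnicodeNormalizationFilterFactory works this way.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Decompose to NFD and drop combining marks (hypothetical helper)."""
    # NFD splits "à" into the base letter "a" plus U+0300 (combining grave).
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only characters whose canonical combining class is 0.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(fold_diacritics("intertestualità"))  # intertestualita
```

This would explain the "intertestualita" token in the index, but the filter can only fold the character if it is still present when the query reaches the analyzer.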
So my question is: what is removing the "à" (%E0) character from the
input query? It seems that the query arrives at Solr already without
that character...
Regards,
Andrea