RE: Search with Accent and without accent Character

Markus Jelsma Tue, 13 Feb 2018 14:10:12 -0800

Checked and confirmed: even the Dutch digraph Ĳ is folded properly, as is the
upper-case dotless Turkish i, and the Spanish example you provided is folded
properly too.


A correction for German (before Nagel corrects me): the ICU folder does not
normalize ö and ü according to German rules. It strips their accents instead of
transforming them into oe and ue, respectively. This makes the case for
language-specific folders, especially when dealing with Scandinavian languages
or German. Dutch and Latin can be folded just by removing the accents.
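For German or Scandinavian content, Lucene ships language-specific filters that can take the ICU folder's place in such a field. The sketch below assumes Solr's standard factory names; the field type names `text_de` and `text_scandinavian` are hypothetical. Note that `GermanNormalizationFilterFactory` does not expand ö to oe either: following the German2 snowball heuristics, it maps ä, ö, ü to a, o, u and also reduces ae, oe and (with some exceptions) ue to the same letters, so both spellings of a word end up as one term. `ScandinavianFoldingFilterFactory` similarly folds å, ä, æ to a and ö, ø to o, and collapses double-vowel spellings such as aa.

```xml
<!-- Hypothetical German field type: the name "text_de" is illustrative -->
<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
  <!-- A single <analyzer> is used for both indexing and querying -->
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Lowercase first, as Lucene's own GermanAnalyzer chain does -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- German2 heuristics: ä/ö/ü and ae/oe/ue become a/o/u, ß becomes ss -->
    <filter class="solr.GermanNormalizationFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical Scandinavian field type: the name "text_scandinavian" is illustrative -->
<fieldType name="text_scandinavian" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Folds å/ä/æ to a and ö/ø to o, and collapses double vowels such as aa -->
    <filter class="solr.ScandinavianFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```

Because each type declares one `<analyzer>` without a `type` attribute, Solr applies the same chain at index and query time, which keeps both sides consistent by construction.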

Correct me if I'm wrong!
Markus
 
-----Original message-----
> From:Markus Jelsma <[email protected]>
> Sent: Tuesday 13th February 2018 22:21
> To: [email protected]
> Subject: RE: Search with Accent and without accent Character
> 
> Hi,
> 
> My guess is you haven't reindexed after changing filter configuration, which 
> is required for index-time filters.
> 
> Regarding your fieldType, you can drop the lowercase and ASCII folding
> filters and keep just the ICU folder; it will work for pretty much any
> character set. It normalizes case, Scandinavian digraphs (AE), and probably
> Dutch digraphs (IJ) as well. It also handles German ö and ü, the ringel s
> (ß), and all regular Latin accents, including the Spanish tilde (~), the
> circumflex, etc.
> 
> If there is a language-specific normalizer/folder, use it instead of ICU,
> because accents may need to be normalized differently in different
> languages.
> 
> And do not forget to reindex, and to use the same normalizers at both index
> and query time.
> 
> Regards,
> Markus
> 
>  
>  
> -----Original message-----
> > From:Rushi <[email protected]>
> > Sent: Tuesday 13th February 2018 19:40
> > To: [email protected]
> > Subject: Search with Accent and without accent Character
> > 
> > Hello All,
> > I integrated Nutch with Solr and everything seems to be fine so far, but I
> > am having an issue when searching for some Spanish accented characters: the
> > results are not the same. With the accent (example: investigación) I get the
> > correct result, but without the accent (example: investigacion) I get zero
> > results. I tried various filters, but the issue is still the same. Here is
> > my configuration for Nutch and Solr.
> > 
> > 
> >  <fieldType name="text_es" class="solr.TextField" positionIncrementGap="100">
> >     <analyzer type="index">
> >         <tokenizer class="solr.StandardTokenizerFactory"/>
> >         <filter class="solr.ICUFoldingFilterFactory" />
> >         <filter class="solr.LowerCaseFilterFactory"/>
> >         <filter class="solr.ASCIIFoldingFilterFactory"/>
> >         <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="50" side="front"/>
> >     </analyzer>
> >     <analyzer type="query">
> >         <tokenizer class="solr.StandardTokenizerFactory"/>
> >         <filter class="solr.ICUFoldingFilterFactory" />
> >         <filter class="solr.LowerCaseFilterFactory"/>
> >         <filter class="solr.ASCIIFoldingFilterFactory"/>
> > 
> >     </analyzer>
> >   </fieldType>
> > 
> > I would really appreciate it if any of you could tell me what I am missing.
> > -- 
> > Regards
> > Rushikesh M
> > .Net Developer
> > 
>
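Applying the replies' advice to the original `text_es` type gives roughly the following sketch. It assumes the ICU jars from Solr's analysis-extras contrib are already on the classpath (which should be the case if the original `ICUFoldingFilterFactory` line loads) and that all documents are reindexed after the change. The index-only EdgeNGram step is kept from the original configuration, and only the normalization filters are aligned between the two sides.

```xml
<fieldType name="text_es" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- ICU folding already lower-cases and strips accents, so the
         LowerCase and ASCIIFolding filters are dropped -->
    <filter class="solr.ICUFoldingFilterFactory"/>
    <!-- Prefix grams at index time only, as in the original; side="front"
         is omitted because front-edge grams are the filter's default -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="50"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Same normalization as the index side, so "investigación" and
         "investigacion" are both folded to "investigacion" -->
    <filter class="solr.ICUFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```

The Analysis screen in the Solr admin UI is a convenient way to check that both spellings produce the same query-side term before reindexing.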
