Re: Stemming and accents (HunspellStemFilterFactory)

Chantal Ackermann Tue, 14 Feb 2012 07:28:21 -0800

Hi Bráulio,

I don't know HunspellStemFilterFactory specifically, but concerning
accents:

There are several accent filters that will remove accents from your
tokens. If the Hunspell filter factory requires the accents, then simply
add the accent filters after Hunspell in both your index and query filter
chains.

Hunspell would then produce the tokens as the result of stemming, and
only afterwards would the accents be removed (in your example: 'forum'
instead of 'fórum'). Do the same on the query side in case someone
enters accents.
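To illustrate why the order matters, here is a minimal Python sketch. It is not Solr code: TOY_STEMS is a hypothetical two-entry dictionary standing in for the pt_PT Hunspell dictionary, and fold_accents only approximates what an accent filter such as ASCIIFoldingFilterFactory does (it strips combining marks after Unicode NFD decomposition).

```python
import unicodedata

# Hypothetical toy dictionary standing in for a pt_PT Hunspell dictionary.
# Like the real one in the example, it only knows the accented forms.
TOY_STEMS = {"fóruns": "fórum", "fórum": "fórum"}


def hunspell_like_stem(token: str) -> str:
    """Dictionary-based stemming sketch: unknown words pass through unchanged."""
    return TOY_STEMS.get(token, token)


def fold_accents(token: str) -> str:
    """Rough stand-in for an accent filter: NFD-decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(token: str, fold_first: bool = False) -> str:
    """Run a tiny 'filter chain': lowercase, then stem and fold in the chosen order."""
    token = token.lower()
    if fold_first:
        # Folding first removes the accents the dictionary needs, so stemming misses.
        return hunspell_like_stem(fold_accents(token))
    # Stemming first uses the accented form; folding afterwards normalizes the stem.
    return fold_accents(hunspell_like_stem(token))


print(analyze("fóruns"))                   # stem, then fold
print(analyze("fóruns", fold_first=True))  # fold, then stem
```

With stemming first, 'fóruns' becomes 'forum', matching an indexed 'fórum' that went through the same chain. With folding first, the token stays 'foruns' because the dictionary lookup fails. Note that in this sketch an unaccented query such as 'foruns' is also passed through the stemmer unchanged.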

Accent filters are:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ICUTokenizerFactory
(lowercases, as well!)
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ASCIIFoldingFilterFactory

and others on that page.

Chantal

On Tue, 2012-02-14 at 14:48 +0100, Bráulio Bhavamitra wrote:
> Hello all,
> 
> I'm evaluating HunspellStemFilterFactory, and I found that it works with a
> pt_PT dictionary.
> 
> For example, if I search for 'fóruns', it stems it to 'fórum' and then finds
> 'fórum' references.
> 
> But if I search for 'foruns' (without the accent),
> then HunspellStemFilterFactory cannot stem the
> word, as it does not exist in its dictionary.
> 
> Is there any way to make HunspellStemFilterFactory ignore accent
> differences?
> 
> best,
> bráulio
