I cannot tell for sure without looking at the code, but my guess is that
diacritics are simply not being stripped anywhere. I imagine you could modify
the NutchAnalyzer to include that ISO...Filter, the same class that you must
have configured in your Solr schema.xml.
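For illustration only, here is a minimal JDK-only sketch of what such an accent-folding step does. It does not use the Lucene filter or the NutchAnalyzer API; it swaps in java.text.Normalizer (NFD decomposition, then dropping combining marks) as a stand-in. The class name AccentFolding and the toy search() method are hypothetical. The point it shows: folding only helps if it is applied to the indexed terms and to the query term alike, which (assuming the same analyzer handles both indexing and query parsing) is why it belongs in the analyzer. Documents that are already indexed would presumably need re-indexing to pick up the change.

```java
import java.text.Normalizer;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class AccentFolding {
    // Matches runs of Unicode combining marks (category M), e.g. U+0301 COMBINING ACUTE ACCENT
    private static final Pattern COMBINING_MARKS = Pattern.compile("\\p{M}+");

    // Strips diacritics and lowercases a token, so "Pérez" and "perez" both become "perez"
    static String fold(String token) {
        // NFD decomposes a precomposed "é" (U+00E9) into "e" followed by U+0301
        String decomposed = Normalizer.normalize(token, Normalizer.Form.NFD);
        return COMBINING_MARKS.matcher(decomposed).replaceAll("").toLowerCase(Locale.ROOT);
    }

    // Hypothetical toy search: the SAME fold() is applied to indexed terms and to the query
    static List<String> search(List<String> indexedTerms, String query) {
        String folded = fold(query);
        return indexedTerms.stream()
                .filter(term -> fold(term).equals(folded))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> terms = List.of("p\u00e9rez", "perez", "paredes");
        // Both "pérez" and "perez" match a query for "pérez"
        System.out.println(search(terms, "p\u00e9rez").size());
    }
}
```

If only the query were folded (or only the index), "pérez" and "perez" would still end up as different terms, so the filter has to sit on both paths.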

Otis 

--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: carlos orrego <[EMAIL PROTECTED]>
To: [email protected]
Sent: Saturday, April 5, 2008 12:50:23 AM
Subject: dealing with utf-8 characters


I have this issue:
If I query for pérez, I should get results including both pérez and perez
(without the accent).

This is what happens on Google and on Solr, which I use on other projects. Why
is Nutch not giving me the same results?

Any ideas?

Thanks
-- 
View this message in context: 
http://www.nabble.com/dealing-with-utf-8-characters-tp16502905p16502905.html
Sent from the Nutch - User mailing list archive at Nabble.com.



