I can't tell for sure without looking at the code, but my guess is that diacritics are simply not being stripped anywhere. I imagine you could modify NutchAnalyzer to include that ISO...Filter, the same class you must have configured in your Solr schema.xml.
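To illustrate what "stripping diacritics" means here, below is a minimal standalone sketch. It does not use the ISO...Filter or NutchAnalyzer. Instead it swaps in plain `java.text.Normalizer`: it decomposes each character (NFD), so "é" becomes "e" plus a combining accent, and then drops the combining marks. The class name `AccentFolding` and the `fold` method are hypothetical. Whatever folding you use has to run on both the indexed text and the query text; otherwise "pérez" and "perez" will still produce different terms.

```java
import java.text.Normalizer;

/**
 * Hypothetical helper (not part of Nutch or Lucene) that folds accented
 * characters to their unaccented base form, e.g. "pérez" -> "perez".
 */
public final class AccentFolding {
    private AccentFolding() {}

    /** Returns the input with all Unicode combining marks removed. */
    public static String fold(String text) {
        // NFD splits precomposed characters into base letter + combining mark(s).
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        // \p{M} matches Unicode mark characters, which include combining accents.
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // Unicode escapes avoid depending on the source file's encoding.
        String[] terms = {"p\u00e9rez", "perez", "P\u00c9REZ"};
        for (String t : terms) {
            System.out.println(t + " -> " + fold(t));
        }
    }
}
```

Note that this only removes accents and leaves case alone, much as an accent filter sits alongside a separate lower-case filter in an analyzer chain.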
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: carlos orrego <[EMAIL PROTECTED]>
To: [email protected]
Sent: Saturday, April 5, 2008 12:50:23 AM
Subject: dealing with utf-8 characters

I have this issue: if I query for pérez, I should get results including both pérez and perez (without the accent). This is the case on Google and on Solr, which I use on other projects. Why is Nutch not giving me the same results? Any ideas?

Thanks

--
View this message in context: http://www.nabble.com/dealing-with-utf-8-characters-tp16502905p16502905.html
Sent from the Nutch - User mailing list archive at Nabble.com.
