Are you sure that the HTMLParser is decoding the page incorrectly? I've seen Nutch deployments where the characters are correctly decoded by the HTMLParser and are correct in the Lucene index, but then the webapp is misconfigured such that they are not displayed correctly on the search results page.
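To illustrate that second failure mode, here is a minimal Python sketch. It is not Nutch or Lucene code, and the sample string and variable names are made up for the example. It assumes the webapp sends UTF-8 bytes while the page is read as ISO-8859-1, which is one common way for correctly indexed text to look garbled on the results page.

```python
# Hypothetical sample text standing in for a correctly decoded,
# correctly indexed field value (not taken from a real index).
indexed_text = "caf\u00e9"  # "café"

# Assumption: the webapp serializes the text as UTF-8 bytes.
wire_bytes = indexed_text.encode("utf-8")

# If the page charset is declared correctly, the text round-trips.
shown_ok = wire_bytes.decode("utf-8")

# If the charset is wrongly taken to be ISO-8859-1, each byte of the
# two-byte UTF-8 sequence for "é" is shown as its own character.
shown_bad = wire_bytes.decode("iso-8859-1")

print(repr(shown_ok))
print(repr(shown_bad))
```

In this sketch the stored text is never wrong; only the final decoding step differs, which is why checking the index directly separates a parser problem from a display problem.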
You can use the Lucene toolkit "luke" to open the index and examine the contents. If the stuff in the index is good, then the problem is not the HTMLParser.

Regards,
Aaron

--
Aaron Binns
Senior Software Engineer, Web Group
Internet Archive
aa...@archive.org