I am doing open-web crawls that include a number of pages that are not in English, or at least have content on them that is not in English.
In this specific case, the page is a blog post in English with a comment or two in Russian. Crawling, indexing and searching all seem to work fine: I can type Russian characters into the search box and get appropriate-looking results back. (I don't speak Russian and have no idea what the characters mean; someone else here in the office gave me the search term.) However, the Nutch results page displays garbled characters in the page summary excerpt. If I click through to the result, the Russian characters on the page itself are displayed correctly.

I am using Firefox 2.0.0.9 with the display encoding set to Unicode (UTF-8). I've tried switching the encoding, but I can't get the summary to display correctly. I've searched the list, but the language-related threads seem to revolve around stemming and the like, which is not my problem here.

Is there some sort of configuration knob I can turn on the search page? Is it possible to detect the character set of each result on the fly and "do the right thing" on the results page? Is there any documentation I can consult about Nutch's support for this kind of thing?

Thanks
