Hi, I am using Nutch 0.9 to crawl some Chinese web sites. When I search through the Nutch web portal, the cached HTML copy displays incorrectly. I then ran "bin/nutch readseg -dump" to dump the segments: ParseText (UTF-8) displays correctly, but the Chinese characters in Content show up as '?'. The original HTML uses the gb2312 charset.
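To illustrate the symptom outside Nutch, here is a minimal Java sketch. It is not Nutch code: the class name `CharsetDemo`, the helper `roundTrip`, and the sample string are made up for illustration, and I do not know whether readseg does anything like this internally. It only shows that Chinese text written through a charset that cannot represent it comes back as '?', which matches what I see in Content.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, unrelated to the Nutch code base.
public class CharsetDemo {

    // Encode text to bytes with one charset, then decode the bytes with another.
    static String roundTrip(String text, Charset writeAs, Charset readAs) {
        byte[] bytes = text.getBytes(writeAs);
        return new String(bytes, readAs);
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587"; // two Chinese characters, written as escapes
        Charset gb2312 = Charset.forName("GB2312");

        // Same charset on both sides: the text survives the round trip.
        System.out.println(text.equals(roundTrip(text, gb2312, gb2312)));

        // ISO-8859-1 cannot represent these characters, so getBytes
        // substitutes the charset's replacement byte, which is '?'.
        System.out.println(roundTrip(text, StandardCharsets.ISO_8859_1,
                                     StandardCharsets.ISO_8859_1));
    }
}
```

If something similar happens when Content is dumped, the raw bytes in the segment may still be intact and only the dump output would be affected, but that is a guess on my part.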
What could be causing this, and how can I fix it? Thanks in advance, Xiong
