Hi,

I am using Nutch 0.9 to crawl some Chinese web sites. When I search
through the Nutch web portal, the cached HTML copy displays incorrectly.
I then used "bin/nutch readseg -dump" to dump the segments:
ParseText (UTF-8) displays correctly, but the Chinese characters in
Content display incorrectly as '?'. The original HTML uses the GB2312
charset.
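
To show the kind of symptom I mean, here is a minimal standalone Java sketch.
It is not Nutch code, and the class and method names (CharsetDemo, toGb2312,
decode) are made up for illustration. It only assumes that the raw page bytes
are GB2312 and that something later reads or writes them with a different
charset; I have not confirmed that this is what Nutch does.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    // Hypothetical helper: encode text the way a GB2312 page would store it.
    static byte[] toGb2312(String text) {
        return text.getBytes(Charset.forName("GB2312"));
    }

    // Hypothetical helper: decode raw bytes with the named charset.
    static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587"; // two Chinese characters
        byte[] raw = toGb2312(original);

        // Correct charset: the text round-trips.
        System.out.println(decode(raw, "GB2312").equals(original));

        // Wrong charset: the bytes are not valid UTF-8, so the decoder
        // substitutes U+FFFD replacement characters.
        System.out.println(decode(raw, "UTF-8").equals(original));

        // Writing the text in a charset that cannot represent it
        // replaces each character with a literal '?' byte.
        byte[] latin1 = original.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(new String(latin1, StandardCharsets.ISO_8859_1));
    }
}
```

If the Content dump is produced along either of the last two paths, that
might explain the '?' characters, but this is only my guess.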

What could be causing this, and how can I fix it?

Thanks in advance,
Xiong
