I fetched some Chinese-language web pages with "nutch fetch", but some
Chinese characters do not come out right after I dump the segment back to
HTML pages. For instance, the page
http://www.dianping.com/shop/501079/
has the following title:
<head><title>
韶山冲(徐汇店)(图)_上海_大众点评网
</title>

However, I got this after dumping:
<head><title>
韶山��1¤7(徐汇庄1¤7)(��1¤7)_上海_大众点评罄1¤7
</title>


The charset specified in the page is "UTF-8". I also included the following
in "nutch-site.xml":
<property>
  <name>parser.character.encoding.default</name>
  <value>UTF-8</value>
</property>

It makes no difference.
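
The damage looks like what happens when UTF-8 bytes are read or written with
a different charset somewhere between fetch and dump. Below is a minimal
Python sketch of that effect. It is not Nutch code, and GBK as the "wrong"
charset is only an assumption for illustration; I do not know which charset
is actually involved. The helper name roundtrip is hypothetical.

```python
# Sketch: how UTF-8 text is damaged when its bytes pass through a step
# that assumes a different charset. GBK is an assumed example here.

title = "韶山冲(徐汇店)(图)_上海_大众点评网"
raw = title.encode("utf-8")  # the bytes as served by the site


def roundtrip(data: bytes, wrong_charset: str) -> str:
    """Decode with the wrong charset, write back out, then read as UTF-8."""
    # Step 1: some stage misreads the UTF-8 bytes as another charset.
    misread = data.decode(wrong_charset, errors="replace")
    # Step 2: it writes the misread text back out in that same charset.
    rewritten = misread.encode(wrong_charset, errors="replace")
    # Step 3: the result is finally viewed as UTF-8.
    return rewritten.decode("utf-8", errors="replace")


correct = raw.decode("utf-8")    # decoding with the right charset
damaged = roundtrip(raw, "gbk")  # decoding with the assumed wrong charset

print(correct)
print(damaged)
```

If something like this is happening, the bytes stored in the segment may be
fine, and only the dump step or the program used to view the dump may be
using a different default encoding.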

What could be the problem?

