I fetched some Chinese-language web pages with "nutch fetch", but some Chinese characters come out wrong after I dump the segment back to HTML pages. For instance, http://www.dianping.com/shop/501079/ has this title: <head><title> 韶山冲(徐汇店)(图)_上海_大众点评网 </title>
However, I got this after dumping: <head><title> 韶山��1¤7(徐汇庄1¤7)(��1¤7)_上海_大众点评罄1¤7 </title>
The charset specified in the page is "UTF-8". I also added the following to "nutch-site.xml", but it makes no difference: <name>parser.character.encoding.default</name> <value>UTF-8</value>
What could be the problem?
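To show the kind of damage I mean, here is a minimal sketch in Python, outside Nutch entirely, of what happens when the same UTF-8 bytes are decoded with a mismatched charset. The choice of GBK as the wrong charset is only an assumption for illustration. I have not confirmed that this is what happens during the dump, and the output will not necessarily match the garbled title above.

```python
# Illustration only: GBK as the mismatched charset is an assumption,
# not a confirmed cause of the garbled dump.
title = "韶山冲(徐汇店)(图)_上海_大众点评网"

# Bytes as the page would serve them, given its declared charset of UTF-8.
raw = title.encode("utf-8")

# Decoding with the declared charset round-trips cleanly.
good = raw.decode("utf-8")

# Decoding the same bytes with a different charset yields wrong characters;
# byte sequences that are invalid in that charset become U+FFFD.
bad = raw.decode("gbk", errors="replace")

print(good == title)
print(bad == title)
print(bad)
```

If something like this is happening, the page itself was probably fetched correctly and the characters are being damaged at the step that converts bytes to text (or text back to bytes) when the segment is dumped.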
