Hi all,

I have a problem configuring nutch-default.xml. I am in China, and most of the FTP sites I want to crawl use a Chinese encoding. When Nutch crawls these sites it does not detect the correct charset, so the parse results are garbled and useless.

So I changed the value in

<property> <name>parser.character.encoding.default</name> <value>windows-1252</value> </property>

to <value>gb2312</value> and got a very interesting result: Nutch can now crawl the files and directories in the root directory of Chinese FTP sites without any garbled characters, but it can NOT crawl any files in SUBdirectories; it just returns "404 not found".

I know there must be something wrong in my config files, but how and where do I configure Nutch to crawl a Chinese FTP site? I have been working on this problem for half a month and have found no way to solve it. Could anyone help me?
Thanks
