If the last URL did not fix the problem, you can contribute to a similar (possibly the same) issue on JIRA:
http://issues.apache.org/jira/browse/NUTCH-540

On Sun, Jul 13, 2008 at 8:35 PM, brainstorm <[EMAIL PROTECTED]> wrote:
> The parsetext extracted from the Nutch command line is UTF-8 by default
> (it works for me with Russian characters, for instance). Perhaps you are
> referring to the text seen through Tomcat; in that case, you can fix it:
>
> http://wiki.apache.org/nutch/GettingNutchRunningWithUtf8
>
> Regards,
> Roman
>
> On Fri, Jul 11, 2008 at 3:37 PM, beansproud <[EMAIL PROTECTED]> wrote:
>>
>> Hi,
>> I'm crawling some Chinese pages, and when I dump the parsetext, it displays
>> incorrectly as '?'.
>> Can anybody tell me how to make it "UTF-8"?
>>
>> Thanks!
>> --
>> View this message in context:
>> http://www.nabble.com/how-to-get-the-parsetext-to-be-UTF-8---tp18404034p18404034.html
>> Sent from the Nutch - User mailing list archive at Nabble.com.
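One common reason Chinese text shows up as '?' (an assumption here; the thread does not confirm the cause) is that the characters pass through a non-UTF-8 encoder somewhere on the way to the screen, and Java replaces characters a charset cannot represent with '?'. The class below is a minimal sketch of that effect: it round-trips a Chinese string through ISO-8859-1 and through UTF-8, and prints the results through a stdout stream explicitly wrapped as UTF-8. EncodingDemo and roundTrip are hypothetical names for illustration and are not part of Nutch.

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Nutch.
public class EncodingDemo {

    // Encode the text with the given charset, then decode it back.
    // Characters the charset cannot represent are replaced by the
    // charset's replacement bytes ('?' for ISO-8859-1 and US-ASCII).
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String chinese = "\u4E2D\u6587"; // the two Chinese characters for "Chinese"

        // Wrap stdout so it always encodes as UTF-8, regardless of the
        // JVM's default charset; the terminal must also expect UTF-8.
        PrintStream out = new PrintStream(System.out, true, "UTF-8");

        out.println(roundTrip(chinese, StandardCharsets.ISO_8859_1)); // prints "??"
        out.println(roundTrip(chinese, StandardCharsets.UTF_8));      // prints the original characters
    }
}
```

Run it with `javac EncodingDemo.java && java EncodingDemo`. If the second line still looks wrong, the terminal itself may not be set to UTF-8.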
