The parse text extracted from the Nutch command line is UTF-8 by default (it works for me with Russian characters, for instance). Perhaps you are referring to the text as seen through Tomcat; in that case, you can fix it:
http://wiki.apache.org/nutch/GettingNutchRunningWithUtf8

Regards,
Roman

On Fri, Jul 11, 2008 at 3:37 PM, beansproud <[EMAIL PROTECTED]> wrote:
>
> Hi,
> I'm crawling some Chinese pages, and when I dump the parse text, it displays
> incorrectly as '?'.
> Can anybody tell me how to make it "utf-8"?
>
> thanks!
> --
> View this message in context:
> http://www.nabble.com/how-to-get-the-parsetext-to-be-UTF-8---tp18404034p18404034.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
>
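
For what it's worth, here is a minimal sketch of how '?' characters can appear even when the stored text is fine. It assumes the cause is a non-UTF-8 step somewhere on the display path (a console or a servlet container, say), which is only a guess about your setup. The class and method names are made up for illustration and are not part of Nutch.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingCheck {
    // Hypothetical helper: encode the text with a charset, then decode the
    // resulting bytes with the same charset.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // Two Chinese characters, written as escapes so the encoding of this
        // source file does not matter.
        String sample = "\u4e2d\u6587";

        // UTF-8 can represent these characters, so the text survives intact.
        System.out.println(roundTrip(sample, StandardCharsets.UTF_8).equals(sample)); // prints "true"

        // ISO-8859-1 cannot represent them; String.getBytes substitutes the
        // charset's replacement byte, which is '?'.
        System.out.println(roundTrip(sample, StandardCharsets.ISO_8859_1)); // prints "??"
    }
}
```

If the second case matches what you see, the parse text itself is probably fine and the encoding is being lost at the point where it is printed or served.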
