The parsed text extracted by the Nutch command-line tools is UTF-8 by
default (it works for me with Russian characters, for instance). Perhaps
you are referring to the text seen through Tomcat; in that case, you can
fix it by following this page:

http://wiki.apache.org/nutch/GettingNutchRunningWithUtf8
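
As a rough illustration of where a '?' usually comes from (this is not
Nutch code; the class and method names below are made up for the
example): when Java encodes a string with a charset that cannot
represent a character, String.getBytes substitutes the charset's
replacement byte, which is '?' for ISO-8859-1. A minimal sketch,
assuming the text passes through a non-UTF-8 charset somewhere between
the dump and the display:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    // Encode the text with the given charset and decode it again,
    // mimicking output that is written and read with that charset.
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String chinese = "\u4e2d\u6587"; // two Chinese characters

        // ISO-8859-1 cannot represent them, so each one becomes '?'.
        String latin1 = roundTrip(chinese, StandardCharsets.ISO_8859_1);
        System.out.println("ISO-8859-1 gives \"??\": " + latin1.equals("??"));

        // UTF-8 can represent them, so the text survives unchanged.
        String utf8 = roundTrip(chinese, StandardCharsets.UTF_8);
        System.out.println("UTF-8 round-trips: " + utf8.equals(chinese));

        // The JVM default charset is one place worth checking.
        System.out.println("Default charset: " + Charset.defaultCharset());
    }
}
```

If the default charset printed at the end is not UTF-8, that may be
worth checking (the file.encoding system property, or the terminal
locale) before assuming the stored parse text itself is wrong.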

Regards,
Roman

On Fri, Jul 11, 2008 at 3:37 PM, beansproud <[EMAIL PROTECTED]> wrote:
>
> Hi,
> I'm crawling some Chinese pages, and when I dump the parse text, the
> characters display incorrectly as '?'.
> Can anybody tell me how to make the output UTF-8?
>
> thanks!
