If the last URL did not fix the problem, you can contribute to a
similar (possibly the same) issue on JIRA:

http://issues.apache.org/jira/browse/NUTCH-540
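
To check whether the '?' comes from the output encoding rather than from
the stored parse text, here is a minimal sketch in plain Java (Java 7+ for
StandardCharsets). It is not Nutch code, and the names EncodingCheck and
roundTrip are made up for illustration. It only shows that encoding Chinese
characters with a charset that cannot represent them yields '?', while
UTF-8 preserves them:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingCheck {
    // Hypothetical helper: encode the text with the given charset and decode
    // it back, roughly what happens when text is written out and read again.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // Two Chinese characters, written as Unicode escapes so the encoding
        // of this source file does not matter.
        String sample = "\u4e2d\u6587";

        // US-ASCII cannot represent these characters, so each one is
        // replaced with '?' during encoding.
        System.out.println(roundTrip(sample, StandardCharsets.US_ASCII));

        // UTF-8 can represent them, so the round trip is lossless.
        System.out.println(roundTrip(sample, StandardCharsets.UTF_8).equals(sample));
    }
}
```

If the dump looks like the first case, the platform default encoding or the
terminal is a more likely culprit than the parse text itself.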

On Sun, Jul 13, 2008 at 8:35 PM, brainstorm <[EMAIL PROTECTED]> wrote:
> The parse text extracted from the Nutch command line is UTF-8 by default
> (it works for me with Russian characters, for instance). Perhaps you mean
> the text as seen through Tomcat; in that case, you can fix it:
>
> http://wiki.apache.org/nutch/GettingNutchRunningWithUtf8
>
> Regards,
> Roman
>
> On Fri, Jul 11, 2008 at 3:37 PM, beansproud <[EMAIL PROTECTED]> wrote:
>>
>> Hi,
>> I'm crawling some Chinese pages, and when I dump the parse text, it is
>> displayed incorrectly as '?'.
>> Can anybody tell me how to make it UTF-8?
>>
>> thanks!
>
