Hi Rushikesh,

I don't have any experience with this specific plugin, but I have run into 
similar problems, which had two possible causes:
1. The site may not properly declare which encoding it uses, and the browser 
simply guesses the correct one while the crawler does not.
2. You may have run across https://issues.apache.org/jira/browse/NUTCH-1807. I 
solved a similar problem by setting the environment variable LC_ALL to 
en_US.UTF-8 for all Hadoop processes (more specifically, adding `export 
LC_ALL=en_US.UTF-8` in ~hadoop/.bashrc on all Hadoop machines solved the 
problem for me).
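To illustrate both causes, here is a small Java sketch (Java because Nutch and 
Hadoop run on the JVM). It is not taken from Nutch or the Bayan plugin. It only 
assumes the page is served as UTF-8 and shows what the bytes of "Infección" turn 
into when they are decoded with the wrong charset. ISO-8859-1, a common fallback 
when a page declares no charset, gives "InfecciÃ³n". US-ASCII, which JVMs older 
than JDK 18 typically pick as their default charset under a plain C/POSIX locale, 
replaces each of the two bytes of "ó" with U+FFFD. The `misdecode` helper is a 
hypothetical name made up for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {

    // Hypothetical helper: encode the text as UTF-8 (as the site would serve it),
    // then decode those bytes with a possibly wrong charset.
    static String misdecode(String text, Charset readAs) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, readAs);
    }

    public static void main(String[] args) {
        // "Infección", written with a Unicode escape so the source file's own
        // encoding does not matter when compiling.
        String word = "Infecci\u00f3n";

        // The charset used by code that calls new String(bytes) without naming one.
        System.out.println("JVM default charset: " + Charset.defaultCharset());

        // Correct: the bytes are decoded with the charset they were written in.
        System.out.println("UTF-8:      " + misdecode(word, StandardCharsets.UTF_8));
        // Wrong or missing charset: each UTF-8 byte becomes one Latin-1 character.
        System.out.println("ISO-8859-1: " + misdecode(word, StandardCharsets.ISO_8859_1));
        // ASCII default (old JVM, C/POSIX locale): non-ASCII bytes become U+FFFD.
        System.out.println("US-ASCII:   " + misdecode(word, StandardCharsets.US_ASCII));
    }
}
```

Running it on a Hadoop node with and without `export LC_ALL=en_US.UTF-8` should 
show whether the JVM default charset changes there (on JDK 18 and later the 
default is UTF-8 regardless of the locale).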

        Yossi.

> -----Original Message-----
> From: Rushi [mailto:rushikeshmod...@gmail.com]
> Sent: 25 January 2018 16:32
> To: user@nutch.apache.org; Mark Vega <veg...@uci.edu>
> Subject: Bayan Group Extractor plugin for Nutch-Spanish Accent Character Issue
> 
> Hello Everyone,
> I am having an issue while crawling a Spanish website: some of the accented
> characters are not converted properly.
> Here is an example: Infección (wrong one) should be Infección (correct).
> 
> Note: this is with the *Bayan Group Extractor plugin*. Is there any change I
> need to make so that these characters are converted correctly?
> 
> --
> Regards
> Rushikesh M
> .Net Developer
