I am using a Hadoop cluster in the us-east-1 region.
The strange thing is that if I run Nutch directly on the hadoop-master instance
with the jar (in local, non-Hadoop mode), the encoding is handled correctly.
But if I run the job file in distributed Hadoop mode (on the same master
instance, with one slave of the same instance type), I start seeing this
problem with the encoding of special characters.
And the same job file works fine on a single-node Hadoop cluster
installed on my PC.
I really have no idea what is going on.
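One thing I plan to rule out (just a guess at this point, nothing in the logs
confirms it yet): the EC2 instances may be starting their JVMs with a
different platform encoding than my PC, since the JVM's default charset is
inherited from the machine's locale. A throwaway check (EncodingCheck is just
a name I made up), to compile and run on my PC and on each EC2 node and
compare the output:

    import java.nio.charset.Charset;

    public class EncodingCheck {
        public static void main(String[] args) {
            // Encoding the JVM picked up from the environment at startup
            System.out.println("file.encoding    = " + System.getProperty("file.encoding"));
            // Charset used by readers/writers created without an explicit one
            System.out.println("defaultCharset() = " + Charset.defaultCharset());
            // Locale the shell handed to the JVM (often plain POSIX/C on EC2)
            System.out.println("LANG             = " + System.getenv("LANG"));
        }
    }

If the machines print different values, that would explain why exactly the
same job behaves differently.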
Thanks to all.
Niccolò
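
P.S. If the platform encodings do turn out to differ, the first fix I would
try is forcing UTF-8 on the task JVMs. A sketch for mapred-site.xml, assuming
the Hadoop 1.x property name and its stock -Xmx200m default (I have not
verified this on my cluster yet):

    <!-- force a known encoding on every map/reduce task JVM -->
    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx200m -Dfile.encoding=UTF-8</value>
    </property>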

On Wed, Aug 8, 2012 at 9:46 PM, X3C TECH <[email protected]> wrote:

> Not sure if it matters, but what data center are you using? Maybe the data
> center region uses a different default character set if the native language
> isn't English.
>
> On Wed, Aug 8, 2012 at 7:25 AM, Niccolò Becchi <[email protected]> wrote:
>
> > Hi,
> > I have been using Nutch to fetch English sites (UTF-8 and ISO-8859-1).
> > Everything works fine running in local mode or on a single-node Hadoop
> > cluster installed on my PC.
> > Recently I moved the crawling system to Amazon AWS, and the Fetcher has
> > encoding problems with special characters: they are not preserved and
> > appear as '?'.
> > I have tried both EMR and a cluster launched manually with the
> > "hadoop-ec2 launch-cluster" command, but neither works correctly.
> > Pages that are fetched correctly with my local Hadoop cluster show the
> > same encoding errors when running on AWS (with exactly the same job).
> >
> > Any idea?
> > Thanks!
> >
>
