Hi,
I have been using Nutch to fetch English sites (UTF-8 and ISO-8859-1).
Everything works well when running in local mode or on a single-node Hadoop
cluster installed on my PC.
Recently I moved the crawling system to Amazon AWS, and the Fetcher now has
encoding problems with special characters: they are not recognizable
(they appear as '?').
I have tried both EMR and a cluster launched manually with the
"hadoop-ec2 launch-cluster" command, but neither works correctly.
The same pages that are fetched correctly on my local Hadoop cluster have
these encoding errors when running on AWS (with exactly the same job).
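
One thing I can check on each node is the JVM default charset. This is only a
guess on my side, not a confirmed cause: a '?' is what Java produces when text
is encoded with a charset that cannot represent a character. The sketch below
is a standalone class (CharsetCheck and roundTrip are names I made up for
illustration, they are not part of Nutch) that prints the default charset and
shows the effect on a sample string.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetCheck {

    // Encode the text to bytes and decode it again with the same charset.
    // Characters the charset cannot represent are replaced by '?'.
    static String roundTrip(String text, Charset cs) {
        return new String(text.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        // "café", written with a Unicode escape so the source file encoding does not matter.
        String sample = "caf\u00e9";

        // What the JVM on this machine uses when no charset is given explicitly.
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());

        // UTF-8 keeps the accented character; US-ASCII turns it into '?'.
        System.out.println("UTF-8    : " + roundTrip(sample, StandardCharsets.UTF_8));
        System.out.println("US-ASCII : " + roundTrip(sample, StandardCharsets.US_ASCII));
    }
}
```

If the AWS nodes report a different default charset than my PC, that would at
least be consistent with the symptom. In that case, passing
-Dfile.encoding=UTF-8 to the task JVMs (for example through
mapred.child.java.opts) might be worth trying, but I have not verified that
this fixes it.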

Any ideas?
Thanks!
