I am using a Hadoop cluster in the us-east-1 region. The strange thing is that if I run Nutch just on the hadoop-master instance with the jar (the non-Hadoop way), everything works fine with regard to encoding. But if I run the job file the Hadoop way (on the same master instance, with one slave of the same type), I start to get this encoding problem with special characters. And the same job file works well on a single-node Hadoop cluster installed on my PC. I really have no idea.. Thanks for all. Niccolò
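In case it is related, one thing I am going to check (just a guess on my side, since the locale on a fresh EC2 AMI is often POSIX/C rather than UTF-8) is whether the task JVMs on the cluster default to a non-UTF-8 charset. A minimal check, run both locally and from inside a task on the cluster:

    import java.nio.charset.Charset;

    // Minimal check: print the default encoding of the JVM.
    // If the cluster JVM reports US-ASCII (or anything other than
    // UTF-8) while the local one reports UTF-8, the '?' characters
    // are likely produced by default-charset conversions in the job.
    public class CharsetCheck {
        public static void main(String[] args) {
            System.out.println("file.encoding  = " + System.getProperty("file.encoding"));
            System.out.println("defaultCharset = " + Charset.defaultCharset());
        }
    }

If that turns out to be the difference, forcing the child JVMs to UTF-8 via mapred-site.xml should work (the -Xmx value here is just an example):

    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx512m -Dfile.encoding=UTF-8</value>
    </property>
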
On Wed, Aug 8, 2012 at 9:46 PM, X3C TECH <[email protected]> wrote:
> Not sure if it matters, but what data center are you using? Maybe the
> data center region uses different characters if the native language
> isn't English.
>
> On Wed, Aug 8, 2012 at 7:25 AM, Niccolò Becchi <[email protected]> wrote:
>
> > Hi,
> > I have been using Nutch to fetch English sites (UTF-8 and ISO-8859-1).
> > Everything goes well running in local mode or on a single-node Hadoop
> > cluster installed on my PC.
> > Recently I moved the crawling system to Amazon AWS, and the Fetcher has
> > some encoding problems with special characters: they are not
> > recognizable (they appear as '?').
> > I have tried both EMR and a cluster launched manually with the
> > "hadoop-ec2 launch-cluster" command, but it doesn't work well.
> > The same pages that are fetched correctly with my local Hadoop cluster
> > have the same encoding errors running on AWS (with exactly the same job).
> >
> > Any idea?
> > Thanks!

