Not sure, haven't tried it yet. Trying to get my Hadoop tutorials out the door before Fri. Can chat then.
I think some have tried it. The Bixo dude has experience doing this.

Hmm, not sure what the cause of this is. If I had to guess: there are 2 ways to run Nutch, using EC2+EBS or EMR. The problem with EMR is that when the job stops, the data goes poof into thin air, so they transfer the data to S3, and the bug is in there. Not sure which layer the bug is in: the Hadoop file system interface into S3, or Nutch itself.

We can debug this on Fri if you like.

dc

On Tue, Aug 23, 2011 at 6:03 PM, Peter Harrington <[email protected]> wrote:
> Does anyone use Nutch on EMR?
> I am using Nutch 1.3 and I get an error saying:
>
> FATAL org.apache.nutch.crawl.Generator (main): Generator:
> java.lang.IllegalArgumentException: This file system object
> (hdfs://ip-44-169-41-187.ec2.internal:9000) does not support access to the
> request path 's3://Datasets/crawlResults/crawldb/.locked' You possibly
> called FileSystem.get(conf) when you should have called FileSystem.get(uri,
> conf) to obtain a file system supporting your path.
>
> I have seen other posts with this same problem but no resolution. Does
> anyone use Nutch-1.3 on EMR?
>
> Thanks for the help,
> Peter
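
For reference, the exception text above says the file system object is bound to hdfs:// while the requested path is on s3://. Below is a minimal stdlib-only Java sketch of that kind of scheme/authority mismatch. It is a simplified stand-in, not Hadoop's or Nutch's actual code: the class FsSchemeCheck and the method supportsPath are hypothetical names made up for illustration, and the bare "s3://Datasets" URI is assumed from the path in the error.

```java
import java.net.URI;

// Hypothetical illustration only: models why a file system bound to one
// URI (hdfs://...) rejects a path that lives under another (s3://...).
class FsSchemeCheck {

    // Returns true if a file system rooted at fsUri could serve the path.
    // Simplified assumption: scheme must match, and authority must match
    // when the path carries one.
    static boolean supportsPath(URI fsUri, URI path) {
        String scheme = path.getScheme();
        if (scheme == null) {
            return true; // relative path: resolved against this file system
        }
        if (!scheme.equalsIgnoreCase(fsUri.getScheme())) {
            return false; // e.g. s3 path handed to an hdfs file system
        }
        String authority = path.getAuthority();
        return authority == null || authority.equalsIgnoreCase(fsUri.getAuthority());
    }

    public static void main(String[] args) {
        // Values copied from the error message in the quoted mail.
        URI defaultFs = URI.create("hdfs://ip-44-169-41-187.ec2.internal:9000");
        URI lockFile = URI.create("s3://Datasets/crawlResults/crawldb/.locked");
        // Assumed URI for a file system looked up from the path itself.
        URI s3Fs = URI.create("s3://Datasets");

        // Default (hdfs) file system asked for an s3 path: the failing case.
        System.out.println("default fs supports lock file: " + supportsPath(defaultFs, lockFile));
        // File system chosen from the path's own URI: the case the message recommends.
        System.out.println("s3 fs supports lock file: " + supportsPath(s3Fs, lockFile));
    }
}
```

In real Hadoop code the equivalent change would be getting the file system from the path's URI, as the message suggests with FileSystem.get(uri, conf), instead of FileSystem.get(conf). Whether that call sits in Nutch or in the S3 layer is still the open question.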

