Does anyone have experience configuring Hadoop to use S3 with Nutch? I
tried modifying my hadoop-site.xml configuration file and it looks like
Hadoop is trying to use S3. But I think what's happening is that, once
configured to use S3, Hadoop is ONLY looking at S3 for all files. It's
trying to find a /tmp folder there, for example. And when running a crawl,
Hadoop looks in S3 for the seed URLs folder. Are there steps needed to
prepare an S3 bucket for use by Hadoop so that a Nutch crawl can run?
Kevin