On Tue, Sep 30, 2008 at 11:52 PM, Kevin MacDonald <[EMAIL PROTECTED]> wrote:
> Does anyone have experience configuring Hadoop to use S3 for running Nutch? I
> tried modifying my hadoop-site.xml configuration file, and it looks like
> Hadoop is trying to use S3. But I think what's happening is that, once
> configured to use S3, Hadoop is ONLY looking at S3 for all files. It's
> trying to find a /tmp folder there, for example. And when running a crawl,
> Hadoop is looking to S3 to find the seed urls folder. Are there steps that
> need to happen to prepare an S3 bucket for use by Hadoop so that a Nutch
> crawl can happen?
If you want to pass paths from other filesystems, I think you can do something like:

bin/nutch inject crawl/crawldb hdfs://machine:10000/.....

> Kevin

--
Doğacan Güney
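For context, the behavior Kevin describes typically comes from `fs.default.name`: once that property points at S3, every path without an explicit scheme (including `/tmp` and the seed urls folder) resolves against the S3 bucket. A minimal sketch of the relevant `hadoop-site.xml` fragment, for Hadoop of that era (the bucket name and key values below are placeholders, not from this thread):

```xml
<!-- hadoop-site.xml sketch: make S3 the default filesystem.
     Bucket name and credentials are placeholders. -->
<property>
  <name>fs.default.name</name>
  <value>s3://my-nutch-bucket</value>
</property>
<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>YOUR_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>YOUR_SECRET_ACCESS_KEY</value>
</property>
```

With this configuration, a relative path like `crawl/crawldb` resolves inside the S3 bucket, while a fully qualified URI such as `hdfs://machine:10000/...` still targets HDFS, which is why passing scheme-qualified paths on the command line works.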
