On Tue, Sep 30, 2008 at 11:52 PM, Kevin MacDonald <[EMAIL PROTECTED]> wrote:
> Does anyone have experience configuring Hadoop to use S3 with Nutch? I
> tried modifying my hadoop-site.xml configuration file, and it looks like
> Hadoop is trying to use S3. But I think what's happening is that, once
> configured to use S3, Hadoop looks ONLY at S3 for all files. It's
> trying to find a /tmp folder there, for example, and when running a crawl
> Hadoop looks to S3 to find the seed URLs folder. Are there steps that
> need to happen to prepare an S3 bucket for use by Hadoop so that a Nutch
> crawl can happen?
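
For context, pointing Hadoop's default filesystem at S3 is typically done
with properties like the following in hadoop-site.xml. This is only a
sketch; the bucket name and AWS credentials are placeholders, not values
from the original message. Once fs.default.name is an s3:// URI, any path
given without an explicit scheme (such as /tmp or a seed URLs directory)
is resolved against that bucket, which would explain the behaviour
described above.

  <!-- sketch only: bucket and keys below are placeholders -->
  <property>
    <name>fs.default.name</name>
    <value>s3://YOUR-BUCKET</value>
  </property>
  <property>
    <name>fs.s3.awsAccessKeyId</name>
    <value>YOUR_AWS_ACCESS_KEY_ID</value>
  </property>
  <property>
    <name>fs.s3.awsSecretAccessKey</name>
    <value>YOUR_AWS_SECRET_ACCESS_KEY</value>
  </property>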

If you want to pass paths from other filesystems, I think you can do
something like:

bin/nutch inject crawl/crawldb hdfs://machine:10000/.....
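
Or, if the seed URLs live on the local disk while S3 is the default
filesystem, a fully qualified file:// URI should work the same way (the
path below is just an example):

bin/nutch inject crawl/crawldb file:///home/nutch/urls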

> Kevin
>



-- 
Doğacan Güney
