:) A bit misleading... First: Hadoop is the evolution of the "Nutch Distributed File System".
It is based on Google's file system. It enables you to keep all your data in a distributed file system, which is very suitable for Nutch.

Where you see

  bin/nutch NDFS -ls

write instead

  bin/hadoop dfs -ls

Now to create the seeds: create the urls.txt file in a folder called seeds, i.e. seeds/urls.txt, then run

  bin/hadoop dfs -put seeds seeds

This will copy the seeds folder into the Hadoop file system. Now:

  bin/nutch crawl seeds -dir crawled -depth 3 >& crawl.log

Happy crawling.

Gal.

On Wed, 2006-02-22 at 01:05 -0800, Foong Yie wrote:
> matt
>
> as the tutorial stated ..
>
> bin/nutch crawl urls -dir crawled -depth 3 >& crawl.log
>
> the urls is in .txt right? i created it and put inside c:/nutch-0.7.1
>
> Stephanie
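The steps above can be sketched as one shell sequence. The seed URL below is just a placeholder (the original message does not give one), and the hadoop/nutch commands are left as comments since they need a running Nutch 0.7/Hadoop installation:

```shell
#!/bin/sh
# Create the seed list on the local file system, as described above.
mkdir -p seeds
echo "http://lucene.apache.org/nutch/" > seeds/urls.txt   # placeholder URL

# With Nutch/Hadoop installed, you would then run (shown as comments here):
#   bin/hadoop dfs -put seeds seeds                          # copy seeds into the Hadoop FS
#   bin/nutch crawl seeds -dir crawled -depth 3 >& crawl.log # start the crawl

# Show what was created locally.
cat seeds/urls.txt
```

Note that the -dir, -depth, and log-redirection parts are exactly as in the tutorial command quoted below; only the input folder changes from urls to seeds.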
