:) A bit misleading... First: Hadoop is the evolution of the "Nutch Distributed File System".
It is based on Google's file system. It enables you to keep all your data in a distributed file system, which is very suitable for Nutch.

Where you see

  bin/nutch NDFS -ls

write instead

  bin/hadoop dfs -ls

Now to create the seeds: create the urls.txt file in a folder called seeds, i.e. seeds/urls.txt, then run

  bin/hadoop dfs -put seeds seeds

This will copy the seeds folder into the Hadoop file system. Now:

  bin/nutch crawl seeds -dir crawled -depth 3 >& crawl.log

Happy crawling.

Gal.

On Wed, 2006-02-22 at 01:05 -0800, Foong Yie wrote:
> matt
>
> as the tutorial stated ..
>
> bin/nutch crawl urls -dir crawled -depth 3 >& crawl.log
>
> the urls is in .txt right? i created it and put inside c:/nutch-0.7.1
>
> Stephanie
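The steps above can be sketched as one shell sequence. The seed URL below is just a placeholder (the original message does not give one), and the hadoop/nutch commands are left as comments since they need a running Nutch 0.7/Hadoop installation:

```shell
#!/bin/sh
# Create the seed list on the local file system, as described above.
mkdir -p seeds
echo "http://lucene.apache.org/nutch/" > seeds/urls.txt   # placeholder URL

# With Nutch/Hadoop installed, you would then run (shown as comments here):
#   bin/hadoop dfs -put seeds seeds                          # copy seeds into the Hadoop FS
#   bin/nutch crawl seeds -dir crawled -depth 3 >& crawl.log # start the crawl

# Show what was created locally.
cat seeds/urls.txt
```

Note that the -dir, -depth, and log-redirection parts are exactly as in the tutorial command quoted below; only the input folder changes from urls to seeds.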
