:) A bit misleading... first: Hadoop is the evolution of the "Nutch Distributed File System" (NDFS).
It is based on Google's file system (GFS). It enables you to keep all data in a
distributed file system, which is very suitable for Nutch. Where you see

    bin/nutch NDFS -ls

write instead

    bin/hadoop dfs -ls

Now to create the seeds: create the urls.txt file in a folder called seeds,
i.e. seeds/urls.txt, then

    bin/hadoop dfs -put seeds seeds

This will copy the seeds folder into the Hadoop file system. Now run

    bin/nutch crawl seeds -dir crawled -depth 3 >& crawl.log

Happy crawling.

Gal.

On Wed, 2006-02-22 at 01:05 -0800, Foong Yie wrote:
> Matt,
>
> As the tutorial stated:
>
> bin/nutch crawl urls -dir crawled -depth 3 >& crawl.log
>
> The urls file is a .txt, right? I created it and put it inside c:/nutch-0.7.1
>
> Stephanie

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
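For reference, the whole sequence Gal describes can be sketched as one shell session. This is a minimal sketch, assuming a Nutch tree with the Hadoop-based scripts in bin/ and a running DFS; the seed URL and depth are illustrative, not from the original message:

```shell
# Create the seed list locally: one URL per line.
mkdir seeds
echo "http://lucene.apache.org/nutch/" > seeds/urls.txt

# Copy the seeds folder into the Hadoop file system
# (this replaces the old NDFS workflow).
bin/hadoop dfs -put seeds seeds

# Verify it arrived: "bin/nutch NDFS -ls" becomes "bin/hadoop dfs -ls".
bin/hadoop dfs -ls

# Crawl to link depth 3, capturing stdout and stderr in crawl.log.
bin/nutch crawl seeds -dir crawled -depth 3 >& crawl.log
```

Note that `>&` is csh-style redirection; under bash the equivalent is `> crawl.log 2>&1`.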
