Hello,
Just wanted to bring to your notice that there is a slight error in
the NutchHadoopTutorial (http://wiki.apache.org/nutch/NutchHadoopTutorial).
The command given under "Performing a Nutch Crawl" is:


hadoop jar nutch-${version}.jar org.apache.nutch.crawl.Crawl urls -dir urls -depth 3 -topN 5

It should be:


hadoop jar nutch-${version}.jar org.apache.nutch.crawl.Crawl urls -dir crawl -depth 3 -topN 5

This is also consistent with the line immediately after the command, which says:

"We are using the nutch crawl command. The urls dir is the urls directory
that we added to the distributed filesystem. The "-dir crawl" is the output
directory."
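
To make the roles of the two directories explicit, here is a rough sketch of the corrected command with each argument named. The variable names and the version value "1.0" are placeholders of mine, not taken from the tutorial; the snippet only prints the command rather than running it.

```shell
# Placeholder version (assumption); substitute the version of your Nutch jar.
version="1.0"

# Input: the urls directory that was added to the distributed filesystem.
input_dir="urls"

# Output: the directory passed to -dir, as described in the tutorial text.
output_dir="crawl"

# Assemble the corrected command; it is printed here, not executed.
cmd="hadoop jar nutch-${version}.jar org.apache.nutch.crawl.Crawl ${input_dir} -dir ${output_dir} -depth 3 -topN 5"
echo "$cmd"
```

With the command as currently written in the tutorial, output_dir would be "urls" as well, so the input and output directories would be the same.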

Regards,
Wahaj
