Hello,
I wanted to point out a small error in the NutchHadoopTutorial
(http://wiki.apache.org/nutch/NutchHadoopTutorial).
The command given under "Performing a Nutch Crawl" is:
hadoop jar nutch-${version}.jar org.apache.nutch.crawl.Crawl urls -dir
urls -depth 3 -topN 5
It should be (with "-dir crawl" instead of "-dir urls"):
hadoop jar nutch-${version}.jar org.apache.nutch.crawl.Crawl urls -dir
crawl -depth 3 -topN 5
This is also consistent with the line immediately after it, which says:
"We are using the nutch crawl command. The urls dir is the urls directory
that we added to the distributed filesystem. The "-dir crawl" is the output
directory."
Regards,
Wahaj