I would like to do a large crawl and let Nutch run to index
10-100 million webpages. According to
http://wiki.apache.org/nutch/NutchTutorial the nutch crawl command
performs all the steps in a single command, but the page calls it
intranet crawling. The page also says the crawl command has
limitations, but doesn't say what they are.
My questions are: can I use the crawl command to index 10-100
million pages from many different sites? And what are the
limitations of the crawl command?
