I would like to run a large crawl and let Nutch index somewhere between 10 and 100 million web pages. I know from http://wiki.apache.org/nutch/NutchTutorial that the nutch crawl command performs all the steps in a single command, but the page describes this as intranet crawling. The page also says the crawl command has limitations, but it doesn't say what they are. My questions are: can I use the crawl command to index 10-100 million pages from many different sites? And what are the limitations of the crawl command?
- Can I use the Nutch crawl command for large crawls? firespin

