Can I use the Nutch crawl command for large crawls?

2011-02-26 Thread firespin
I would like to do a large crawl and let Nutch run to index 10-100 million web pages. I know from http://wiki.apache.org/nutch/NutchTutorial that the nutch crawl command will do all steps with just that one command, but the page calls it intranet crawling. Also, the page says the crawl command has

Re: Can I use the Nutch crawl command for large crawls?

2011-02-26 Thread Hannes Carl Meyer
I would not recommend using the Crawl command for large crawls, because:
1. Tuning Hadoop is not possible at all
2. Incremental crawling is also pretty difficult because you can't control the different processes/steps

On Sat, Feb 26, 2011 at 9:58 AM, firespin firespin...@gmail.com wrote: I
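
To illustrate what controlling "the different processes/steps" might look like, here is a minimal Python sketch that runs the individual Nutch 1.x steps (inject, generate, fetch, parse, updatedb) one at a time instead of the all-in-one crawl command. It is not from the thread: the paths (bin/nutch, crawl/crawldb, crawl/segments, urls), the helper names, and the dry-run default are assumptions made for illustration, and whether a separate parse step is needed depends on your fetcher.parse setting.

```python
import os
import subprocess
from typing import List

NUTCH = "bin/nutch"  # assumed location of the Nutch launcher script


def latest_segment(segments_dir: str) -> str:
    """Return the newest segment; generate names segments by timestamp."""
    names = sorted(os.listdir(segments_dir))
    return os.path.join(segments_dir, names[-1])


def crawl(crawldb: str, segments_dir: str, seed_dir: str,
          rounds: int, top_n: int, dry_run: bool = True) -> List[List[str]]:
    """Run (or, in dry-run mode, only print) each crawl step separately."""
    executed: List[List[str]] = []

    def step(*args: str) -> None:
        cmd = [NUTCH, *args]
        executed.append(cmd)
        print(" ".join(cmd))
        if not dry_run:
            # Stop at the first failing step so it can be rerun on its own.
            subprocess.run(cmd, check=True)

    # Seed the crawl database once.
    step("inject", crawldb, seed_dir)

    for i in range(rounds):
        # Each round is a separate, restartable unit of work.
        step("generate", crawldb, segments_dir, "-topN", str(top_n))
        if dry_run:
            # Placeholder: the real name is only known after generate has run.
            segment = f"{segments_dir}/<segment-{i + 1}>"
        else:
            segment = latest_segment(segments_dir)
        step("fetch", segment)
        step("parse", segment)  # may be unnecessary if the fetcher also parses
        step("updatedb", crawldb, segment)

    return executed


if __name__ == "__main__":
    crawl("crawl/crawldb", "crawl/segments", "urls", rounds=2, top_n=1000)
```

Because every step is a separate command, a failed round can be resumed from the step that broke, and later rounds can be run against the same crawl database to crawl incrementally.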

RE: Can I use the Nutch crawl command for large crawls?

2011-02-26 Thread McGibbney, Lewis John
wish on a single workstation.

Lewis

From: Hannes Carl Meyer [hannesc...@googlemail.com]
Sent: 26 February 2011 18:02
To: user@nutch.apache.org
Subject: Re: Can I use the Nutch crawl command for large crawls?

I would not recommend using the Crawl command