I would like to do a large crawl and let Nutch run to index
10-100 million web pages. I know from
http://wiki.apache.org/nutch/NutchTutorial that the nutch crawl command
performs all the steps with just that one command, but the page calls it
intranet crawling. Also, the page says the crawl command has
I would not recommend using the Crawl command for large crawls, because:
1. Tuning Hadoop is not possible at all
2. Incremental Crawling is also pretty difficult because you can't control
the different processes/steps
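
To illustrate what controlling the steps separately could look like, here is a minimal sketch that builds the individual `bin/nutch` invocations (inject, generate, fetch, parse, updatedb) as argument lists and prints them by default instead of executing them. The directory names, the `-topN` value, and all helper function names are assumptions made for illustration and are not taken from this thread. The sketch also assumes a local filesystem (not HDFS) when looking up the newest segment, and the separate parse step may be unnecessary if your fetcher is configured to parse while fetching.

```python
import os
import subprocess
from typing import List

NUTCH = "bin/nutch"  # assumed path to the Nutch launcher script


def inject_command(crawldb: str, seed_dir: str) -> List[str]:
    """One-time step: load the seed URLs into the crawl database."""
    return [NUTCH, "inject", crawldb, seed_dir]


def generate_command(crawldb: str, segments_dir: str, top_n: int) -> List[str]:
    """Select up to top_n URLs and write them into a new segment."""
    return [NUTCH, "generate", crawldb, segments_dir, "-topN", str(top_n)]


def round_commands(crawldb: str, segment: str) -> List[List[str]]:
    """Steps that operate on one already generated segment."""
    return [
        [NUTCH, "fetch", segment],
        # Skip this step if fetching is configured to parse as well.
        [NUTCH, "parse", segment],
        [NUTCH, "updatedb", crawldb, segment],
    ]


def newest_segment(segments_dir: str) -> str:
    """Return the newest segment, relying on timestamp-style directory
    names that sort chronologically (local filesystem only)."""
    names = sorted(os.listdir(segments_dir))
    if not names:
        raise FileNotFoundError("no segments in " + segments_dir)
    return os.path.join(segments_dir, names[-1])


def run(cmd: List[str], dry_run: bool = True) -> None:
    """Print the command; execute it only when dry_run is False."""
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run with a hypothetical segment name, so nothing is executed.
    run(inject_command("crawl/crawldb", "urls"))
    run(generate_command("crawl/crawldb", "crawl/segments", 1000))
    for cmd in round_commands("crawl/crawldb", "crawl/segments/SEGMENT"):
        run(cmd)
```

Because each step is a separate call, a single round can be repeated, resumed after a failure, or given its own settings, which is the kind of control the one-shot crawl command does not offer.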
On Sat, Feb 26, 2011 at 9:58 AM, firespin firespin...@gmail.com wrote:
I
wish on a single workstation.
Lewis
From: Hannes Carl Meyer [hannesc...@googlemail.com]
Sent: 26 February 2011 18:02
To: user@nutch.apache.org
Subject: Re: Can I use the Nutch crawl command for large crawls?
I would not recommend using the Crawl command