Use the -topN flag to grab only a small number of URLs at a time.
I also believe there is a setting you can put in nutch-site.xml that
slows down how quickly URLs are fetched over time.
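
For the nutch-site.xml setting, I think the property to look at is
fetcher.server.delay, which is the number of seconds the fetcher waits
between successive requests to the same server. A rough sketch of the
override is below. The value 5.0 is only an assumed example, so check
nutch-default.xml in your version for the exact name and default:

```xml
<configuration>
  <!-- Assumed example value: wait 5 seconds between requests to the same server -->
  <property>
    <name>fetcher.server.delay</name>
    <value>5.0</value>
  </property>
</configuration>
```

Anything set in nutch-site.xml overrides the matching entry in
nutch-default.xml, so only the properties you want to change need to go here.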
Jesse
int GetRandomNumber()
{
return 4; // Chosen by fair roll of dice
// Guaranteed to be random
} // xkcd.com
On Fri, Dec 4, 2009 at 4:10 AM, Mr Hadoop <[email protected]> wrote:
> I am just starting to learn Nutch. One question I have is whether Nutch
> can pause, stop, and start indexing a site on an incremental daily basis.
> My concern is that Nutch will behave like a hog, crawling everything
> with huge bandwidth consumption and pissing off many site owners.
>
> Can some experts shed some light on this?
>