-Original Message-
From: Susam Pal [mailto:[EMAIL PROTECTED]
Sent: Tuesday, February 05, 2008 10:36 PM
To: nutch-user@lucene.apache.org
Subject: Re: Limiting Crawl Time
Did you try specifying a topN value? -depth 3 -topN 1000 should be
close to what you want.
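
For reference, a minimal sketch of where those two options would go, assuming the one-shot `bin/nutch crawl` command from the Nutch tutorial of that era. The install location, the seed directory `urls`, and the output directory `crawl` are placeholders and do not come from this thread.

```shell
#!/bin/sh
# Hypothetical locations; adjust to your own installation.
NUTCH_HOME="${NUTCH_HOME:-/opt/nutch}"
SEED_DIR="urls"     # directory holding the seed URL list (assumed name)
CRAWL_DIR="crawl"   # output directory for crawldb and segments (assumed name)

DEPTH=3             # number of generate/fetch/update rounds
TOPN=1000           # at most this many URLs selected per round

# Each round fetches at most TOPN URLs, so the whole crawl is bounded by:
MAX_PAGES=$((DEPTH * TOPN))
echo "upper bound on fetched pages: $MAX_PAGES"

CRAWL_CMD="$NUTCH_HOME/bin/nutch crawl $SEED_DIR -dir $CRAWL_DIR -depth $DEPTH -topN $TOPN"
echo "$CRAWL_CMD"

# Only run the crawl if Nutch is really installed at that location.
if [ -x "$NUTCH_HOME/bin/nutch" ]; then
  $CRAWL_CMD
fi
```

The point of the bound is that crawl size no longer depends on how many links the sites expose, only on the two numbers chosen above.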
On 2/6/08, Paul Stewart [EMAIL PROTECTED
the top 1000 URLs for this
particular crawl. For the next crawl, again top 1000 URLs would be
generated.
Regards,
Susam Pal
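
Because each round is capped at topN URLs, a daily time window can be turned into a rough topN value. The sketch below is only back-of-the-envelope arithmetic: the fetch rate is an assumed figure you would have to measure from your own earlier crawls, and real throughput varies with the sites being fetched.

```shell
#!/bin/sh
# All three inputs are assumptions for illustration, not values from this thread's logs.
HOURS=3             # desired crawl window per day
PAGES_PER_SEC=1     # assumed average fetch rate; measure your own
DEPTH=3             # number of rounds passed to -depth

# Total pages that fit in the window at the assumed rate.
BUDGET=$((HOURS * 3600 * PAGES_PER_SEC))

# Spread the budget evenly over the rounds to get a per-round cap.
TOPN=$((BUDGET / DEPTH))

echo "page budget: $BUDGET"
echo "suggested: -depth $DEPTH -topN $TOPN"
```

This ignores the time spent in the generate, update, and indexing steps, so a smaller topN than the computed one is the safer starting point.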
On 2/6/08, Paul Stewart [EMAIL PROTECTED] wrote:
Hi folks...
What is the best way to, say, limit crawling to perhaps 3-4 hours per day?
Is there a way to do this?
Right now, I have a crawl depth of 6 and maximum per site of 100. I
thought this would limit things pretty low but during some test crawls,
my last crawl took 2.5 days to complete: