RE: Limiting Crawl Time

2008-02-06 Thread Paul Stewart
-----Original Message----- From: Susam Pal [mailto:[EMAIL PROTECTED] Sent: Tuesday, February 05, 2008 10:36 PM To: nutch-user@lucene.apache.org Subject: Re: Limiting Crawl Time Did you try specifying a topN value? -depth 3 -topN 1000 should be close to what you want. On 2/6/08, Paul Stewart [EMAIL PROTECTED

Re: Limiting Crawl Time

2008-02-06 Thread Susam Pal
the top 1000 URLs for this particular crawl. For the next crawl, the top 1000 URLs would again be generated. Regards, Susam Pal -----Original Message----- From: Susam Pal [mailto:[EMAIL PROTECTED] Sent: Tuesday, February 05, 2008 10:36 PM To: nutch-user@lucene.apache.org Subject: Re: Limiting Crawl

Limiting Crawl Time

2008-02-05 Thread Paul Stewart
Hi folks... What is the best way to limit crawling to, say, 3-4 hours per day? Is there a way to do this? Right now, I have a crawl depth of 6 and a maximum of 100 per site. I thought this would keep things fairly small, but during some test crawls, my last crawl took 2.5 days to complete:

Re: Limiting Crawl Time

2008-02-05 Thread Susam Pal
Did you try specifying a topN value? -depth 3 -topN 1000 should be close to what you want. On 2/6/08, Paul Stewart [EMAIL PROTECTED] wrote: Hi folks... What is the best way to limit crawling to, say, 3-4 hours per day? Is there a way to do this? Right now, I have a crawl depth of 6
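
For a rough sense of why -depth 3 -topN 1000 bounds the crawl, here is a back-of-the-envelope sketch in Python. It assumes each depth round fetches at most topN URLs, as described in the thread, so the total is capped at depth * topN. The function names and the pages_per_second figure are hypothetical and only for illustration. The real fetch rate depends on politeness delays, thread count and network conditions, so measure it on your own setup.

```python
def max_pages_fetched(depth: int, top_n: int) -> int:
    """Upper bound on pages fetched: at most top_n URLs per round, depth rounds."""
    return depth * top_n


def estimated_hours(depth: int, top_n: int, pages_per_second: float) -> float:
    """Rough crawl duration in hours, given an ASSUMED average fetch rate."""
    if pages_per_second <= 0:
        raise ValueError("pages_per_second must be positive")
    return max_pages_fetched(depth, top_n) / pages_per_second / 3600.0


if __name__ == "__main__":
    depth, top_n = 3, 1000   # values suggested in the thread
    rate = 1.0               # assumed pages/second, not a measured number
    print(max_pages_fetched(depth, top_n), "pages at most")
    print(round(estimated_hours(depth, top_n, rate), 2), "hours at the assumed rate")
```

Working backwards, if you measure your actual fetch rate, you can pick a topN so that depth * topN pages fit inside the 3-4 hour window.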