Hi Ali,

This is the same problem I faced recently; it is my concern too. I would appreciate it if somebody could answer this question.

Best regards.
On Wed, May 21, 2014 at 2:52 PM, Ali rahmani <[email protected]> wrote:
> Dear Sir,
> I am customizing Nutch 2.2 to crawl my seed list, which contains about 30
> URLs. I need to crawl the mentioned URLs every 24 minutes and fetch ONLY
> newly added links. I added the following configuration to the
> nutch-site.xml file and use the following command:
>
> <property>
>   <name>db.fetch.interval.default</name>
>   <value>1800</value>
>   <description>The default number of seconds between re-fetches of a page
>   (30 days).
>   </description>
> </property>
>
> <property>
>   <name>db.update.purge.404</name>
>   <value>true</value>
>   <description>If true, updatedb will add purge records with status DB_GONE
>   from the CrawlDB.
>   </description>
> </property>
>
> ./crawl urls/ testdb http://localhost:8983/solr 2
>
> but whenever I run the mentioned command, Nutch goes deeper and deeper.
> Would you please tell me where the problem is?
> Regards,

-- A.Nazemian
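One thing worth checking, as a guess rather than a confirmed fix: a 24-minute interval is 24 x 60 = 1440 seconds, but db.fetch.interval.default above is set to 1800 seconds (30 minutes), and the "(30 days)" text appears to be the stock description copied unchanged from nutch-default.xml. A minimal sketch of the property with the value matching the stated 24-minute goal:

<!-- Sketch only: assumes the intent is a 24-minute re-fetch interval.   -->
<!-- 24 min x 60 s = 1440 s; the posted config uses 1800 s (30 minutes), -->
<!-- and its "(30 days)" description is the unedited default text.       -->
<property>
  <name>db.fetch.interval.default</name>
  <value>1440</value>
  <description>Re-fetch pages every 1440 seconds (24 minutes).</description>
</property>

Note also that, if I read the bin/crawl script correctly, its last argument (the "2" in the command above) is the number of crawl rounds, and each round after the first generates and fetches outlinks discovered in the previous round, which could explain why the crawl goes deeper on every run.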

