Dear Sir,
I am customizing Nutch 2.2 to crawl my seed list, which contains about 30 URLs.
I need to re-crawl these URLs every 24 minutes and fetch only newly added links.
I added the following configuration to the nutch-site.xml file and ran the
following command:

<property>
  <name>db.fetch.interval.default</name>
  <value>1800</value>
  <description>The default number of seconds between re-fetches of a page
  (here 1800 seconds, i.e. 30 minutes).
  </description>
</property>

<property>
  <name>db.update.purge.404</name>
  <value>true</value>
  <description>If true, updatedb will purge records with status DB_GONE
  from the CrawlDB.
  </description>
</property>


./crawl urls/ testdb http://localhost:8983/solr 2
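
For context, my understanding of the crawl script's arguments (a sketch based on the Nutch 2.x bin/crawl usage; the paths and crawl ID are my own):

```shell
# Nutch 2.x crawl script usage, as I understand it:
#   bin/crawl <seedDir> <crawlId> <solrUrl> <numberOfRounds>
# The final argument (2) is the number of generate/fetch/update rounds,
# so later rounds can follow outlinks discovered in earlier ones.
./crawl urls/ testdb http://localhost:8983/solr 2
```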


However, whenever I run the above command, Nutch crawls deeper and deeper
instead of just re-fetching the seed URLs.
Would you please tell me where the problem is?
Regards,
