Hi Ali,

This is the same problem I faced recently; it is my concern too. I would appreciate it if somebody could answer this question.

Best regards.
On Wed, May 21, 2014 at 2:52 PM, Ali rahmani <[email protected]> wrote:
> Dear Sir,
> I am customizing Nutch 2.2 to crawl my seed list, which contains about 30
> URLs. I need to crawl the mentioned URLs every 24 minutes and fetch ONLY
> newly added links. I added the following configuration to the
> nutch-site.xml file and use the following command:
>
> <property>
>   <name>db.fetch.interval.default</name>
>   <value>1800</value>
>   <description>The default number of seconds between re-fetches of a page
>   (30 days).
>   </description>
> </property>
>
> <property>
>   <name>db.update.purge.404</name>
>   <value>true</value>
>   <description>If true, updatedb will add purge records with status DB_GONE
>   from the CrawlDB.
>   </description>
> </property>
>
> ./crawl urls/ testdb http://localhost:8983/solr 2
>
> but whenever I run the mentioned command, Nutch goes deeper and deeper.
> Would you please tell me where the problem is?
> Regards,

-- A.Nazemian
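One thing worth checking, as a guess rather than a confirmed fix: a 24-minute interval is 24 x 60 = 1440 seconds, but db.fetch.interval.default above is set to 1800 seconds (30 minutes), and the "(30 days)" text appears to be the stock description copied unchanged from nutch-default.xml. A minimal sketch of the property with the value matching the stated 24-minute goal:

<!-- Sketch only: assumes the intent is a 24-minute re-fetch interval.   -->
<!-- 24 min x 60 s = 1440 s; the posted config uses 1800 s (30 minutes), -->
<!-- and its "(30 days)" description is the unedited default text.       -->
<property>
  <name>db.fetch.interval.default</name>
  <value>1440</value>
  <description>Re-fetch pages every 1440 seconds (24 minutes).</description>
</property>

Note also that, if I read the bin/crawl script correctly, its last argument (the "2" in the command above) is the number of crawl rounds, and each round after the first generates and fetches outlinks discovered in the previous round, which could explain why the crawl goes deeper on every run.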

