Dear Julien,

Do you know of any step-by-step guide for this procedure? Is it the same for Nutch 1.8?

Best regards.
On Wed, May 21, 2014 at 6:43 PM, Julien Nioche < [email protected]> wrote:

> <property>
> <name>db.fetch.interval.default</name>
> <value>1800</value>
> <description>The default number of seconds between re-fetches of a page
> (30 days).
> </description>
> </property>
>
> means that a page which has already been fetched will be refetched again
> after 30mins. This is what you want for the seeds but is also applied to
> the subpages you've already discovered in previous rounds.
>
> What you could do would be to set a custom fetch interval for the seeds
> only (see http://wiki.apache.org/nutch/bin/nutch%20inject for the use of
> nutch.fetchInterval) and have a larger value for db.fetch.interval.default.
> This way the seeds would be revisited frequently but not the subpages. Note
> that this would work only if the links to the pages you want to discover
> are directly in the seed files. If they are at a deeper level then they'd
> be discovered only when the page that mentions them is re-fetched (==
> nutch.fetchInterval)
>
> HTH
>
> Julien
>
>
> On 21 May 2014 11:22, Ali rahmani <[email protected]> wrote:
>
> > Dear Sir,
> > I am customizing Nutch 2.2 to crawl my seed lists which contains about 30
> > URL. I need to crawl mentioned URL every 24 minutes and JUST fetch new
> > added links. I added the following configurations to nutch-site.xml file
> > and use the following command:
> >
> > <property>
> > <name>db.fetch.interval.default</name>
> > <value>1800</value>
> > <description>The default number of seconds between re-fetches of a page
> > (30 days).
> > </description>
> > </property>
> >
> > <property>
> > <name>db.update.purge.404</name>
> > <value>true</value>
> > <description>If true, updatedb will add purge records with status DB_GONE
> > from the CrawlDB.
> > </description>
> > </property>
> >
> >
> > ./crawl urls/ testdb http://localhost:8983/solr 2
> >
> >
> > but whenever I run mention command, nutch goes deep and deeper.
> > would you please tell where is the problem ?
> > Regards,
> >
>
>
> --
>
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble
>

--
A.Nazemian
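
For reference, a minimal sketch of the seed-file side of the suggestion quoted above. It assumes the tab-separated metadata format described on the bin/nutch inject wiki page (URL, a tab, then nutch.fetchInterval=<seconds>). The example.com URLs, the file name seed.txt and the 1440-second value (24 minutes) are placeholders, not values taken from the thread's setup. db.fetch.interval.default in nutch-site.xml would then be raised separately to a larger value, as suggested.

```shell
#!/bin/sh
# Sketch only: the URLs, file name and interval below are illustrative placeholders.
set -eu

SEED_DIR="${SEED_DIR:-urls}"   # seed directory later passed to ./crawl
SEED_INTERVAL=1440             # 24 minutes expressed in seconds, applied to seeds only

mkdir -p "$SEED_DIR"
: > "$SEED_DIR/seed.txt"       # start from an empty seed file

# Each seed line has the form: URL <TAB> nutch.fetchInterval=<seconds>
for url in "http://www.example.com/news/" "http://www.example.com/blog/"; do
    printf '%s\tnutch.fetchInterval=%s\n' "$url" "$SEED_INTERVAL" >> "$SEED_DIR/seed.txt"
done

cat "$SEED_DIR/seed.txt"
```

Running the usual ./crawl urls/ ... command afterwards would inject these seeds with their own interval. Pages discovered later carry no such metadata, so they should fall back to db.fetch.interval.default.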

