Dear Julien,
Do you know of any step-by-step guide for this procedure? Is it the same for
Nutch 1.8?
Best regards.


On Wed, May 21, 2014 at 6:43 PM, Julien Nioche <
[email protected]> wrote:

> <property>
>   <name>db.fetch.interval.default</name>
>   <value>1800</value>
>   <description>The default number of seconds between re-fetches of a page
> (30 minutes).
>   </description>
> </property>
>
> means that a page which has already been fetched will be refetched
> after 30 minutes. This is what you want for the seeds, but it is also
> applied to the subpages you've already discovered in previous rounds.
>
> What you could do would be to set a custom fetch interval for the seeds
> only (see http://wiki.apache.org/nutch/bin/nutch%20inject for the use of
> nutch.fetchInterval) and have a larger value for db.fetch.interval.default.
> This way the seeds would be revisited frequently but not the subpages. Note
> that this would work only if the links to the pages you want to discover
> are directly in the seed files. If they are at a deeper level then they'd
> be discovered only when the page that mentions them is re-fetched (==
> nutch.fetchInterval)
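>
> A minimal sketch of that setup, with assumptions marked: the URLs are
> placeholders, the interval values are only illustrative, and the fields in
> the seed file are separated by a tab character. In the seed file, each
> seed carries its own interval in seconds (1800 = 30 minutes):
>
> http://www.example.com/news/	nutch.fetchInterval=1800
> http://www.example.org/blog/	nutch.fetchInterval=1800
>
> and in nutch-site.xml the default for everything else is raised, here to
> 30 days:
>
> <property>
>   <name>db.fetch.interval.default</name>
>   <value>2592000</value>
> </property>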
>
> HTH
>
> Julien
>
>
> On 21 May 2014 11:22, Ali rahmani <[email protected]> wrote:
>
> > Dear Sir,
> > I am customizing Nutch 2.2 to crawl my seed list, which contains about 30
> > URLs. I need to crawl these URLs every 24 minutes and fetch JUST the newly
> > added links. I added the following configuration to the nutch-site.xml file
> > and use the following command:
> >
> > <property>
> >   <name>db.fetch.interval.default</name>
> >   <value>1800</value>
> >   <description>The default number of seconds between re-fetches of a page
> > (30 minutes).
> >   </description>
> > </property>
> >
> > <property>
> >   <name>db.update.purge.404</name>
> >   <value>true</value>
> >   <description>If true, updatedb will purge records with status DB_GONE
> >   from the CrawlDB.
> >   </description>
> > </property>
> >
> >
> > ./crawl urls/ testdb http://localhost:8983/solr 2
> >
> >
> > But whenever I run the command above, Nutch goes deeper and deeper.
> > Would you please tell me where the problem is?
> > Regards,
>
>
>
>
> --
>
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble
>



-- 
A.Nazemian
