Thanks for your reply.
The website I am trying to fetch is a social BBS that can produce tens of
thousands of new threads and replies. When a new reply is added to a thread,
the page may change.
In my case I need to re-fetch the changed pages and fetch the new pages every
few hours (every hour if possible).
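
To illustrate the kind of schedule I have in mind, here is a rough sketch in
Python. It is not Nutch code: the function name, the rates and the bounds are
all assumptions made up for this example. The idea is to shorten a page's
re-fetch interval when the last fetch found it changed, and to lengthen it
when the page was unchanged, never going below one hour.

```python
# Hypothetical bounds for the example, in seconds.
MIN_INTERVAL = 3600        # one hour (assumed lower bound)
MAX_INTERVAL = 24 * 3600   # one day (assumed upper bound)


def next_interval(current, changed, dec_rate=0.5, inc_rate=0.5):
    """Return the next re-fetch interval in seconds.

    current  -- interval used for the previous fetch, in seconds
    changed  -- True if the page content differed from the stored copy
    dec_rate -- fraction by which to shrink the interval on a change
    inc_rate -- fraction by which to grow the interval when unchanged
    """
    if changed:
        new = current * (1.0 - dec_rate)
    else:
        new = current * (1.0 + inc_rate)
    # Clamp so busy threads are not fetched more than hourly and
    # quiet threads are still revisited at least once a day.
    return max(MIN_INTERVAL, min(MAX_INTERVAL, new))
```

With this, an active thread would settle at the one-hour minimum, while a
thread nobody replies to would drift toward one fetch per day.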

2011/5/24 McGibbney, Lewis John <[email protected]>

> What is the nature of webpages that are changing? twitter feeds, news
> streams?
>
> Do you have any indication of how frequently they are changing? If you
> think it is very frequent then I would suggest setting Nutch up as a cron
> job. If you are indexing to Solr, using dedup and clean commands as well as
> setting specific properties in nutch-site would allow you to maintain a
> pretty healthy representation of the web graph this way.
>
> Lewis
>
> ________________________________________
> From: Bupo Jung [[email protected]]
> Sent: 24 May 2011 16:31
> To: [email protected]
> Subject: How to re-fetch all the modified page?
>
> Hi,
> In my case, the webpage content may be modified frequently, and I want to
> re-fetch the modified pages as soon as possible.
> I have read the Nutch wiki about Intranet recrawl. It only considers the
> db.fetch.interval property to decide whether to re-fetch the page.
> What can I do?
> Any idea?
>
> thanks.
> bupo.jung



-- 

Yizhong Zhuang
Beijing University of Posts and Telecommunications
Email:[email protected]
Myblog:www.mikkoo.info
