Thanks for your reply. The website I am trying to fetch is a social BBS that can produce tens of thousands of new threads and replies. When a new reply is added to a thread, the page changes. In my case I need to re-fetch the changed pages and the new pages every few hours (every hour if possible).
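
To make the requirement concrete, here is a rough sketch of the scheduling behaviour I am after. It is not Nutch code: `PageState`, `next_interval`, and the rate parameters are hypothetical names used only for illustration, and the 1-hour minimum and 24-hour maximum are assumptions based on my case. The idea is to shorten a page's re-fetch interval when its content has changed since the last fetch, and lengthen it when it has not.

```python
from dataclasses import dataclass

HOUR = 3600  # seconds


@dataclass
class PageState:
    """Hypothetical per-URL record kept between crawl rounds."""
    url: str
    interval: int = HOUR   # seconds to wait before the next fetch
    signature: str = ""    # hash of the content seen at the last fetch


def next_interval(state, new_signature,
                  min_interval=HOUR, max_interval=24 * HOUR,
                  dec_rate=0.5, inc_rate=0.5):
    """Update `state` after a fetch and return the new interval in seconds.

    A changed page is re-fetched sooner (interval shrinks by dec_rate);
    an unchanged page is re-fetched later (interval grows by inc_rate).
    The result is clamped to [min_interval, max_interval].
    """
    changed = new_signature != state.signature
    if changed:
        interval = state.interval * (1 - dec_rate)
    else:
        interval = state.interval * (1 + inc_rate)
    state.signature = new_signature
    state.interval = int(min(max(interval, min_interval), max_interval))
    return state.interval
```

With these assumed defaults, a busy thread would settle at the 1-hour floor, while a thread that has gone quiet would drift toward one fetch per day.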

2011/5/24 McGibbney, Lewis John <[email protected]>

> What is the nature of the webpages that are changing? Twitter feeds, news
> streams?
>
> Do you have any indication of how frequently they are changing? If you
> think it is very frequent then I would suggest setting Nutch up as a cron
> job. If you are indexing to Solr, using the dedup and clean commands as well
> as setting specific properties in nutch-site would allow you to maintain a
> pretty healthy representation of the web graph this way.
>
> Lewis
>
> ________________________________________
> From: Bupo Jung [[email protected]]
> Sent: 24 May 2011 16:31
> To: [email protected]
> Subject: How to re-fetch all the modified page?
>
> Hi,
> In my case, the webpage content may be modified frequently, and I want to
> re-fetch the modified pages as soon as possible.
> I have read the Nutch wiki about Intranet recrawl. It only considers the
> db.fetch.interval property to decide whether to re-fetch the page.
> How can I do this?
> Any ideas?
>
> thanks.
> bupo.jung

--
Yizhong Zhuang
Beijing University of Posts and Telecommunications
Email: [email protected]
Myblog: www.mikkoo.info

