Thanks for that hint, which answers my original question. For even better performance, I would prefer to set the CrawlDatum's fetchInterval depending on the parsed contents of, say, an XML feed file: if the last entries are temporally close together, I want a shorter fetchInterval than if they lie further apart. Where would the right place be to set that?
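
To make clearer what I mean, here is an untested sketch of roughly what I'm imagining. The class name, the metadata key and the idea of a feed-parsing plugin writing that key are all my own invention; the setFetchSchedule() signature is what I see in the 1.x FetchSchedule interface, so please correct me if I'm off:

    package org.example.nutch;

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.Writable;
    import org.apache.nutch.crawl.AbstractFetchSchedule;
    import org.apache.nutch.crawl.CrawlDatum;

    public class FeedAwareFetchSchedule extends AbstractFetchSchedule {

      // Hypothetical metadata key written by a feed-parsing plugin:
      // the average gap between the last feed entries, in seconds.
      private static final Text ENTRY_GAP_KEY = new Text("feed.entry.gap");

      @Override
      public CrawlDatum setFetchSchedule(Text url, CrawlDatum datum,
          long prevFetchTime, long prevModifiedTime,
          long fetchTime, long modifiedTime, int state) {
        datum = super.setFetchSchedule(url, datum, prevFetchTime,
            prevModifiedTime, fetchTime, modifiedTime, state);

        Writable gap = datum.getMetaData().get(ENTRY_GAP_KEY);
        if (gap != null) {
          // Assuming the plugin stored a plain number of seconds:
          // feeds whose entries are close together get a short interval,
          // clamped between 1 hour and 30 days.
          int seconds = Integer.parseInt(gap.toString());
          int interval = Math.max(3600, Math.min(seconds, 30 * 24 * 3600));
          datum.setFetchInterval(interval);
        }
        // Recompute the next fetch time from the (possibly adjusted) interval.
        datum.setFetchTime(fetchTime + (long) datum.getFetchInterval() * 1000L);
        datum.setModifiedTime(modifiedTime);
        return datum;
      }
    }

The missing piece for me is how the feed-derived value gets into the CrawlDatum metadata in the first place -- presumably a parse plugin plus something along the lines of db.parsemeta.to.crawldb, but I'm not sure whether that's the intended route.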
Cheers,
Chris

On Wed, Oct 6, 2010 at 10:14 AM, reinhard schwab <[email protected]> wrote:
> implement your own schedule class and set the property in the nutch-site.xml
> in nutch-default.xml you have
>
> <property>
>  <name>db.fetch.schedule.class</name>
>  <value>org.apache.nutch.crawl.DefaultFetchSchedule</value>
>  <description>The implementation of fetch schedule. DefaultFetchSchedule simply
>  adds the original fetchInterval to the last fetch time, regardless of
>  page changes.</description>
> </property>
>
> you can see in this class how to implement your own schedule class.
>
> Christopher Laux schrieb:
>> Hi all,
>>
>> thanks for the last answer. I have a more advanced question, if you don't
>> mind:
>>
>> What is the easiest way to make revisit times depend on the http/html
>> content-type, e.g. I want to revisit "application/rss+xml" pages every
>> 12 hours but "text/html" etc. can remain at 30 days?
>>
>> Do I have to modify the generate and update functions or could plugins
>> handle this?
>>
>> Thanks,
>> Chris
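
P.S. I assume registering such a class just means overriding the same property in nutch-site.xml, with the value pointing at the custom class (again, the class name above is hypothetical):

    <property>
      <name>db.fetch.schedule.class</name>
      <value>org.example.nutch.FeedAwareFetchSchedule</value>
      <description>Custom fetch schedule that shortens the interval for
      frequently updated feeds.</description>
    </property>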

