Implement your own schedule class and set the property in nutch-site.xml.
In nutch-default.xml you have:

<property>
  <name>db.fetch.schedule.class</name>
  <value>org.apache.nutch.crawl.DefaultFetchSchedule</value>
  <description>The implementation of fetch schedule. DefaultFetchSchedule simply
  adds the original fetchInterval to the last fetch time, regardless of
  page changes.</description>
</property>

You can look at that class (DefaultFetchSchedule) to see how to implement your own schedule class.
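
As a rough sketch, the override in nutch-site.xml could look something like this. The class name org.example.nutch.ContentTypeFetchSchedule is only a placeholder for whatever you call your own class; it is not something that ships with Nutch:

<property>
  <name>db.fetch.schedule.class</name>
  <!-- placeholder class name (assumption), replace with your own implementation -->
  <value>org.example.nutch.ContentTypeFetchSchedule</value>
  <description>Hypothetical custom fetch schedule that chooses the
  re-fetch interval by content type.</description>
</property>

Values in nutch-site.xml take precedence over nutch-default.xml, so the default file can stay untouched.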

Christopher Laux wrote:
> Hi all,
>
> thanks for the last answer. I have a more advanced question, if you don't 
> mind:
>
> What is the easiest way to make revisit times depend on the http/html
> content-type, e.g. I want to revisit "application/rss+xml" pages every
> 12 hours but "text/html" etc. can remain at 30 days?
>
> Do I have to modify the generate and update functions or could plugins
> handle this?
>
> Thanks,
> Chris
>
>   
