Hi,

I'm going to crawl a set of news sites. Pages on those sites can be
divided into two types: category pages and article pages. I would like to
fetch category pages more frequently than article pages. The list of
categories is fairly fixed, so I could mark them manually.

I know I could get similar behaviour with AdaptiveFetchSchedule, but it
requires some time to adjust the fetch interval. This doesn't satisfy me
because I already know before the first fetch how often each page should
be re-crawled.

I wonder whether it is possible in Nutch to set different fetch intervals
for different kinds of pages. I know that I could extend
AbstractFetchSchedule and implement this behaviour manually. This would
require adding an extra field to the WebPage object that indicates what
type of page we are dealing with. Is it possible to add such a field to
the WebPage object? Or is there another approach?
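
To make the idea concrete, here is a rough standalone sketch of the lookup
I have in mind. It does not use any Nutch classes. The class name
IntervalRules, the URL pattern, and both interval values are made up for
illustration. It picks the interval from the URL alone, which might avoid
the extra WebPage field if category URLs are recognisable by pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: maps URL patterns to fixed fetch intervals (seconds).
// Not part of Nutch; a custom fetch schedule would have to call it.
public class IntervalRules {
    // Example values only (assumptions): 15 minutes and 30 days.
    public static final int CATEGORY_INTERVAL_SEC = 15 * 60;
    public static final int ARTICLE_INTERVAL_SEC = 30 * 24 * 60 * 60;

    // One rule: a compiled URL pattern and the interval it assigns.
    private static final class Rule {
        final Pattern pattern;
        final int intervalSec;

        Rule(Pattern pattern, int intervalSec) {
            this.pattern = pattern;
            this.intervalSec = intervalSec;
        }
    }

    private final List<Rule> rules = new ArrayList<>();
    private final int defaultIntervalSec;

    public IntervalRules(int defaultIntervalSec) {
        this.defaultIntervalSec = defaultIntervalSec;
    }

    // Rules are checked in insertion order; the first match wins.
    public IntervalRules add(String regex, int intervalSec) {
        rules.add(new Rule(Pattern.compile(regex), intervalSec));
        return this;
    }

    // Returns the interval of the first matching rule, else the default.
    public int intervalFor(String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.intervalSec;
            }
        }
        return defaultIntervalSec;
    }

    public static void main(String[] args) {
        IntervalRules rules = new IntervalRules(ARTICLE_INTERVAL_SEC)
            .add("^https?://news\\.example\\.com/category/[^/]+/?$",
                 CATEGORY_INTERVAL_SEC);
        System.out.println(
            rules.intervalFor("http://news.example.com/category/sports/"));
        System.out.println(
            rules.intervalFor("http://news.example.com/2014/05/story.html"));
    }
}
```

A class extending AbstractFetchSchedule could presumably call
intervalFor(url) when it sets a page's interval, but I have not checked
how that would be wired into Nutch.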

Regards,
Mateusz
