Hi,

Can someone please explain how the fetcher behaves with respect to modified and unmodified content in the current trunk version?

My requirement is basically this: I have one page (the seed URL) that links to other URLs. The links on this page change daily. I want Nutch to keep refetching this page, since it changes regularly, but not to refetch the outlinks on it, since they are more or less static.

I have set both "db.fetch.interval.default" and "db.fetch.interval.max" to a high value of approximately one year, and I am using the DefaultFetchSchedule class. Does this imply that even pages that have been modified will next be fetched only after a year? Or do I need to use the AdaptiveFetchSchedule?

I would be really thankful if someone could help me with my fetcher settings.

Regards,
Chris
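
For reference, the settings described above would look roughly like the fragment below in conf/nutch-site.xml. This is only a sketch: the exact values are an assumption (31536000 is 365 days, assuming the intervals are given in seconds), and the "db.fetch.schedule.class" property is included on the assumption that it is the property that selects the schedule implementation.

```xml
<!-- Sketch of the overrides described in the post; values are illustrative -->
<configuration>
  <property>
    <name>db.fetch.interval.default</name>
    <!-- assumed: roughly one year, expressed in seconds (365 * 24 * 3600) -->
    <value>31536000</value>
  </property>
  <property>
    <name>db.fetch.interval.max</name>
    <!-- assumed: same one-year value as the default interval -->
    <value>31536000</value>
  </property>
  <property>
    <name>db.fetch.schedule.class</name>
    <!-- the schedule currently in use, as stated in the post -->
    <value>org.apache.nutch.crawl.DefaultFetchSchedule</value>
  </property>
</configuration>
```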
