Hi Joe,

In 1.x, Markus and Julien (IIRC) committed a really nice patch a while back which allows you to achieve what I think you are after. Please look at this thread:

http://www.mail-archive.com/[email protected]/msg08738.html

You will find piles of material on the user archive about this kind of granular stuff.

Ta, have a good weekend.
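For illustration only (this is not Nutch code): a minimal Python sketch of the general HTTP idea behind the question, i.e. wait until a page is due, then send a conditional request so the server can answer 304 Not Modified if nothing changed. The function names and the constant below are hypothetical, made up for this example; only the 30-day default comes from the thread.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

# Assumption: mirrors the 30-day db.fetch.interval.default mentioned in the thread.
DEFAULT_INTERVAL = timedelta(days=30)


def is_due(last_fetch, now, interval=DEFAULT_INTERVAL):
    """Return True once at least `interval` has elapsed since the last fetch."""
    return now - last_fetch >= interval


def conditional_headers(last_modified):
    """Build an If-Modified-Since header from a timezone-aware UTC datetime."""
    return {"If-Modified-Since": format_datetime(last_modified, usegmt=True)}


def is_unchanged(status_code):
    """HTTP 304 means the server reports no change since the given date."""
    return status_code == 304
```

Note that in this sketch the interval still decides when a page is looked at again; the conditional request only avoids downloading and re-processing content that has not changed.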
On Friday, June 21, 2013, Joe Zhang <[email protected]> wrote:
> Sorry, Nutch is certainly aware of page modification, and it does capture
> lastModified. The real question is, can Nutch get lastModified of a page
> before fetching, and use it to make fetching decisions (e.g., whether or
> not to override the default interval)?
>
>
> On Fri, Jun 21, 2013 at 6:27 PM, Joe Zhang <[email protected]> wrote:
>
>> If I don't change the default value of db.fetch.interval.default, which is
>> 30 days, does it mean that the URL in the db won't be refetched before the
>> due time even if it has been modified? In other words, is Nutch aware of
>> page modification?
>>
>

-- 
*Lewis*

