Hi all,

I'm a new Nutch user, and I use it primarily to crawl blog and news sites. I noticed that Nutch refetches pages only after a fixed refresh interval (30 days by default).
Blog and news sites have a unique characteristic: some of their pages (e.g. the main page) are updated very frequently and have to be refetched often, while other pages (e.g. news article pages, which eventually become 'obsolete') don't need to be refetched at all.

Is there any way to force an update of some URLs? Can I just 're-inject' the URLs to set their next fetch date to 'immediately'?

Thank you,
--
Arie Karhendana
