Hi all,

I'm a new user of Nutch, and I use it primarily to crawl blog and news
sites. I noticed that Nutch refetches a page only after a fixed refresh
interval has passed (30 days by default).

Blog and news sites have the distinctive characteristic that some of
their pages are updated very frequently (e.g. the main page) and so have
to be refetched often, while other pages don't need to be refetched at
all (e.g. the news article pages, which eventually become 'obsolete').

Is there any way to force an update of some URLs? Can I just 're-inject'
the URLs to set their next fetch date to 'immediately'?

Thank you,
-- 
Arie Karhendana
