From what I have gathered, you may want to keep multiple
crawldbs for your crawls. You could have one crawldb for the more frequent crawls and fire off Nutch against that db with the appropriate configs for that job. I was hoping for the same mechanism, but it looks like we need to write this ourselves.
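A rough sketch of what such a two-crawldb setup could look like is below. The directory names, seed files, and conf dirs are my own hypothetical examples, and the use of NUTCH_CONF_DIR to select per-crawl settings is an assumption -- check the bin/nutch script in your Nutch version to confirm it honors that variable. The idea is simply that each crawldb gets its own seed list and its own config overrides (e.g. a much shorter fetch interval in the "frequent" config).

```shell
#!/bin/sh
# Sketch: two independent crawldbs, one for frequently changing pages
# (short refetch interval) and one for everything else.
# All paths and conf dirs below are hypothetical examples.

NUTCH_HOME=/opt/nutch

# conf-frequent/nutch-site.xml would override the default fetch interval
# (the property name varies by Nutch version -- e.g.
# db.default.fetch.interval in older releases); conf-archive/ would keep
# the 30-day default.
run_crawl() {
  db=$1; seeds=$2; conf=$3
  NUTCH_CONF_DIR=$conf
  export NUTCH_CONF_DIR

  "$NUTCH_HOME/bin/nutch" inject   "$db/crawldb" "$seeds"
  "$NUTCH_HOME/bin/nutch" generate "$db/crawldb" "$db/segments"
  # pick up the segment that generate just created
  segment=$db/segments/$(ls "$db/segments" | tail -1)
  "$NUTCH_HOME/bin/nutch" fetch    "$segment"
  "$NUTCH_HOME/bin/nutch" updatedb "$db/crawldb" "$segment"
}

# Run the frequent crawl often (e.g. from cron), the archive crawl rarely.
run_crawl crawl-frequent seeds-frequent.txt "$NUTCH_HOME/conf-frequent"
run_crawl crawl-archive  seeds-archive.txt  "$NUTCH_HOME/conf-archive"
```

Since the two crawldbs never touch each other, you can schedule the frequent job hourly and the archive job monthly without them interfering.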
On 4/12/07, Arie Karhendana <[EMAIL PROTECTED]> wrote:
Hi all,

I'm a new user of Nutch. I use Nutch primarily to crawl blog and news sites, but I noticed that Nutch refetches pages only at some refresh interval (30 days by default).

Blog and news sites have a unique characteristic: some of their pages are updated very frequently (e.g. the main page), so they have to be refetched often, while other pages don't need to be refreshed/refetched at all (e.g. the news article pages, which eventually become 'obsolete').

Is there any way to force-update some URLs? Can I just 're-inject' the URLs to set their next fetch date to 'immediately'?

Thank you,
--
Arie Karhendana
-- "Conscious decisions by conscious minds are what make reality real"
