> Hello,
> 
> Is there a way where I can exclude certain URLs from being fetched,
> or like programmatically altering their fetching frequency?

Excluding URLs means either not adding them to the crawl db or not allowing 
them to enter the fetch list when generating. Both jobs use the URL filters, 
so it's easy to exclude URL patterns from entering the db or the fetch list. 
I prefer to exclude URLs from entering the crawl db; this means I won't have 
to run URL filters (and normalizers, in my case) when generating, which saves 
a lot of CPU time.
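
To illustrate how pattern-based exclusion behaves, here is a minimal Python 
sketch of a first-match-wins filter using "+regex" / "-regex" rules in the 
style of a regex URL filter file. This is not Nutch code: the rule patterns, 
the example.com URLs, and the parse_rules/accept helpers are all made up for 
the example, and I'm assuming a URL that matches no rule is rejected.

```python
import re

# Hypothetical rules: "-" rejects, "+" accepts, "#" starts a comment.
# The patterns are illustrative only.
RULES = r"""
# skip a hypothetical calendar section
-^https?://example\.com/calendar/
# skip common image suffixes
-\.(gif|jpg|png)$
# accept everything else
+.
"""

def parse_rules(text):
    """Turn rule lines into (accept_flag, compiled_regex) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        rules.append((line[0] == "+", re.compile(line[1:])))
    return rules

def accept(url, rules):
    """Return the verdict of the first rule that matches the URL."""
    for keep, pattern in rules:
        if pattern.search(url):
            return keep
    return False  # assumption: no matching rule means the URL is dropped

if __name__ == "__main__":
    rules = parse_rules(RULES)
    for url in ("http://example.com/index.html",
                "http://example.com/calendar/2012/01",
                "http://example.com/logo.png"):
        print(url, accept(url, rules))
```

Because the first matching rule decides, the exclusion patterns have to come 
before the catch-all accept rule.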

> 
> Also, how can one modify the crawldb? How do I kick out certain urls
> from crawldb.

This is nasty in Nutch 1.x. You need to exclude the URL as described above and 
update the crawl db again.
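
Conceptually, "kicking out" a URL amounts to rewriting the db through the 
filter rather than deleting a single entry in place. A toy Python sketch of 
that idea, assuming a plain dict as a stand-in for the crawl db (the real db 
is not a dict, and the URLs, statuses, pattern, and rewrite_db helper are 
hypothetical):

```python
import re

# Hypothetical in-memory stand-in for a crawl db: URL -> status.
crawldb = {
    "http://example.com/index.html": "fetched",
    "http://example.com/calendar/2012/01": "unfetched",
    "http://example.com/about.html": "unfetched",
}

# Hypothetical exclusion pattern for the URLs that should be removed.
EXCLUDE = re.compile(r"^https?://example\.com/calendar/")

def rewrite_db(db, exclude):
    """Build a new db that keeps only entries the filter does not exclude."""
    return {url: status for url, status in db.items()
            if not exclude.search(url)}

if __name__ == "__main__":
    for url in sorted(rewrite_db(crawldb, EXCLUDE)):
        print(url)
```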

> 
> Best Regards,
> C.B.
