Hi Neera,

try fetcher2 instead of fetcher. In my experience, the fetcher2 implementation honors the "db.ignore.external.links" setting even for redirects, so you don't need URL filters to limit a crawl to a single domain.
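For example, something like this in conf/nutch-site.xml should be enough (a minimal sketch; the property name is the one from nutch-default.xml):

    <!-- conf/nutch-site.xml: keep the crawl on the seed host(s) -->
    <property>
      <name>db.ignore.external.links</name>
      <value>true</value>
      <description>If true, outlinks leading to external hosts are ignored.</description>
    </property>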
Kind regards,
Martina

-----Original Message-----
From: Neera Sharma [mailto:neera.sha...@gmail.com]
Sent: Friday, 20 March 2009 23:51
To: nutch-user@lucene.apache.org
Subject: db.ignore.external.links and urlfilters

Hi All,

I want to restrict a crawl to the domain of a given input URL. I set the *db.ignore.external.links* property to true, but I found that links redirecting outside the input URL's domain still got crawled. However, when I set up the regex-urlfilter.txt and crawl-urlfilter.txt files instead, I was able to avoid these extra URLs and crawled more URLs from the seed domain. I expected both approaches to give the same results. Is this a bug? Is there a way to fix this issue without setting URL filters? With the filter-file approach I need to edit the files before crawling each domain and also need to restart Nutch. Is there a way I can change these filter values at runtime?

Thanks,
Neera
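P.S. For reference, the rules I used in regex-urlfilter.txt (and similarly in crawl-urlfilter.txt) look roughly like this, with example.com standing in for the actual seed domain:

    # accept URLs on the seed domain only
    +^http://([a-z0-9-]+\.)*example\.com/
    # reject everything else
    -.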