Hi All,

I want to restrict a crawl to the domain of an input URL. I used the *db.ignore.external.links* property (set to true), but I found that links which redirect outside the input domain still got crawled. However, if I set up the regex-urlfilter.txt and crawl-urlfilter.txt files instead, I was able to avoid these extra URLs and crawled more URLs from the seed domain. I expected both approaches to give the same results. Is this a bug?
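To illustrate, here is roughly what I have (example.com is just a placeholder for the actual seed domain, and the regex is only a sketch). In conf/nutch-site.xml:

    <property>
      <name>db.ignore.external.links</name>
      <value>true</value>
    </property>

and in conf/regex-urlfilter.txt (and similarly in crawl-urlfilter.txt), something like:

    # accept only URLs within the seed domain (example.com is a placeholder)
    +^http://([a-z0-9-]+\.)*example\.com/
    # reject everything else
    -.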
Is there a way to fix this issue without setting URL filters? With the filter-file approach I need to edit the files before crawling each domain and also need to restart Nutch. Is there a way I can change these filter values at runtime?

Thanks,
Neera