Hi group. I want to crawl a set of sites that have subdomains. I know I can filter external links (external with respect to my set of seeds) with the db.ignore.external.links property, but if I enable it, Nutch also ignores links to subdomains. I also know I can restrict URLs with the regex-urlfilter.txt file, but then I have to copy my seed domains into the filter, and every time I want to crawl another site I have to edit it again. Is there a transparent way (i.e. one that doesn't require editing the urlfilter for every new site) to ignore external links without also ignoring subdomain links?
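
For reference, this is roughly what the two options I mentioned look like on my side (just a minimal sketch of a Nutch 1.x setup; example.com is a made-up placeholder for one of my seeds):

    <!-- conf/nutch-site.xml: option 1, but this also drops subdomain links -->
    <property>
      <name>db.ignore.external.links</name>
      <value>true</value>
    </property>

    # conf/regex-urlfilter.txt: option 2, keeps subdomains of the seed,
    # but I have to edit this file for every new seed domain
    +^https?://([a-z0-9-]+\.)*example\.com/
    -.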
Thanks

