Hello,

Thanks for the reply, but this doesn't seem to work either. I removed the crawl dir, added the regex you posted, removed the one I had in regex-urlfilter.txt and crawl-urlfilter.txt, and restarted the crawl. My crawls still spend about 90% of their time on who.int. I have no idea how to keep this domain, or all .int domains, from being crawled. Do I have the regex in the wrong conf file?
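As a sanity check outside Nutch, the two patterns from this thread can be tried against a few sample URLs. The sketch below is not Nutch code: it assumes the filter file is read top to bottom, that the first pattern found in a URL decides (`+` accepts, `-` rejects), and that a URL matching no rule is dropped. The function name `first_match_filter`, the rule list, and the sample URLs are made up for illustration.

```python
import re

def first_match_filter(rules, url):
    """Return True if url is accepted, False if it is rejected.

    Assumption: rules are tried in order and the first pattern found
    anywhere in the URL decides; a URL matching no rule is rejected.
    """
    for sign, pattern in rules:
        if re.search(pattern, url):
            return sign == "+"
    return False

# Hypothetical rule list: the suggested exclusion, then a catch-all accept.
RULES = [
    ("-", r"^http://[^/]*\.int/"),
    ("+", r"."),
]

# Sample URLs chosen for illustration only.
URLS = [
    "http://www.who.int/en/",
    "http://who.int/",
    "http://example.org/page.html",
    "https://www.who.int/",  # https, so the http-only pattern does not apply
]

for url in URLS:
    verdict = "accept" if first_match_filter(RULES, url) else "reject"
    print(verdict, url)
```

Under these assumptions both `-^http://[^/]*\.int/` and the original `-^http://([a-z0-9]*\.)*who.int/` reject the http who.int URLs, so the pattern itself may not be the problem. If an accept rule such as `+.` sits above the exclusion, the exclusion is never reached in this model, so the order of the lines in the file is worth checking.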
Thanks,
-Warren

reinhard schwab wrote:
>
> opsec wrote:
>> I've added this to my conf/crawl-urlfilter.txt and
>> conf/regex-urlfilter.txt
>> yet when I start a crawl this domain is heavily spidered. I would like to
>> remove it from my search results entirely and prevent it from being
>> crawled in the future and possibly all *.int tlds, how can i accomplish
>> this?
>>
>> -^http://([a-z0-9]*\.)*who.int/
>>
> why not
>
> -^http://[^/]*\.int/
>
>> Thanks for your time and any assistance,
>>
>> -Warren
>>

--
View this message in context: http://old.nabble.com/How-do-I-block-ban-a-specific-domain-name-or-a-tld--tp26289091p26306461.html
Sent from the Nutch - User mailing list archive at Nabble.com.