One more question: I'm using nutch-0.8.0 to index a domain, and I want to exclude a certain directory from the crawl. In crawl-urlfilter.txt I have defined the following:

# accept hosts in MY.DOMAIN.NAME
+^http://([a-z0-9]*\.)*wwwapps.mywebsite.com*/
-^http://([a-z0-9]*\.)*wwwapps.mywebsite.com*/yummy

However, the /yummy directory is still crawled. Any ideas as to what is going on? Thanks.
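
To help narrow this down, here is a minimal Python sketch (not Nutch code) of how I assume the filter evaluates these rules: each line is a "+" or "-" followed by a regex, the rules are tried top to bottom, the first regex found in the URL decides, and a URL that matches nothing is ignored. That is what the comments in the default crawl-urlfilter.txt seem to describe, but I have not checked it against the Nutch source. The names RULES and accepts and the test URL are made up for illustration.

```python
import re

# The two rules from my crawl-urlfilter.txt, copied as-is and in the same order.
RULES = [
    r"+^http://([a-z0-9]*\.)*wwwapps.mywebsite.com*/",
    r"-^http://([a-z0-9]*\.)*wwwapps.mywebsite.com*/yummy",
]


def accepts(url, rules):
    """Return True if the URL would be crawled.

    Assumption: the first rule whose regex is found in the URL decides
    ("+" accepts, "-" rejects); if no rule matches, the URL is ignored.
    """
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if re.search(pattern, url):
            return sign == "+"
    return False


# Hypothetical URL under the directory I want to exclude.
yummy_url = "http://wwwapps.mywebsite.com/yummy/page.html"

print(accepts(yummy_url, RULES))                  # rules in the order shown above
print(accepts(yummy_url, list(reversed(RULES))))  # same rules, "-" line first
```

If that assumption is right, the order of the two lines would matter, since the "+" pattern is a prefix of the "-" pattern and so matches every /yummy URL as well.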
Matt
