I would like to crawl everything in

http://my.domain.name/dir/subdir

but nothing in its parent

http://my.domain.name/dir/

In regex-urlfilter.txt I have the following:

# skip URLs
-^http://my.domain.name/dir/

# accept URLs
+^http://my.domain.name/dir/subdir/*

but Nutch still crawls the URLs that the skip rule should exclude. Any
suggestions on how to correct this behavior?
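
To make the rule order concrete, here is a minimal Python sketch of how I
understand the filter to evaluate the rules. It is not Nutch code. It assumes
the rules are tried top to bottom, that the first pattern found in the URL
decides the outcome, and that a URL matching no rule is rejected. The names
RULES and first_match_decision are made up for this illustration.

```python
import re

# The two rules, in the same order as in my regex-urlfilter.txt.
# "-" means skip, "+" means accept.
RULES = [
    ("-", r"^http://my.domain.name/dir/"),
    ("+", r"^http://my.domain.name/dir/subdir/*"),
]


def first_match_decision(url, rules=RULES):
    """Return True if the URL would be accepted, False if rejected.

    Assumption: the first rule whose pattern is found in the URL wins,
    and a URL that matches no rule is rejected.
    """
    for sign, pattern in rules:
        if re.search(pattern, url):
            return sign == "+"
    return False


if __name__ == "__main__":
    for url in (
        "http://my.domain.name/dir/subdir/page.html",
        "http://my.domain.name/dir/other.html",
    ):
        print(url, first_match_decision(url))
        print(url, first_match_decision(url, list(reversed(RULES))))
```

If those assumptions hold, the skip rule listed first also matches everything
under subdir, so the order of the two rules matters: with the accept rule
listed first, the sketch accepts the subdir page and still rejects the page in
the parent directory.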



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Prevent-crawl-of-parent-URL-tp4080032.html
Sent from the Nutch - User mailing list archive at Nabble.com.
