Hi,

I am using Nutch 1.2. In my crawl-urlfilter.txt I specify patterns for
URLs that should be skipped, but the patterns are not being applied.

For example:

-^http://([a-z0-9]*\.)*domain.com
+^http://([a-z0-9]*\.)*domain.com/([0-9-a-z])*.html
-^http://([a-z0-9]*\.)*domain.com/([a-z/])*
-^http://([a-z0-9]*\.)*domain.com/top-ads.php

I want only URLs matching the second pattern to be included in the crawl,
and everything matching the other patterns to be excluded, but Nutch is
crawling all of them. Please suggest where the issue might be.
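
To check the rules outside Nutch, here is a minimal Python sketch. It
assumes the filter works the way the comments in the stock filter file
describe: rules are tried top to bottom, the first pattern found anywhere
in the URL decides (+ accept, - reject), and a URL that matches no rule is
rejected. The accepts() helper, the REORDERED list and the sample URLs are
made up for illustration, and Python's regex engine stands in for Java's.

```python
import re

# Rules copied from the post, in file order: (sign, pattern).
RULES = [
    ("-", r"^http://([a-z0-9]*\.)*domain.com"),
    ("+", r"^http://([a-z0-9]*\.)*domain.com/([0-9-a-z])*.html"),
    ("-", r"^http://([a-z0-9]*\.)*domain.com/([a-z/])*"),
    ("-", r"^http://([a-z0-9]*\.)*domain.com/top-ads.php"),
]

# Hypothetical reordering: the include rule first, then one catch-all reject.
REORDERED = [RULES[1], RULES[0]]


def accepts(url, rules=RULES):
    """Return True if the first matching rule is a '+' rule (assumed semantics)."""
    for sign, pattern in rules:
        # Partial match: the pattern only has to be found, not span the whole URL.
        if re.search(pattern, url):
            return sign == "+"
    # Assumption: a URL that matches no rule is rejected.
    return False


if __name__ == "__main__":
    for url in ("http://www.domain.com/page-1.html",
                "http://www.domain.com/top-ads.php"):
        print(url, accepts(url), accepts(url, REORDERED))
```

Under these assumptions the first rule already matches every URL on the
domain, so the + rule on the second line is never reached. That would make
the rules reject everything rather than crawl everything, so it may also be
worth confirming that this file is the one the crawl actually reads.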

Thanks,
Pawan
