Thanks, Markus. I will give this a try. I did refilter the crawldb. One more
question:

I'm not good with regex. Suppose I want to crawl

http://my.domain.name/dir/subdirA/subdirA1/
http://my.domain.name/dir/subdirB/subdirB1/
http://my.domain.name/dir/subdirB/subdirB2/
http://my.domain.name/dir/subdirC/subdirC1/

but not

http://my.domain.name/dir/subdirA/
http://my.domain.name/dir/subdirB/
http://my.domain.name/dir/subdirC/

Can I do that by modifying your suggestion or would I need to exclude each
URL individually?
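
To make the intent concrete, here is a rough sketch of the rule I am after,
written in Python only so I can test it. The pattern and the should_crawl
helper are my own guesses, not your suggestion and not Nutch's
regex-urlfilter.txt syntax: accept anything at least two directory levels
below /dir/, and reject the first-level directories themselves.

```python
import re

# Assumed pattern (hypothetical, for illustration only): after /dir/ there
# must be two non-empty path segments, each followed by a slash.
WANTED = re.compile(r"^http://my\.domain\.name/dir/[^/]+/[^/]+/")


def should_crawl(url: str) -> bool:
    """Return True if the URL lies at least two levels below /dir/."""
    return WANTED.match(url) is not None


wanted = [
    "http://my.domain.name/dir/subdirA/subdirA1/",
    "http://my.domain.name/dir/subdirB/subdirB1/",
    "http://my.domain.name/dir/subdirB/subdirB2/",
    "http://my.domain.name/dir/subdirC/subdirC1/",
]
unwanted = [
    "http://my.domain.name/dir/subdirA/",
    "http://my.domain.name/dir/subdirB/",
    "http://my.domain.name/dir/subdirC/",
]

for url in wanted + unwanted:
    print(url, "crawl" if should_crawl(url) else "skip")
```

If something like this is right, I assume a single rule would cover all the
parent directories rather than one exclusion per URL.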

I appreciate your help.

Best Regards,
ADS


