Hi Lewis, thanks for the quick reply, but I don't actually understand this. As far as I can tell,
+^http://www.oursite.com/([a-z0-9\-A-Z]*\/)* in regex-urlfilter.txt means that Nutch will crawl all pages under that main domain, which is what I want. If I change it to -^http://www.oursite.com/([a-z0-9\-A-Z]*\/)*, it crawls nothing and says there are no URLs to fetch. How can I crawl my whole site while skipping over just a few paths? Sorry if my confusion is confusing :)
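
For illustration, here is a minimal Python sketch of the behaviour described above. It is not Nutch code. It assumes the filter checks rules from top to bottom, that the first matching rule decides, and that a URL matching no rule is dropped, which is how the comments in the stock regex-urlfilter.txt describe it. The excluded paths (/private/ and /tmp/) and the accepts() helper are hypothetical, chosen only for the example.

```python
import re

# The site-wide pattern quoted in the message above.
SITE = r"^http://www.oursite.com/([a-z0-9\-A-Z]*\/)*"

# Rules in file order as (sign, pattern). The two exclusion paths are
# hypothetical; they come before the broad "+" rule so they are hit first.
RULES = [
    ("-", r"^http://www.oursite.com/private/"),
    ("-", r"^http://www.oursite.com/tmp/"),
    ("+", SITE),
]


def accepts(url, rules=RULES):
    """Return True if the first rule that matches `url` is a '+' rule."""
    for sign, pattern in rules:
        if re.search(pattern, url):
            return sign == "+"
    # Assumption: a URL that matches no rule is ignored.
    return False


if __name__ == "__main__":
    for url in (
        "http://www.oursite.com/about/",
        "http://www.oursite.com/private/report.html",
        "http://www.example.org/",
    ):
        print(url, accepts(url))
```

Under these assumptions, turning the single site-wide rule from + into - rejects every URL on the site, which would explain the "no URLs to fetch" result. Narrow - rules placed above the broad + rule skip only the listed paths.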

