Re: Why aren't my path exclusions getting excluded in the Nutch index to Solr?

dogrdon Fri, 19 Jul 2013 12:34:02 -0700

Hi Lewis, thanks for the quick reply, but I don't actually understand this:

As far as I can tell, having

+^http://www.oursite.com/([a-z0-9\-A-Z]*\/)*

in regex-urlfilter.txt means that it will crawl all pages under that main domain, which is what I
want.
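To check that reading of the rule, here is a small Python sketch (not Nutch code) that tests the pattern against a few made-up URLs. It assumes the rule only has to match from the start of the URL, not the whole string, which is how I understand Nutch's regex filter to apply patterns. Because the group ends in `*`, it can match zero times, so the rule effectively accepts anything beginning with http://www.oursite.com/. Since the dots are not escaped, each `.` also matches any character, not just a literal dot.

```python
import re

# The include rule from regex-urlfilter.txt, without its leading '+'.
# Note: the unescaped dots match any character, not only '.'.
SITE_PATTERN = re.compile(r"^http://www.oursite.com/([a-z0-9\-A-Z]*\/)*")


def rule_matches(url: str) -> bool:
    # Assumption: a rule only needs to match a prefix of the URL
    # (a search anchored by '^'), not the entire URL.
    return SITE_PATTERN.search(url) is not None


# Hypothetical URLs used only for illustration.
urls = [
    "http://www.oursite.com/",
    "http://www.oursite.com/about/team/",
    "http://www.oursite.com/news/2013/story.html",
    "http://www.othersite.com/",
]

for url in urls:
    print(url, rule_matches(url))
```

Every oursite.com URL matches, including the one ending in a file name, because the `(...)*` group is allowed to stop early.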

If I set it to -^http://www.oursite.com/([a-z0-9\-A-Z]*\/)* instead, it crawls
nothing and reports that there are no URLs to fetch.

How *can* I crawl my whole site while skipping over just a few paths?
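The header comments of the stock regex-urlfilter.txt say that the first matching pattern in the file decides whether a URL is included or ignored, and that a URL matching no pattern is ignored. The Python sketch below simulates that first-match-wins behaviour; it is an illustration, not Nutch's own filter code, and the /private/ and /archive/ paths are hypothetical stand-ins for the paths to skip. It shows why a lone `-` catch-all leaves nothing to fetch, and why exclusions only take effect when listed above the `+` catch-all.

```python
import re

# Hypothetical rules, in the order they would appear in regex-urlfilter.txt.
# The '/private/' and '/archive/' paths are made-up examples.
RULES = [
    ("-", r"^http://www\.oursite\.com/private/"),
    ("-", r"^http://www\.oursite\.com/archive/"),
    ("+", r"^http://www\.oursite\.com/"),
]

# Only the '-' version of the catch-all rule: rejects every site URL.
ONLY_MINUS = [("-", r"^http://www\.oursite\.com/([a-z0-9\-A-Z]*\/)*")]

# Same rules, but with the '+' catch-all listed first.
WRONG_ORDER = [RULES[2], RULES[0], RULES[1]]


def accept(url, rules):
    """Return True if the first rule whose pattern matches url is a '+' rule.

    A URL that matches no rule at all is rejected.
    """
    for sign, pattern in rules:
        if re.search(pattern, url):
            return sign == "+"
    return False


print(accept("http://www.oursite.com/about/", RULES))
print(accept("http://www.oursite.com/private/report.html", RULES))
print(accept("http://www.oursite.com/about/", ONLY_MINUS))
print(accept("http://www.oursite.com/private/report.html", WRONG_ORDER))
```

With RULES the private page is rejected while the rest of the site is kept. With ONLY_MINUS even the seed URLs are rejected, so there is nothing to fetch. With WRONG_ORDER the `+` rule matches first and the exclusions are never consulted.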

Sorry if my confusion is confusing :)



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Why-aren-t-my-path-exclusions-getting-excluded-in-the-Nutch-index-to-Solr-tp4079172p4079205.html
Sent from the Nutch - User mailing list archive at Nabble.com.
