When crawling our site with Nutch and indexing it into Solr, we need to be able to exclude any content that falls under a certain path.
Say our site is http://oursite.com/ and the path we don't want indexed is http://oursite.com/private/

I have http://oursite.com/ in the seed.txt file, and this in the regex-urlfilter.txt file:

    +^http://www.oursite.com/([a-z0-9\-A-Z]*\/)*

I thought that also putting this in regex-urlfilter.txt would exclude that path and anything under it:

    -.*/private/.*

But the crawler is still fetching and indexing content under the /private/ path.

Is there some kind of restart I need to do on the server, for example of Solr? Or is my regex not actually the right way to do this?

Thanks
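One thing that may matter here is the order of the two rules. As far as I understand it, the regex-urlfilter plugin goes through the file top to bottom and lets the first matching pattern decide whether a URL is kept, ignoring a URL that no pattern matches. The sketch below is a plain Python approximation of that behavior, not Nutch code: the function name first_match_filter is made up, the URL ending in page.html is a hypothetical example, and it assumes the + rule sits above the - rule in the file, which the post does not actually state.

```python
import re

def first_match_filter(rules, url):
    """Approximate first-match-wins filtering: the first rule whose
    pattern is found in the URL decides; no match means rejected."""
    for sign, pattern in rules:
        if re.search(pattern, url):
            return sign == "+"
    return False

# The two patterns from regex-urlfilter.txt, copied as written.
INCLUDE = ("+", r"^http://www.oursite.com/([a-z0-9\-A-Z]*\/)*")
EXCLUDE = ("-", r".*/private/.*")

# Hypothetical URL under the path that should be excluded.
url = "http://www.oursite.com/private/page.html"

# Assumed order: include rule above the exclude rule.
include_first = first_match_filter([INCLUDE, EXCLUDE], url)
# Alternative order: exclude rule above the include rule.
exclude_first = first_match_filter([EXCLUDE, INCLUDE], url)

print(include_first, exclude_first)
```

Under these assumptions the /private/ URL is accepted when the + rule comes first and rejected when the - rule comes first, so the position of the exclusion line in the file looks worth checking.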

