I'm using Nutch 1.12 and Solr 5.4.1 to crawl a website and index it into Solr. AFAIK, excluding a URL in regex-urlfilter.txt prevents it from being crawled at all. Suppose my seed URL is https://XXXX/inside/default.cfm. I want the links on this page to be crawled and indexed, but I do not want the page itself to be indexed into Solr. How would I set this up? I'm thinking that regex-urlfilter.txt is NOT the right place.
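To illustrate the concern, here is a rough sketch of the kind of rule I would otherwise be tempted to add to regex-urlfilter.txt. The exclusion pattern is hypothetical and assumes the usual behaviour where the first matching pattern decides whether a URL is kept. As far as I understand it, this would drop the seed page from the crawl entirely, so its outlinks would never be fetched either, which is not what I want:

```
# Hypothetical rule: skip the seed page (first matching pattern wins)
-^https://XXXX/inside/default\.cfm$

# Accept everything else
+.
```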
- config help KRIS MUSSHORN

