I'm using Nutch 1.12 and Solr 5.4.1.
  
I'm crawling a website and indexing it into Solr.
  
AFAIK, excluding a URL in the regex-urlfilter.txt file will cause it to not be crawled at all.
  
What if I have
https://XXXX/inside/default.cfm as my seed URL?
I want the links on this page to be crawled and indexed, but I do not want the
page itself to be indexed into Solr.
How would I set this up?
  
I'm thinking that the regex-urlfilter.txt file is NOT the right place.
  
TIA 
Kris 
