On 12/12/2016 12:19 PM, KRIS MUSSHORN wrote:
> I'm using nutch 1.12 and Solr 5.4.1. 
>   
> Crawling a website with Nutch and indexing it into Solr. 
>   
> AFAIK, regex-urlfilter.txt prevents matching URLs from being crawled at all. 
>   
> What if I have 
> https://XXXX/inside/default.cfm as my seed URL? 
> I want the links on this page to be crawled and indexed, but I do not want 
> this page itself to be indexed into Solr. 
> How would I set this up? 
>   
> I'm thinking that regex-urlfilter.txt is NOT the right place. 

These sound like questions about how to configure Nutch.  This is a Solr
mailing list.  Nutch is a completely separate Apache product with its
own mailing list.  Although there may be people here who do use Nutch,
it's not the purpose of this list.  Please use support resources for Nutch.

http://nutch.apache.org/mailing_lists.html

I'm reasonably certain that this cannot be controlled by Solr's
configuration.  Solr will index anything that is sent to it, so the
choice of what to send or not send in this situation will be decided by
Nutch.
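To illustrate the split between "crawl" and "index" decisions, here is a minimal, self-contained sketch. It is not Nutch's API: the class and method names (CrawlIndexSplit, shouldCrawl, shouldIndex) are hypothetical. It assumes one rule decides which URLs are fetched, so their outlinks get followed, and a second, narrower rule decides which fetched URLs are actually sent to Solr. In Nutch 1.x, the second kind of rule would typically live in an indexing filter plugin, whose filter method can return null to drop a document from indexing. Check the Nutch documentation for the details.

```java
import java.util.Set;
import java.util.regex.Pattern;

public class CrawlIndexSplit {
    // Hypothetical crawl rule: fetch and parse everything under /inside/,
    // so outlinks from the seed page are followed.
    private static final Pattern CRAWL_ALLOW =
            Pattern.compile("https://XXXX/inside/.*");

    // Hypothetical index rule: pages that are crawled but never sent to Solr.
    private static final Set<String> INDEX_EXCLUDE =
            Set.of("https://XXXX/inside/default.cfm");

    // True if the crawler should fetch this URL and follow its links.
    static boolean shouldCrawl(String url) {
        return CRAWL_ALLOW.matcher(url).matches();
    }

    // True if the crawled page should also be sent to Solr for indexing.
    static boolean shouldIndex(String url) {
        return shouldCrawl(url) && !INDEX_EXCLUDE.contains(url);
    }

    public static void main(String[] args) {
        String[] urls = {
            "https://XXXX/inside/default.cfm",
            "https://XXXX/inside/page.cfm",
            "https://other.example/"
        };
        for (String url : urls) {
            System.out.printf("crawl=%b index=%b %s%n",
                    shouldCrawl(url), shouldIndex(url), url);
        }
    }
}
```

The seed page passes the crawl rule, so its links are still discovered, but it fails the index rule, so it never reaches Solr.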

Thanks,
Shawn
