Sorry, my mistake. Sent to the wrong list.
  
----- Original Message -----

From: "Shawn Heisey" <apa...@elyograg.org> 
To: solr-user@lucene.apache.org 
Sent: Monday, December 12, 2016 2:36:26 PM 
Subject: Re: regex-urlfilter help 

On 12/12/2016 12:19 PM, KRIS MUSSHORN wrote: 
> I'm using Nutch 1.12 and Solr 5.4.1.
>
> Crawling a website and indexing into Nutch.
>
> AFAIK the regex-urlfilter.txt file will cause content to not be crawled.
>
> What if I have
> https://XXXX/inside/default.cfm as my seed URL?
> I want the links on this page to be crawled and indexed, but I do not want
> this page to be indexed into Solr.
> How would I set this up?
>
> I'm thinking that the regex-urlfilter.txt file is NOT the right place.
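For illustration, here is a minimal Python sketch of the rule semantics documented in the header of Nutch's regex-urlfilter.txt: a line starting with "+" accepts, a line starting with "-" rejects, the first matching pattern decides, and a URL that matches nothing is rejected. This is not Nutch's actual code. The rule that excludes the seed page and the helper names `parse_rules` and `accepts` are hypothetical.

```python
import re

# Hypothetical rules written in regex-urlfilter.txt syntax.
RULES = r"""
# skip file:, ftp:, and mailto: urls
-^(file|ftp|mailto):
# hypothetical rule that excludes the seed page
-^https://XXXX/inside/default\.cfm$
# accept anything else
+.
"""

def parse_rules(text):
    """Turn '+regex' / '-regex' lines into (include, compiled_pattern) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        rules.append((line[0] == "+", re.compile(line[1:])))
    return rules

def accepts(url, rules):
    """The first matching pattern decides; a URL matching no pattern is rejected."""
    for include, pattern in rules:
        if pattern.search(url):  # unanchored search, like Java's Matcher.find()
            return include
    return False

rules = parse_rules(RULES)
print(accepts("https://XXXX/inside/default.cfm", rules))  # False
print(accepts("https://XXXX/inside/page2.cfm", rules))    # True
```

Under these assumptions, a URL rejected by the filter is never fetched. Its outlinks are therefore never discovered either, which is consistent with the suspicion that regex-urlfilter.txt is not the place to keep a page out of the index while still following its links.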

These sound like questions about how to configure Nutch.  This is a Solr 
mailing list.  Nutch is a completely separate Apache product with its 
own mailing list.  Although there may be people here who do use Nutch, 
it's not the purpose of this list.  Please use support resources for Nutch. 

http://nutch.apache.org/mailing_lists.html 

I'm reasonably certain that this cannot be controlled by Solr's 
configuration.  Solr will index anything that is sent to it, so the 
choice of what to send or not send in this situation will be decided by 
Nutch. 

Thanks, 
Shawn 

