Re: regex-urlfilter help

2016-12-18 Thread forest_soup
Yeah, I'm curious why this thread is being used to discuss that topic.
I'll start a new thread on my questions. 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-cannot-provide-index-service-after-a-large-GC-pause-but-core-state-in-ZK-is-still-active-tp4308942p4310302.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: regex-urlfilter help

2016-12-12 Thread KRIS MUSSHORN

Sorry, my mistake. Sent to the wrong list. 
  
- Original Message -

From: "Shawn Heisey" <apa...@elyograg.org> 
To: solr-user@lucene.apache.org 
Sent: Monday, December 12, 2016 2:36:26 PM 
Subject: Re: regex-urlfilter help 

On 12/12/2016 12:19 PM, KRIS MUSSHORN wrote: 
> I'm using nutch 1.12 and Solr 5.4.1. 
>   
> Crawling a website and indexing into nutch. 
>   
> AFAIK the regex-urlfilter.txt file will cause content to not be crawled. 
>   
> What if I have 
> https:///inside/default.cfm as my seed URL? 
> I want the links on this page to be crawled and indexed, but I do not want 
> this page itself to be indexed into Solr. 
> How would I set this up? 
>   
> I'm thinking that the regex-urlfilter.txt file is NOT the right place. 

These sound like questions about how to configure Nutch.  This is a Solr 
mailing list.  Nutch is a completely separate Apache product with its 
own mailing list.  Although there may be people here who do use Nutch, 
it's not the purpose of this list.  Please use support resources for Nutch. 

http://nutch.apache.org/mailing_lists.html 

I'm reasonably certain that this cannot be controlled by Solr's 
configuration.  Solr will index anything that is sent to it, so the 
choice of what to send or not send in this situation will be decided by 
Nutch. 

Thanks, 
Shawn 




Re: regex-urlfilter help

2016-12-12 Thread Shawn Heisey
On 12/12/2016 12:19 PM, KRIS MUSSHORN wrote:
> I'm using nutch 1.12 and Solr 5.4.1. 
>   
> Crawling a website and indexing into nutch. 
>   
> AFAIK the regex-urlfilter.txt file will cause content to not be crawled. 
>   
> What if I have 
> https:///inside/default.cfm as my seed URL? 
> I want the links on this page to be crawled and indexed, but I do not want 
> this page itself to be indexed into Solr. 
> How would I set this up? 
>   
> I'm thinking that the regex-urlfilter.txt file is NOT the right place. 

These sound like questions about how to configure Nutch.  This is a Solr
mailing list.  Nutch is a completely separate Apache product with its
own mailing list.  Although there may be people here who do use Nutch,
it's not the purpose of this list.  Please use support resources for Nutch.

http://nutch.apache.org/mailing_lists.html

I'm reasonably certain that this cannot be controlled by Solr's
configuration.  Solr will index anything that is sent to it, so the
choice of what to send or not send in this situation will be decided by
Nutch.

Thanks,
Shawn



regex-urlfilter help

2016-12-12 Thread KRIS MUSSHORN
I'm using nutch 1.12 and Solr 5.4.1. 
  
Crawling a website and indexing into nutch. 
  
AFAIK the regex-urlfilter.txt file will cause content to not be crawled. 
  
What if I have 
https:///inside/default.cfm as my seed URL? 
I want the links on this page to be crawled and indexed, but I do not want this 
page itself to be indexed into Solr. 
How would I set this up? 
  
I'm thinking that the regex-urlfilter.txt file is NOT the right place. 
  
TIA 
Kris
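
For readers following the question above, here is a minimal Python sketch of why a URL filter is probably the wrong tool for "crawl this page but don't index it". It models the first-match-wins behavior described in the comments of Nutch's stock regex-urlfilter.txt (a leading + includes, a leading - excludes, and a URL matching no rule is ignored). It is not Nutch code; the host example.org, the rule set, and the helper names parse_rules and accepts are all hypothetical, and partial (search-style) matching is an assumption.

```python
import re


def parse_rules(text):
    """Parse '+regex' / '-regex' lines, skipping blanks and '#' comments."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError(f"rule must start with + or -: {line!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """Return the verdict of the first matching rule; no match means reject."""
    for include, pattern in rules:
        if pattern.search(url):  # assumption: partial match, not full match
            return include
    return False


# Hypothetical rule set: exclude the seed page, accept the rest of the site.
RULES = parse_rules(r"""
-/inside/default\.cfm$
+^https://example\.org/
""")

SEED = "https://example.org/inside/default.cfm"    # hypothetical host
LINKED = "https://example.org/inside/page2.cfm"    # hypothetical linked page

print(accepts(RULES, SEED))
print(accepts(RULES, LINKED))
```

Under these assumptions the seed URL is rejected outright, so it would never be fetched and the links on it would never be discovered. That suggests the exclusion has to happen at indexing time on the Nutch side rather than in the URL filter, which is a question for the Nutch list.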