Re: regex-urlfilter help

2016-12-18 Thread forest_soup
Yeah, I'm curious why this thread is being used to talk about that topic. I'll start a new thread on my questions. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-cannot-provide-index-service-after-a-large-GC-pause-but-core-state-in-ZK-is-still-active-tp4308942p4310302.html Sent

Re: regex-urlfilter help

2016-12-12 Thread KRIS MUSSHORN
Sorry, my mistake. Sent to the wrong list. - Original Message - From: "Shawn Heisey" To: solr-user@lucene.apache.org Sent: Monday, December 12, 2016 2:36:26 PM Subject: Re: regex-urlfilter help On 12/12/2016 12:19 PM, KRIS MUSSHORN wrote: > I'm using nutch 1

Re: regex-urlfilter help

2016-12-12 Thread Shawn Heisey
On 12/12/2016 12:19 PM, KRIS MUSSHORN wrote: > I'm using nutch 1.12 and Solr 5.4.1. > > Crawling a website and indexing into nutch. > > AFAIK the regex-urlfilter.txt file will cause content to not be crawled.. > > what if I have > https:///inside/default.cfm as my seed url... > I

regex-urlfilter help

2016-12-12 Thread KRIS MUSSHORN
I'm using nutch 1.12 and Solr 5.4.1. Crawling a website and indexing into nutch. AFAIK the regex-urlfilter.txt file will cause content to not be crawled. What if I have https:///inside/default.cfm as my seed url... I want the links on this page to be crawled and indexed but I do
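
For reference, here is a rough Python sketch of how rules in a regex-urlfilter.txt file are generally evaluated: each rule line starts with `+` (accept) or `-` (reject) followed by a regular expression, the first rule whose pattern is found in the URL decides, and a URL that matches no rule is dropped. This is only an emulation for reasoning about rule order, not Nutch's own code. The host `example.com`, the sample rules, and the helper names `parse_rules` and `accepts` are assumptions made for illustration, since the real host is elided in the message above.

```python
import re

# Hypothetical rule set; example.com stands in for the elided host.
SAMPLE_RULES = r"""
# skip common image files
-\.(gif|jpg|png)$
# accept anything under the /inside/ path
+^https://example\.com/inside/
# reject everything else
-.
"""


def parse_rules(text):
    """Turn filter text into a list of (allow, compiled_pattern) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no rule
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with + or -: " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """Return the verdict of the first rule whose pattern occurs in url."""
    for allow, pattern in rules:
        if pattern.search(url):
            return allow
    return False  # no rule matched, so the URL is dropped


rules = parse_rules(SAMPLE_RULES)
for candidate in (
    "https://example.com/inside/default.cfm",
    "https://example.com/inside/logo.png",
    "https://example.com/other/page.cfm",
):
    print(candidate, accepts(rules, candidate))
```

Because the first matching rule wins, the order of the lines matters: the image rule has to come before the `/inside/` accept rule, otherwise image URLs under that path would be accepted.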