I'm using Nutch 1.12 and Solr 5.4.1, crawling a website with Nutch and indexing into Solr. AFAIK a rule in regex-urlfilter.txt stops matching URLs from being crawled at all. What if I have https://XXXX/inside/default.cfm as my seed URL? I want the links on this page to be crawled and indexed, but I do not want this page itself to be indexed into Solr. How would I set this up? I'm thinking that the regex-urlfilter.txt file is NOT the right place.

TIA
Kris
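To make the concern concrete, here is a small Python sketch that emulates the regex-urlfilter.txt convention (a `+` line accepts, a `-` line rejects, `#` is a comment, the first rule whose regex occurs in the URL decides, and a URL matching no rule is rejected). It is not Nutch code. It shows why a `-` rule for the seed in the crawl-time file would also block the seed from being crawled, and how a separate index-time rule set could drop only that one page. The helper names (`parse_rules`, `accepts`) and both rule sets are hypothetical. Whether Nutch 1.12 can apply a different rules file only at indexing time (for example the index command's `-filter` option combined with overriding the `urlfilter.regex.file` property for that one run) is an assumption to verify.

```python
import re
from typing import List, Tuple

# Each rule is (accept?, compiled regex). Hypothetical emulation of the
# regex-urlfilter.txt format, not Nutch's own implementation.
Rule = Tuple[bool, "re.Pattern[str]"]


def parse_rules(text: str) -> List[Rule]:
    """Parse '+regex' / '-regex' lines; skip blanks and '#' comments."""
    rules: List[Rule] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError(f"bad rule: {raw!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules: List[Rule], url: str) -> bool:
    """First rule whose regex occurs anywhere in the URL decides;
    a URL that matches no rule is rejected."""
    for accept, pattern in rules:
        if pattern.search(url):
            return accept
    return False


SEED = "https://XXXX/inside/default.cfm"

# Crawl-time rules: keep the seed so it is fetched and its outlinks followed.
CRAWL_RULES = parse_rules(r"""
+^https://XXXX/inside/
-.
""")

# Hypothetical index-time rules: reject only the seed page itself.
INDEX_RULES = parse_rules(r"""
-^https://XXXX/inside/default\.cfm$
+^https://XXXX/inside/
-.
""")

if __name__ == "__main__":
    child = "https://XXXX/inside/page2.cfm"  # hypothetical outlink
    for url in (SEED, child):
        print(url, "crawl:", accepts(CRAWL_RULES, url),
              "index:", accepts(INDEX_RULES, url))
```

Run as a script, the seed comes out as crawl True / index False, while the hypothetical child page is True for both. Keeping the two rule sets separate is the point: the crawl-time file must still accept the seed, so the exclusion has to live in whatever filtering happens at indexing time.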