Someday

Sent from my iPhone
> On Oct 1, 2016, at 2:11 AM, BlackIce <[email protected]> wrote:
>
> Then make your own :)
>
>> On Sep 30, 2016 11:13 PM, "Kris Musshorn" <[email protected]> wrote:
>>
>> Thanks blackice but I cant use a plug in that’s not been maintained in a
>> year in my production environment
>>
>> -----Original Message-----
>> From: BlackIce [mailto:[email protected]]
>> Sent: Friday, September 30, 2016 2:42 PM
>> To: [email protected]
>> Subject: Re: control order of operations
>>
>> Try these, don't remember which I used and don't have access to my setup
>> right now (there used to be a whitelist/blacklist plugin, but I don't seem
>> to be able to find it on Google right now)
>>
>> https://github.com/BayanGroup/nutch-custom-search
>>
>> On Sep 30, 2016 7:35 PM, "KRIS MUSSHORN" <[email protected]> wrote:
>>
>> Ok basic knowledge deficit.
>>
>> Looks like parser.html.whitelist settings will not prevent sections of the
>> html from being indexed into solr.
>>
>> How can i accomplish my goal of preventing header, footer and a few divs
>> from being indexed into the content field of solr?
>>
>> Kris
>>
>>
>> ----- Original Message -----
>>
>> From: "KRIS MUSSHORN" <[email protected]>
>> To: [email protected]
>> Sent: Friday, September 30, 2016 11:54:40 AM
>> Subject: Re: control order of operations
>>
>> would a better option be to use this property?
>>
>> indexer.max.content.length = 32765
>>
>> ----- Original Message -----
>>
>> From: "KRIS MUSSHORN" <[email protected]>
>> To: [email protected]
>> Sent: Friday, September 30, 2016 9:25:17 AM
>> Subject: control order of operations
>>
>> I've got nutch-site.xml set to http.content.limit = 32765 ( 1 short of
>> solr max ).
>>
>> I also have parser.html.whitelist set to ignore a bunch of irrelevant tags.
>>
>> Can I set nutch so that whitelist applies before truncation?
>>
>> Kris
>>
>>
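
The order of operations asked about in the quoted thread (drop header, footer and selected divs first, apply the length limit second) can be sketched outside Nutch. The following is a minimal standalone Python illustration, not Nutch code and not any plugin's API: the names SKIP_TAGS, SKIP_DIV_IDS, ContentExtractor and extract_then_truncate are all hypothetical, and the only value taken from the thread is the 32765 limit. It also assumes the limit is counted in characters; a byte limit would need encoding-aware truncation.

```python
from html.parser import HTMLParser

# Hypothetical settings for illustration only; these are not Nutch properties.
SKIP_TAGS = {"header", "footer", "script", "style"}
SKIP_DIV_IDS = {"sidebar"}          # ids of divs to drop (assumed example value)
MAX_CONTENT_LENGTH = 32765          # the limit mentioned in the thread

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ContentExtractor(HTMLParser):
    """Collects text, ignoring anything inside a skipped element."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._stack = []        # (tag, starts_skipped_region) for each open element
        self._skip_depth = 0    # > 0 while inside at least one skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        skipped = tag in SKIP_TAGS or (
            tag == "div" and dict(attrs).get("id") in SKIP_DIV_IDS
        )
        self._stack.append((tag, skipped))
        if skipped:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i][0] == tag:
                for _, skipped in self._stack[i:]:
                    if skipped:
                        self._skip_depth -= 1
                del self._stack[i:]
                break

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Collapse runs of whitespace into single spaces.
        return " ".join(" ".join(self._chunks).split())


def extract_then_truncate(html, limit=MAX_CONTENT_LENGTH):
    """Filter unwanted sections first, then apply the character limit."""
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()[:limit]
```

Because filtering runs before the slice, boilerplate in the header or sidebar does not use up any of the limit. Truncating the raw page first (which is what a fetch-time limit does) can cut the document before the main content is ever reached.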

