thanks Markus

----- Original Message -----
From: "Markus Jelsma" <[email protected]>
To: [email protected]
Sent: Tuesday, October 4, 2016 7:01:57 AM
Subject: RE: control order of operations

Hello - this is not Solr's maximum for a field at all. But it is Java's maximum for String. Just don't use string when indexing.

Markus

-----Original message-----
> From:KRIS MUSSHORN <[email protected]>
> Sent: Friday 30th September 2016 17:54
> To: [email protected]
> Subject: Re: control order of operations
>
> would a better option be to use this property?
>
> indexer.max.content.length = 32765
>
> ----- Original Message -----
>
> From: "KRIS MUSSHORN" <[email protected]>
> To: [email protected]
> Sent: Friday, September 30, 2016 9:25:17 AM
> Subject: control order of operations
>
> I've got nutch-site.xml set to http.content.limit = 32765 ( 1 short of solr max ).
>
> I also have parser.html.whitelist set to ignore a bunch of irrelevant tags.
>
> Can I set nutch so that whitelist applies before truncation?
>
> Kris
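
For reference, the two settings discussed in this thread could be written in nutch-site.xml roughly as below. This is only a sketch: the property names and the 32765 value are the ones quoted above, and nothing here is a recommended configuration. As far as I understand it (an assumption worth checking against the descriptions in nutch-default.xml), http.content.limit truncates the raw download in bytes before any parsing happens, while indexer.max.content.length truncates the already-parsed text in characters at indexing time, so only the second one would act after tag filtering.

```xml
<!-- Sketch only: values copied from the thread, not recommendations. -->
<configuration>
  <!-- Fetch-time limit, in bytes, on the raw HTTP response (applied before parsing). -->
  <property>
    <name>http.content.limit</name>
    <value>32765</value>
  </property>
  <!-- Index-time limit, in characters, on the parsed content (assumed to apply after parsing). -->
  <property>
    <name>indexer.max.content.length</name>
    <value>32765</value>
  </property>
</configuration>
```

In practice only one of the two would normally be set this low; both are shown together here just to contrast where each one applies.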

