I've got nutch-site.xml set to http.content.limit = 32765 (1 short of the Solr max).

I also have parser.html.whitelist set to ignore a bunch of irrelevant tags. 
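
For reference, the relevant part of my nutch-site.xml looks roughly like the sketch below. Only the two property names and the limit value are the ones mentioned here; the whitelist value is left as a placeholder rather than my actual list, and the rest of the file is omitted.

```xml
<!-- nutch-site.xml (sketch: only the two properties discussed in this post) -->
<configuration>
  <property>
    <name>http.content.limit</name>
    <!-- fetched content is truncated to this many bytes -->
    <value>32765</value>
  </property>
  <property>
    <name>parser.html.whitelist</name>
    <!-- placeholder: the actual value is not reproduced here -->
    <value>...</value>
  </property>
</configuration>
```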

Can I configure Nutch so that the whitelist is applied before truncation? 

Kris 
