When I fetch, parse, and solrindex a segment, I get the HTML-stripped version of the page.
I looked at the dump of the segment and noticed that Nutch keeps the full HTML of the document under the "Content::" heading, and the HTML-stripped version of the page under "ParseText::", produced after the page is run through the NekoHTML parser. Can I configure Nutch so that solrindex sends the "Content::" part of the record to Solr rather than the "ParseText::" part?

