When I fetch, parse, and solrindex a segment, I get only the HTML-stripped
version of the page in Solr.

I looked at a dump of the segment and noticed that Nutch stores the full HTML
of the document under the "Content::" heading, and the HTML-stripped version
of the page under "ParseText::", produced after the page is run through the
NekoHTML parser.

I would like to know whether I can configure Nutch to solrindex the
"Content::" part of the record rather than the "ParseText::" part.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/solrindex-Content-instead-of-ParseText-tp4123822.html
Sent from the Nutch - User mailing list archive at Nabble.com.
