I am just doing a simple project for my Information Retrieval class. I am currently using Nutch to crawl a set of pages, and it indexes the parsed pages into Solr. What I really want is for it to also store each page's source, HTML tags included. Is there an easy way to tell Nutch to do that?
If not, once my pages are indexed, what command would I use to retrieve their original source from Nutch?

--
View this message in context: http://lucene.472066.n3.nabble.com/Out-of-the-box-Nutch-indexing-url-source-to-Solr-tp3855918p3855918.html
Sent from the Nutch - User mailing list archive at Nabble.com.

