You can also write a parse filter that copies the raw page source to another
field, and have an indexing filter add that field to the index later.
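
To show the shape of the idea, here is a minimal plain-Java sketch of the two
stages. It is not Nutch code: the maps stand in for the parse metadata and the
index document, and the class name, method names, and the "rawcontent" field
name are all made up for illustration. In Nutch 1.x the two methods would
roughly correspond to plugins on the HtmlParseFilter and IndexingFilter
extension points. It also assumes the page is UTF-8, which real pages often
are not.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Plain-Java stand-in for the two-stage idea; this is NOT the Nutch API.
class RawHtmlSketch {
    // Hypothetical name, used as both the parse metadata key and the index field.
    static final String RAW_FIELD = "rawcontent";

    // Parse stage: copy the fetched bytes, tags included, into the parse metadata.
    // Assumes UTF-8; a real filter would use the detected page encoding.
    static Map<String, String> parseFilter(byte[] fetched, Map<String, String> parseMeta) {
        parseMeta.put(RAW_FIELD, new String(fetched, StandardCharsets.UTF_8));
        return parseMeta;
    }

    // Index stage: if the parse stage stored the raw markup, add it to the document.
    static Map<String, String> indexFilter(Map<String, String> doc, Map<String, String> parseMeta) {
        String raw = parseMeta.get(RAW_FIELD);
        if (raw != null) {
            doc.put(RAW_FIELD, raw);
        }
        return doc;
    }

    public static void main(String[] args) {
        byte[] page = "<html><body><b>hi</b></body></html>".getBytes(StandardCharsets.UTF_8);
        Map<String, String> meta = parseFilter(page, new HashMap<>());

        Map<String, String> doc = new HashMap<>();
        doc.put("url", "http://example.com/");
        indexFilter(doc, meta);

        // The document now carries the markup alongside its other fields.
        System.out.println(doc.get(RAW_FIELD));
    }
}
```

With something like this in place, the Solr schema would also need a stored
field matching whatever name you pick, otherwise the markup will not come back
in search results.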

On Sunday 25 March 2012 18:39:53 JohnRodey wrote:
> I am just doing a simple project for my Information Retrieval class.  I am
> currently using nutch to get a bunch of pages and it is indexing and
> storing the parsed page to SOLR.  What I really want to do is have it
> store the page source with HTML tags as well.  Is there an easy way to
> tell nutch to do that?
> 
> If not, after I have my pages indexed, if I want to retrieve their original
> source from nutch, what would be the command to do that?
> 
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Out-of-the-box-Nutch-indexing-url-source-to-Solr-tp3855918p3855918.html
> Sent from the Nutch - User mailing list archive at Nabble.com.

-- 
Markus Jelsma - CTO - Openindex
