I'm not sure exactly how to do this, but I think creating a parse filter and an indexing filter will do the trick. First, write a parse filter that reads the byte[] content from the Content object that is available in the parse filter. In that same parse filter, add the raw data to the parse data.
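
The parse-filter step above could be sketched roughly as follows. This is only an illustration of the data flow, not Nutch code: FetchedContent is a hypothetical stand-in for the Content object, a plain Map stands in for the parse data, and the key name "rawcontent" and the UTF-8 decoding are assumptions. A real plugin would implement the Nutch parse filter interface instead.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class RawHtmlParseSketch {
    // Hypothetical field name; pick whatever your Solr schema expects.
    public static final String RAW_KEY = "rawcontent";

    /** Hypothetical stand-in for the fetched Content object: holds the raw bytes. */
    public static final class FetchedContent {
        private final byte[] bytes;

        public FetchedContent(byte[] bytes) {
            this.bytes = bytes;
        }

        public byte[] getContent() {
            return bytes;
        }
    }

    /**
     * Parse-filter step: copy the raw bytes into the parse metadata as a string
     * so a later indexing step can read them back under RAW_KEY.
     * Assumes the page is UTF-8; a real filter should use the detected encoding.
     */
    public static void storeRaw(FetchedContent content, Map<String, String> parseMeta) {
        String rawHtml = new String(content.getContent(), StandardCharsets.UTF_8);
        parseMeta.put(RAW_KEY, rawHtml);
    }

    public static void main(String[] args) {
        byte[] page = "<html><body><p>hi</p></body></html>".getBytes(StandardCharsets.UTF_8);
        Map<String, String> parseMeta = new HashMap<>();
        storeRaw(new FetchedContent(page), parseMeta);
        // The original markup is now available under RAW_KEY.
        System.out.println(parseMeta.get(RAW_KEY));
    }
}
```

The point of the sketch is only that the markup is stored under a known key at parse time, so it can be picked up again by that key later.
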
In your indexing filter, you simply read that field and add it to the document. See the writing-plugin example on the wiki for a basic introduction to writing plugins.

On Wednesday 10 August 2011 14:12:13 Christopher Gross wrote:
> I have Nutch 1.3 running, and have it connected to a Solr 3.3
> instance. Right now the data comes over from Nutch to Solr just fine,
> but I'd like it to send the "content" field to Solr as the raw HTML,
> so that I can have all the original markup to work with later.
>
> I've tried digging around on Google and I can't seem to find anything.
> Can someone please push me in the right direction?
>
> Thanks!
>
> -- Christopher Gross

--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350

