Hello everyone,
I have Nutch installed and running just fine. Nutch submits the crawl results to Solr for indexing. I need a separate field in the Solr document that holds the raw HTML of each page. At the moment, the "content" field holds only the parsed text from the page.

From what I have read, it's impossible to do this without writing your own plugin, and I don't know Java that well. What would be the easiest way to approach this task?

Thank you in advance,
Max