Thanks, that is exactly my question. If I want to make an HTML snapshot, how should I do it? Should I modify SolrIndexer and IndexerMapReduce?

2011/12/28 Marek Bachmann <[email protected]>

> Hey ho,
>
> I think the question was why only the PARSED content is in the content
> field.
>
> As I understand it, Cube wants the raw page content to be stored
> and / or indexed.
>
> Cube, what will you need the raw content for? It is possible to add it
> to Solr, even to index it in the content field. But I am not sure if it
> makes sense because I don't know what you want to do. :)
>
> On 28.12.2011 15:35, Markus Jelsma wrote:
> > Check your Solr schema, it's likely set not to store.
> >
> >> When I use the solrindex command to post the crawled content, I can
> >> find the content field, which holds the parsed text. Why is there no
> >> raw content field?

