Using solrindex-mapping.xml you can map fields you don't want onto an 
`ignored` field in Solr, or give those fields the `ignored` fieldType in 
Solr's schema.
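
For example, in Nutch's conf/solrindex-mapping.xml (the field names below 
are only illustrative) you can route the standard fields into ignored_* 
fields and keep only your own:

  <mapping>
    <fields>
      <!-- keep the fields you actually want searchable -->
      <field dest="headline" source="headline"/>
      <field dest="id" source="url"/>
      <!-- route standard Nutch fields into fields Solr ignores -->
      <field dest="ignored_host" source="host"/>
      <field dest="ignored_content" source="content"/>
    </fields>
    <uniqueKey>id</uniqueKey>
  </mapping>

Solr's example schema.xml already ships with a matching fieldType and 
dynamic field, along these lines:

  <fieldType name="ignored" class="solr.StrField"
             indexed="false" stored="false" multiValued="true"/>
  <dynamicField name="ignored_*" type="ignored" multiValued="true"/>

Anything landing in an ignored_* field is silently dropped by Solr.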

On Thursday 18 November 2010 22:08:15 Guido wrote:
> Hi,
> 
> I want to index content from (selected) web sites into Solr. I
> therefore want to extract data from each document's DOM and put this
> information into the corresponding fields of the index. In other words:
> I want to use Nutch as a crawler and content extractor only.
> 
> I read that I would have to write a custom HtmlParseFilter
> (http://wiki.apache.org/nutch/WritingPluginExample-0.9).
> This would add the extracted information to the parse object, which can
> be accessed later during indexing.
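
A minimal sketch of such a filter, assuming the Nutch 1.x HtmlParseFilter
API (the 0.9 wiki example uses a slightly older signature); the class
name, the <h1> extraction and the "myco.headline" metadata key are just
illustrations:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.nutch.parse.HTMLMetaTags;
  import org.apache.nutch.parse.HtmlParseFilter;
  import org.apache.nutch.parse.Parse;
  import org.apache.nutch.parse.ParseResult;
  import org.apache.nutch.protocol.Content;
  import org.w3c.dom.DocumentFragment;
  import org.w3c.dom.Node;

  /** Pulls <h1> text out of the DOM and stores it in the parse
   *  metadata, where an indexing filter can pick it up later. */
  public class HeadlineParseFilter implements HtmlParseFilter {

    private Configuration conf;

    public ParseResult filter(Content content, ParseResult parseResult,
        HTMLMetaTags metaTags, DocumentFragment doc) {
      StringBuilder sb = new StringBuilder();
      collectText(doc, "h1", sb);
      Parse parse = parseResult.get(content.getUrl());
      if (parse != null && sb.length() > 0) {
        parse.getData().getParseMeta().add("myco.headline",
            sb.toString().trim());
      }
      return parseResult;
    }

    // Recursively collect the text of all elements with the given name
    // (case-insensitive, since the HTML parser may upper-case names).
    private void collectText(Node node, String name, StringBuilder sb) {
      if (name.equalsIgnoreCase(node.getNodeName())) {
        sb.append(node.getTextContent()).append(' ');
        return;
      }
      for (Node child = node.getFirstChild(); child != null;
           child = child.getNextSibling()) {
        collectText(child, name, sb);
      }
    }

    public void setConf(Configuration conf) { this.conf = conf; }
    public Configuration getConf() { return conf; }
  }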
> 
> So far, so good. But how do I post my data to Solr?
> 
> The article mentioned above suggests writing a custom "Indexer
> Extension". This extension would index new custom information in
> addition to the standard values. But note that I _don't_ want to
> index the standard output of Nutch (host, content, ...).
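
Such an "Indexer Extension" is an IndexingFilter plugin. A minimal sketch
against the Nutch 1.x API, picking up the metadata key written by the
parse filter above (names again illustrative; some 1.x versions also
declare addIndexBackendOptions(Configuration) on the interface, which can
be left empty):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.io.Text;
  import org.apache.nutch.crawl.CrawlDatum;
  import org.apache.nutch.crawl.Inlinks;
  import org.apache.nutch.indexer.IndexingException;
  import org.apache.nutch.indexer.IndexingFilter;
  import org.apache.nutch.indexer.NutchDocument;
  import org.apache.nutch.parse.Parse;

  /** Copies the value stored by HeadlineParseFilter into its own
   *  index field; adds nothing else. */
  public class HeadlineIndexingFilter implements IndexingFilter {

    private Configuration conf;

    public NutchDocument filter(NutchDocument doc, Parse parse, Text url,
        CrawlDatum datum, Inlinks inlinks) throws IndexingException {
      String headline = parse.getData().getParseMeta().get("myco.headline");
      if (headline != null) {
        doc.add("headline", headline);
      }
      return doc;
    }

    public void setConf(Configuration conf) { this.conf = conf; }
    public Configuration getConf() { return conf; }
  }

Both classes still need the plugin.xml/build.xml wiring described in the
wiki page above. Note that most of the standard fields come from the
index-basic plugin, so leaving it out of plugin.includes in nutch-site.xml
is another way to keep them out of the document.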
> 
> The SolrIndexer that comes with Nutch therefore seems impractical for
> my needs: I can't see how to convince it not to send the standard field
> values to the Solr server. Besides that, I don't actually need any
> information from the linkdb, but the requirement to provide a linkdb to
> SolrIndexer is hard-coded.
> 
> Do I have to write a new Indexer from scratch?
> Is a custom HtmlParseFilter the right choice for my needs?
> Anything else I am not aware of?
> 
> Any hints on how to proceed are appreciated.
> 
> 
> Thanks
> 
> Guido
> 
> By the way:
> Where does nutch/conf/schema.xml come into play? I assume that it is
> just a template to replace solr/conf/schema.xml.

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536600 / 06-50258350
