Solr Cell is what you want to use here. It's a Tika pipeline that you
can configure to modify the data as you need.
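
As a rough illustration of how extra field values can ride along with a Solr Cell request, here is a minimal Python sketch that builds a URL for the extracting handler. The `literal.<field>`, `fmap.<source>`, `uprefix`, and `commit` parameters are standard Solr Cell request parameters; the host, the handler path `/update/extract` (the usual default), the `location` field name, and the helper `build_extract_url` are assumptions for illustration only. Solr Cell will not detect locations by itself, so the sketch assumes the value has already been derived elsewhere.

```python
from urllib.parse import urlencode


def build_extract_url(solr_base, doc_id, literals):
    """Build a Solr Cell (extracting handler) URL carrying extra literal fields.

    `literals` maps field names to values that should be stored on the
    document alongside whatever Tika extracts from the posted file.
    """
    params = [
        ("literal.id", doc_id),    # unique key of the document being indexed
        ("fmap.content", "text"),  # map Tika's extracted body into the 'text' field
        ("uprefix", "attr_"),      # prefix for extracted fields unknown to the schema
        ("commit", "true"),
    ]
    # One literal.<field> parameter per extra field (field names are hypothetical).
    params += [("literal." + name, value) for name, value in literals.items()]
    return solr_base.rstrip("/") + "/update/extract?" + urlencode(params)


# Hypothetical host and field; the document body itself would be POSTed to this URL.
url = build_extract_url("http://localhost:8983/solr", "doc1", {"location": "New York"})
print(url)
```

The target fields (here `text`, `location`, and the `attr_*` dynamic field) would need to exist in the Solr schema for a request like this to succeed.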

Karl

On Tue, Jul 24, 2012 at 11:35 AM, Arcadius Ahouansou
<[email protected]> wrote:
>
> Hello.
>
> I am currently using ManifoldCF 0.6 to crawl and index into Solr 4.
>
> I need to extract data such as locations from the documents into a separate
> field before I index into Solr.
>
> - Is there a way this can be done with ManifoldCF?
> - If not, is there an output connector that can store the content in a
> database? Then I could do the transformation on the DB before indexing.
>
> Thank you very much.
>
> Arcadius.
>
>
