Solr Cell (Solr's ExtractingRequestHandler) is what you want to use here. It's a Tika pipeline that you can configure to modify the data as you need.
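As a rough illustration of what Solr Cell exposes, here is a minimal sketch that builds a request to the extracting handler by hand. It assumes a Solr 4 core at the hypothetical URL `http://localhost:8983/solr/collection1`. It uses the handler's standard `literal.*`, `fmap.*`, `uprefix` and `commit` parameters. The `location` field and its values are placeholders, and Tika does not produce them on its own. Pulling locations out of the text automatically would still need custom logic on the Solr side, such as an update processor.

```python
import urllib.parse
import urllib.request

# Hypothetical core URL; adjust to your own Solr 4 installation.
SOLR_CORE = "http://localhost:8983/solr/collection1"


def build_extract_url(core_url, doc_id, locations):
    """Build a Solr Cell (/update/extract) URL.

    - literal.<field> adds a fixed value to the indexed document; it is
      repeated here for the hypothetical multi-valued 'location' field.
    - fmap.content=text maps Tika's extracted body into the 'text' field.
    - uprefix=attr_ prefixes Tika metadata fields the schema doesn't know.
    """
    params = [
        ("literal.id", doc_id),
        ("fmap.content", "text"),
        ("uprefix", "attr_"),
        ("commit", "true"),
    ]
    params += [("literal.location", loc) for loc in locations]
    return core_url + "/update/extract?" + urllib.parse.urlencode(params)


def post_document(url, path, content_type="application/octet-stream"):
    """POST the raw file bytes; Tika parses them on the Solr side."""
    with open(path, "rb") as f:
        data = f.read()
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": content_type}
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

For example, `post_document(build_extract_url(SOLR_CORE, "doc1", ["Paris"]), "report.pdf", "application/pdf")` would index `report.pdf` with an extra `location` value. With ManifoldCF in the loop, the Solr output connector does the posting. The equivalent defaults would then be configured on the handler in `solrconfig.xml` rather than passed per request.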

Karl

On Tue, Jul 24, 2012 at 11:35 AM, Arcadius Ahouansou <[email protected]> wrote:
> Hello.
>
> I am currently using ManifoldCF 0.6 to crawl and index into Solr4.
>
> I need to extract data such as locations from the documents into a separate
> field before I index into Solr.
>
> - Is there a way this can be done with ManifoldCF?
> - If not, is there an output connector that can store the content in a
> database? Then I could do the transformation on the DB before indexing.
>
> Thank you very much.
>
> Arcadius.
