I am using Tika with Apache Solr. What I need to achieve is to process all
images with an external parser that I provide instead of the default
image/jpeg parser. In practice, this is about plugging in external OCR
software.

Solr allows me to provide a configuration file for Tika, but I could not
find any example of what such a file should look like.

I found the XML configuration for external parsers inside the Tika jars and
could adapt it to suit my needs, but how do I configure which MIME types it
should handle, and how do I force Tika to use this parser rather than the
default parser for images? Also, can I put everything (external parser
definition, MIME type associations, etc.) in a single external configuration
file, or do I have to put it inside the Tika jars?

I would like to try every configuration option before I have to write my
own parser.

Any help appreciated.