I am using Tika with Apache Solr. I need all images to be processed by an external parser that I provide, instead of the default image/jpeg parser. In practice, this is about plugging in some external OCR software.
Solr lets me supply a configuration file for Tika, but I could not find any example of what such a file should look like. I found the config XML for external parsers inside the Tika jars and could adapt it to my needs, but how do I configure which MIME types it should handle, and how do I force Tika to use this parser rather than the default parser for images? Also, can I configure everything (the external parser definition, MIME associations, etc.) in a single external configuration file, or do I have to put it inside the Tika jars? I would like to try every configuration option before I resort to writing my own parser. Any help is appreciated.
