On 20/08/15 07:19, Sergey Tsalkov wrote:
> Then I thought I could pass a custom config.xml to disable it, but I
> can't figure out how to write the config file.

See http://tika.apache.org/1.10/configuring.html#Configuring_Parsers for details of the parser configuration.

You should be fine with a config file like:

<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <!-- Default Parser except no OCR -->
    <parser class="org.apache.tika.parser.DefaultParser">
      <parser-exclude class="org.apache.tika.parser.ocr.TesseractOCRParser"/>
    </parser>
  </parsers>
</properties>
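
If you want to sanity-check a config like the one above before handing it to Tika, a rough sketch along these lines should do. It uses only the JDK's XML parser, not Tika itself, so it only confirms that the file is well-formed and shows which parser classes are excluded. The class name TikaConfigCheck and the method excludedParsers are made up for illustration, and the config is inlined as a string to keep the example self-contained.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Tika: checks that a config is
// well-formed XML and lists the classes named in <parser-exclude>.
public class TikaConfigCheck {

    // Parses the XML (throws if it is not well-formed) and returns the
    // "class" attribute of every <parser-exclude> element, in document order.
    static List<String> excludedParsers(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList nodes = doc.getElementsByTagName("parser-exclude");
        List<String> classes = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            classes.add(((Element) nodes.item(i)).getAttribute("class"));
        }
        return classes;
    }

    public static void main(String[] args) throws Exception {
        // Same config as in this mail, inlined for the example.
        String xml = String.join("\n",
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?>",
                "<properties>",
                "  <parsers>",
                "    <parser class=\"org.apache.tika.parser.DefaultParser\">",
                "      <parser-exclude class=\"org.apache.tika.parser.ocr.TesseractOCRParser\"/>",
                "    </parser>",
                "  </parsers>",
                "</properties>");
        System.out.println(excludedParsers(xml));
    }
}
```

An unbalanced tag (for example a closing parser element with no opening one) makes the parse call throw, which is usually enough to catch a copy-and-paste mistake in the config.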

Thanks
Nick
