On 20/08/15 07:19, Sergey Tsalkov wrote:
> Then I thought I could pass a custom config.xml to disable it, but I
> can't figure out how to write the config file.
See http://tika.apache.org/1.10/configuring.html#Configuring_Parsers for
details of parser configuration.
You should be fine with a config file like:
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <!-- Default Parser except no OCR -->
    <parser class="org.apache.tika.parser.DefaultParser">
      <parser-exclude class="org.apache.tika.parser.ocr.TesseractOCRParser"/>
    </parser>
  </parsers>
</properties>
Thanks
Nick