Thanks guys! Nick, your config file was exactly what I was looking
for, though it needed a minor tweak: the opening <parser> tag for
DefaultParser was missing. I'm posting the corrected config below for
anyone who refers to this thread in the future:

<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <parser class="org.apache.tika.parser.DefaultParser">
      <parser-exclude class="org.apache.tika.parser.ocr.TesseractOCRParser"/>
    </parser>
  </parsers>
</properties>
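
In case it helps anyone catch the same kind of mistake before handing
the file to Tika, here is a minimal sketch of a sanity check. It only
uses Python's standard XML parser, not Tika itself, so it confirms the
file is well-formed and lists the excluded parser classes, nothing
more. The function name excluded_parsers and the two string constants
are my own, purely for illustration:

```python
import xml.etree.ElementTree as ET

# The corrected config from this message.
CORRECTED = """<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <parser class="org.apache.tika.parser.DefaultParser">
      <parser-exclude class="org.apache.tika.parser.ocr.TesseractOCRParser"/>
    </parser>
  </parsers>
</properties>
"""

# The same config without the opening <parser> tag.
MISSING_OPEN_TAG = """<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
      <parser-exclude class="org.apache.tika.parser.ocr.TesseractOCRParser"/>
    </parser>
  </parsers>
</properties>
"""


def excluded_parsers(config_xml):
    """Return the class names listed in <parser-exclude> elements.

    Raises xml.etree.ElementTree.ParseError if the XML is not well-formed.
    This does not check that Tika accepts the config, only its structure.
    """
    root = ET.fromstring(config_xml.encode("utf-8"))
    excludes = root.findall("./parsers/parser/parser-exclude")
    return [element.get("class") for element in excludes]


if __name__ == "__main__":
    print(excluded_parsers(CORRECTED))
    try:
        excluded_parsers(MISSING_OPEN_TAG)
    except ET.ParseError as err:
        print("not well-formed:", err)
```

The second call fails because the stray </parser> has no matching
opening tag, which is the tweak I had to make.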

On Thu, Aug 20, 2015 at 1:26 AM, Nick Burch <[email protected]> wrote:
> On 20/08/15 07:19, Sergey Tsalkov wrote:
>>
>> Then I thought I could pass a custom config.xml to disable it, but I
>> can't figure out how to write the config file.
>
>
> See http://tika.apache.org/1.10/configuring.html#Configuring_Parsers for
> details of the parser configuration
>
> You should be fine with a config file like:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <properties>
>   <parsers>
>     <!-- Default Parser except no OCR -->
>       <parser-exclude
> class="org.apache.tika.parser.ocr.TesseractOCRParser"/>
>     </parser>
>   </parsers>
> </properties>
>
> Thanks
> Nick
