Thanks Sergey!

Please feel free to add a page on the wiki describing your use case:

http://wiki.apache.org/tika/

I would appreciate it! Once you have signed up, send your username to
me or to anyone on this list (dev@tika), and we’ll grant you
permissions so you can create the page.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++





-----Original Message-----
From: Sergey Tsalkov <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Thursday, August 20, 2015 at 10:22 PM
To: "[email protected]" <[email protected]>
Subject: Re: want to disable tesseract ocr parser

>Thanks guys! Nick, your config file was exactly what I was looking
>for, though it took a minor tweak because you forgot to open the
>parser tag. I'm posting the corrected config below for anyone who
>refers to this thread in the future:
>
><?xml version="1.0" encoding="UTF-8"?>
><properties>
>  <parsers>
>    <parser class="org.apache.tika.parser.DefaultParser">
>      <parser-exclude class="org.apache.tika.parser.ocr.TesseractOCRParser"/>
>    </parser>
>  </parsers>
></properties>
>
>On Thu, Aug 20, 2015 at 1:26 AM, Nick Burch <[email protected]> wrote:
>> On 20/08/15 07:19, Sergey Tsalkov wrote:
>>>
>>> Then I thought I could pass a custom config.xml to disable it, but I
>>> can't figure out how to write the config file.
>>
>>
>> See http://tika.apache.org/1.10/configuring.html#Configuring_Parsers for
>> details of the parser configuration
>>
>> You should be fine with a config file like:
>>
>> <?xml version="1.0" encoding="UTF-8"?>
>> <properties>
>>   <parsers>
>>     <!-- Default Parser except no OCR -->
>>       <parser-exclude class="org.apache.tika.parser.ocr.TesseractOCRParser"/>
>>     </parser>
>>   </parsers>
>> </properties>
>>
>> Thanks
>> Nick
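
For anyone adapting the config quoted in this thread: the original
problem was a missing opening <parser> tag, which makes the file
malformed XML. Below is a minimal sketch of a well-formedness check that
uses only the JDK's built-in XML parser. It does not call Tika at all;
the class name TikaConfigCheck and its helper methods are hypothetical
names made up for this illustration, and the two embedded strings are
the corrected and the original configs from the thread.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of Tika: checks a config string with the JDK XML parser.
class TikaConfigCheck {
    // Corrected config from this thread (opening <parser> tag present).
    static final String CORRECTED = String.join("\n",
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>",
        "<properties>",
        "  <parsers>",
        "    <parser class=\"org.apache.tika.parser.DefaultParser\">",
        "      <parser-exclude class=\"org.apache.tika.parser.ocr.TesseractOCRParser\"/>",
        "    </parser>",
        "  </parsers>",
        "</properties>");

    // Original config from this thread (opening <parser> tag missing).
    static final String BROKEN = String.join("\n",
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>",
        "<properties>",
        "  <parsers>",
        "    <!-- Default Parser except no OCR -->",
        "      <parser-exclude class=\"org.apache.tika.parser.ocr.TesseractOCRParser\"/>",
        "    </parser>",
        "  </parsers>",
        "</properties>");

    // Returns the class attribute of every <parser-exclude> element.
    // Throws IllegalArgumentException if the XML is not well-formed.
    static List<String> excludedParsers(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and keeps them off stderr.
            builder.setErrorHandler(new DefaultHandler());
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("parser-exclude");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(((Element) nodes.item(i)).getAttribute("class"));
            }
            return names;
        } catch (SAXException e) {
            throw new IllegalArgumentException(
                "config is not well-formed XML: " + e.getMessage(), e);
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // True if the string parses as XML, false if the parser rejects it.
    static boolean isWellFormed(String xml) {
        try {
            excludedParsers(xml);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("corrected well-formed: " + isWellFormed(CORRECTED));
        System.out.println("original well-formed:  " + isWellFormed(BROKEN));
        System.out.println("excluded parsers:      " + excludedParsers(CORRECTED));
    }
}
```

This only confirms that the file parses and lists what it excludes; it
says nothing about whether Tika accepts the class names, so it is a
pre-flight check rather than a substitute for loading the config in
Tika itself.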
