Happy to do that, Chris! I've created my account; my username is SergeyTsalkov.

On Thu, Aug 20, 2015 at 10:24 PM, Mattmann, Chris A (3980)
<[email protected]> wrote:
> Thanks Sergey!
>
> Please feel free to add a page on the wiki:
>
> http://wiki.apache.org/tika/
>
> Describing your use case. I would appreciate it!
> Once you've signed up, tell me your username, or tell anyone
> on this list (dev@tika), we’ll get you permissions and you can
> create the page.
>
> Cheers,
> Chris
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: [email protected]
> WWW:  http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
> -----Original Message-----
> From: Sergey Tsalkov <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Thursday, August 20, 2015 at 10:22 PM
> To: "[email protected]" <[email protected]>
> Subject: Re: want to disable tesseract ocr parser
>
>>Thanks guys! Nick, your config file was exactly what I was looking
>>for, though it took a minor tweak because you forgot to open the
>>parser tag. I'm posting the corrected config below for anyone who
>>refers to this thread in the future:
>>
>><?xml version="1.0" encoding="UTF-8"?>
>><properties>
>>  <parsers>
>>    <parser class="org.apache.tika.parser.DefaultParser">
>>      <parser-exclude class="org.apache.tika.parser.ocr.TesseractOCRParser"/>
>>    </parser>
>>  </parsers>
>></properties>
>>
>>On Thu, Aug 20, 2015 at 1:26 AM, Nick Burch <[email protected]> wrote:
>>> On 20/08/15 07:19, Sergey Tsalkov wrote:
>>>>
>>>> Then I thought I could pass a custom config.xml to disable it, but I
>>>> can't figure out how to write the config file.
>>>
>>>
>>> See http://tika.apache.org/1.10/configuring.html#Configuring_Parsers for
>>> details of the parser configuration
>>>
>>> You should be fine with a config file like:
>>>
>>> <?xml version="1.0" encoding="UTF-8"?>
>>> <properties>
>>>   <parsers>
>>>     <!-- Default Parser except no OCR -->
>>>       <parser-exclude
>>> class="org.apache.tika.parser.ocr.TesseractOCRParser"/>
>>>     </parser>
>>>   </parsers>
>>> </properties>
>>>
>>> Thanks
>>> Nick
>
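
The difference between the two configs quoted above is a single opening <parser> tag, and without it the file is not well-formed XML. As a rough sketch, a check like the one below can catch that kind of slip before the file is handed to Tika. It uses only the JDK's XML parser and does not touch Tika's own config loading; the class name TikaConfigCheck and its methods are hypothetical helpers made up for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration only; it is not part of Tika.
class TikaConfigCheck {
    static final String OCR_PARSER = "org.apache.tika.parser.ocr.TesseractOCRParser";

    // The corrected config from this thread.
    static final String CORRECTED =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<properties>\n"
        + "  <parsers>\n"
        + "    <parser class=\"org.apache.tika.parser.DefaultParser\">\n"
        + "      <parser-exclude class=\"" + OCR_PARSER + "\"/>\n"
        + "    </parser>\n"
        + "  </parsers>\n"
        + "</properties>\n";

    // The same file with the opening <parser> tag dropped.
    static final String MISSING_OPEN_TAG = CORRECTED.replace(
        "    <parser class=\"org.apache.tika.parser.DefaultParser\">\n", "");

    /** Parses the XML and returns the document, or null if it is not well-formed. */
    static Document parseOrNull(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            return null; // malformed XML, e.g. an unmatched </parser>
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** True if the document has a parser-exclude element for the given class. */
    static boolean excludes(Document doc, String parserClass) {
        NodeList nodes = doc.getElementsByTagName("parser-exclude");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            if (parserClass.equals(e.getAttribute("class"))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Document ok = parseOrNull(CORRECTED);
        System.out.println("corrected well-formed: " + (ok != null));
        System.out.println("excludes OCR parser: "
            + (ok != null && excludes(ok, OCR_PARSER)));
        System.out.println("missing-tag version well-formed: "
            + (parseOrNull(MISSING_OPEN_TAG) != null));
    }
}
```

This only confirms that the XML parses and that the exclude entry is present; whether Tika then honours the exclusion still has to be checked by actually running Tika with the config.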
