Perms granted! :-) https://wiki.apache.org/tika/ContributorsGroup

—
Chris Mattmann
[email protected]






-----Original Message-----
From: Sergey Tsalkov <[email protected]>
Reply-To: <[email protected]>
Date: Thursday, August 20, 2015 at 10:31 PM
To: <[email protected]>
Subject: Re: want to disable tesseract ocr parser

>Happy to do that, Chris! I've created my account, username is
>SergeyTsalkov.
>
>On Thu, Aug 20, 2015 at 10:24 PM, Mattmann, Chris A (3980)
><[email protected]> wrote:
>> Thanks Sergey!
>>
>> Please feel free to add a page on the wiki:
>>
>> http://wiki.apache.org/tika/
>>
>> Describing your use case. I would appreciate it!
>> If you remember to sign up, tell me your username, or tell anyone
>> on this list (dev@tika), we’ll get you permissions and you can
>> create the page.
>>
>> Cheers,
>> Chris
>>
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Chris Mattmann, Ph.D.
>> Chief Architect
>> Instrument Software and Science Data Systems Section (398)
>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> Office: 168-519, Mailstop: 168-527
>> Email: [email protected]
>> WWW:  http://sunset.usc.edu/~mattmann/
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Adjunct Associate Professor, Computer Science Department
>> University of Southern California, Los Angeles, CA 90089 USA
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>
>>
>>
>>
>>
>> -----Original Message-----
>> From: Sergey Tsalkov <[email protected]>
>> Reply-To: "[email protected]" <[email protected]>
>> Date: Thursday, August 20, 2015 at 10:22 PM
>> To: "[email protected]" <[email protected]>
>> Subject: Re: want to disable tesseract ocr parser
>>
>>>Thanks guys! Nick, your config file was exactly what I was looking
>>>for, though it took a minor tweak because you forgot to open the
>>>parser tag. I'm posting the corrected config below for anyone who
>>>refers to this thread in the future:
>>>
>>><?xml version="1.0" encoding="UTF-8"?>
>>><properties>
>>>  <parsers>
>>>    <parser class="org.apache.tika.parser.DefaultParser">
>>>      <parser-exclude class="org.apache.tika.parser.ocr.TesseractOCRParser"/>
>>>    </parser>
>>>  </parsers>
>>></properties>
>>>
>>>On Thu, Aug 20, 2015 at 1:26 AM, Nick Burch <[email protected]> wrote:
>>>> On 20/08/15 07:19, Sergey Tsalkov wrote:
>>>>>
>>>>> Then I thought I could pass a custom config.xml to disable it, but I
>>>>> can't figure out how to write the config file.
>>>>
>>>>
>>>> See http://tika.apache.org/1.10/configuring.html#Configuring_Parsers
>>>> for details of the parser configuration
>>>>
>>>> You should be fine with a config file like:
>>>>
>>>> <?xml version="1.0" encoding="UTF-8"?>
>>>> <properties>
>>>>   <parsers>
>>>>     <!-- Default Parser except no OCR -->
>>>>       <parser-exclude class="org.apache.tika.parser.ocr.TesseractOCRParser"/>
>>>>     </parser>
>>>>   </parsers>
>>>> </properties>
>>>>
>>>> Thanks
>>>> Nick
>>
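
For anyone adapting the config from this thread: the two versions quoted above differ only by the opening <parser> tag, which is the kind of slip a quick well-formedness check would probably catch before the file ever reaches Tika. The sketch below uses only the Python standard library and does not call Tika at all. The helper names (is_well_formed, excluded_parsers) are made up for illustration, and the check covers XML well-formedness only, not whether Tika accepts the config.

```python
import xml.etree.ElementTree as ET

# Corrected config as posted in the thread (the parser-exclude element is
# written on one line here).
CORRECTED = """<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <parser class="org.apache.tika.parser.DefaultParser">
      <parser-exclude class="org.apache.tika.parser.ocr.TesseractOCRParser"/>
    </parser>
  </parsers>
</properties>
"""

# Originally suggested config, with the opening <parser> tag missing.
ORIGINAL = """<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <!-- Default Parser except no OCR -->
      <parser-exclude class="org.apache.tika.parser.ocr.TesseractOCRParser"/>
    </parser>
  </parsers>
</properties>
"""


def is_well_formed(config_xml):
    """Hypothetical helper: True if the text parses as well-formed XML."""
    try:
        ET.fromstring(config_xml.encode("utf-8"))
    except ET.ParseError:
        return False
    return True


def excluded_parsers(config_xml):
    """Hypothetical helper: class names listed in <parser-exclude> elements."""
    root = ET.fromstring(config_xml.encode("utf-8"))
    return [elem.get("class") for elem in root.iter("parser-exclude")]


if __name__ == "__main__":
    print("corrected well-formed:", is_well_formed(CORRECTED))
    print("original well-formed:", is_well_formed(ORIGINAL))
    print("excluded:", excluded_parsers(CORRECTED))
```

The same two helpers can be pointed at a config file on disk by reading its text first. A config that passes this check can still be rejected by Tika for other reasons, such as a misspelled class name.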

