Happy to do that, Chris! I've created my account; my username is SergeyTsalkov.

On Thu, Aug 20, 2015 at 10:24 PM, Mattmann, Chris A (3980) <[email protected]> wrote:
> Thanks Sergey!
>
> Please feel free to add a page on the wiki:
>
> http://wiki.apache.org/tika/
>
> Describing your use case. I would appreciate it!
> If you remember to sign up, tell me your username, or tell anyone
> on this list (dev@tika), we’ll get you permissions and you can
> create the page.
>
> Cheers,
> Chris
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: [email protected]
> WWW: http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
> -----Original Message-----
> From: Sergey Tsalkov <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Thursday, August 20, 2015 at 10:22 PM
> To: "[email protected]" <[email protected]>
> Subject: Re: want to disable tesseract ocr parser
>
>>Thanks guys! Nick, your config file was exactly what I was looking
>>for, though it took a minor tweak because you forgot to open the
>>parser tag. I'm posting the corrected config below for anyone who
>>refers to this thread in the future:
>>
>><?xml version="1.0" encoding="UTF-8"?>
>><properties>
>>  <parsers>
>>    <parser class="org.apache.tika.parser.DefaultParser">
>>      <parser-exclude class="org.apache.tika.parser.ocr.TesseractOCRParser"/>
>>    </parser>
>>  </parsers>
>></properties>
>>
>>On Thu, Aug 20, 2015 at 1:26 AM, Nick Burch <[email protected]> wrote:
>>> On 20/08/15 07:19, Sergey Tsalkov wrote:
>>>>
>>>> Then I thought I could pass a custom config.xml to disable it, but I
>>>> can't figure out how to write the config file.
>>>
>>>
>>> See http://tika.apache.org/1.10/configuring.html#Configuring_Parsers for
>>> details of the parser configuration
>>>
>>> You should be fine with a config file like:
>>>
>>> <?xml version="1.0" encoding="UTF-8"?>
>>> <properties>
>>>   <parsers>
>>>     <!-- Default Parser except no OCR -->
>>>     <parser-exclude class="org.apache.tika.parser.ocr.TesseractOCRParser"/>
>>>     </parser>
>>>   </parsers>
>>> </properties>
>>>
>>> Thanks
>>> Nick
>
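
For anyone who finds this thread later: the only difference between the two configs quoted above is that the first one closes a parser element it never opened, so it is not well-formed XML. Here is a rough sketch of how you might check a config file before handing it to Tika. It uses only the Python standard library and is not part of Tika; the helper names `is_well_formed` and `excluded_parsers` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The corrected config posted in this thread.
CORRECTED = """<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <parser class="org.apache.tika.parser.DefaultParser">
      <parser-exclude class="org.apache.tika.parser.ocr.TesseractOCRParser"/>
    </parser>
  </parsers>
</properties>
"""

# The earlier suggestion, which has </parser> without a matching <parser ...>.
ORIGINAL = """<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <!-- Default Parser except no OCR -->
    <parser-exclude class="org.apache.tika.parser.ocr.TesseractOCRParser"/>
    </parser>
  </parsers>
</properties>
"""


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        # Encode to bytes so the encoding declaration is honoured by the parser.
        ET.fromstring(xml_text.encode("utf-8"))
    except ET.ParseError:
        return False
    return True


def excluded_parsers(xml_text):
    """List the class attribute of every <parser-exclude> element."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    return [el.get("class") for el in root.iter("parser-exclude")]


if __name__ == "__main__":
    print("original well-formed: ", is_well_formed(ORIGINAL))
    print("corrected well-formed:", is_well_formed(CORRECTED))
    print("excluded parsers:     ", excluded_parsers(CORRECTED))
```

Note that this only checks that the XML parses and shows which classes are listed for exclusion. It does not confirm that Tika accepts the config or that the named classes exist in your Tika version.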
