Thank you, this should be workable. I originally wanted to avoid specifying a new config because I thought that supplying my own Tika XML config meant redefining everything that would be in the default file. However, after some testing it appears that, as in your example, the default Tika config is simply modified by the single exclusion provided in the custom config. Is that correct?
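For reference, here is a minimal sketch of the two-config approach as I understand it. The class name OcrToggle and the method names are hypothetical, the XML is the one from your example held in a string rather than a file, and it assumes tika-core and tika-parsers are on the classpath. I have not confirmed that this is the recommended way to wire it up.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import org.apache.tika.config.TikaConfig;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.Parser;

// Hypothetical helper: builds both parsers once and picks one per extraction.
public class OcrToggle {

    // Same config as in the quoted example: DefaultParser minus Tesseract.
    private static final String NO_OCR_XML =
        "<properties>"
      + "<parsers>"
      + "<parser class=\"org.apache.tika.parser.DefaultParser\">"
      + "<parser-exclude class=\"org.apache.tika.parser.ocr.TesseractOCRParser\"/>"
      + "</parser>"
      + "</parsers>"
      + "</properties>";

    private final Parser withOcr;
    private final Parser withoutOcr;

    public OcrToggle() throws Exception {
        // With-Tesseract parser: just the default config.
        this.withOcr = new AutoDetectParser(TikaConfig.getDefaultConfig());
        // No-Tesseract parser: built from the custom config above.
        this.withoutOcr = new AutoDetectParser(loadNoOcrConfig());
    }

    // Parses the exclusion config from the in-memory XML string.
    private static TikaConfig loadNoOcrConfig() throws Exception {
        byte[] xml = NO_OCR_XML.getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new ByteArrayInputStream(xml)) {
            return new TikaConfig(in);
        }
    }

    // OCR stays off unless the caller asks for it.
    public Parser parserFor(boolean enableOcr) {
        return enableOcr ? withOcr : withoutOcr;
    }
}
```

Each extraction would then call parserFor(false) by default and parserFor(true) only when OCR is wanted, so both configs are parsed only once.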
Brian

On Mon, Sep 21, 2015 at 1:22 PM, Nick Burch <[email protected]> wrote:

> On Mon, 21 Sep 2015, Brian Young wrote:
>
>> Hello, we are long time Tika users that have recently started using
>> Tesseract. We would like to be able to enable/disable Tesseract per
>> extraction with Tesseract disabled until we choose to enable it.
>>
>
> The easiest way would be to have two different TikaConfig objects, and
> pick between them (+their parsers) at runtime
>
> Have your with-Tesseract one just be the default config if you want
>
> Have your no-Tesseract one be created with a config file along the lines of
>
> <properties>
>   <parsers>
>     <parser class="org.apache.tika.parser.DefaultParser">
>       <parser-exclude class="org.apache.tika.parser.ocr.TesseractOCRParser"/>
>     </parser>
>   </parsers>
> </properties>
>
> Thanks
> Nick
>
