On Mon, 21 Sep 2015, Brian Young wrote:
Hello, we are long-time Tika users who have recently started using Tesseract. We would like to be able to enable/disable Tesseract per extraction, with Tesseract disabled until we choose to enable it.

The easiest way would be to have two different TikaConfig objects, and pick between them (and their parsers) at runtime.

Your with-Tesseract one can just be the default config, if you want.

Your no-Tesseract one would be created from a config file along the lines of:

<properties>
  <parsers>
    <parser class="org.apache.tika.parser.DefaultParser">
      <parser-exclude class="org.apache.tika.parser.ocr.TesseractOCRParser"/>
    </parser>
  </parsers>
</properties>
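
To show how picking between the two configs at runtime could look, here is a rough sketch rather than a definitive implementation. The class name ParserChooser and its methods parserFor and extract are made up for illustration, and the config is inlined as a string only to keep the example self-contained; loading it from a file would work just as well. It assumes tika-core and tika-parsers are on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import org.apache.tika.config.TikaConfig;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;

// Hypothetical helper: builds both parsers once, then picks one per extraction.
public class ParserChooser {
    // Same no-Tesseract config as above, inlined for the sake of the example.
    private static final String NO_OCR_CONFIG =
        "<properties>"
        + "<parsers>"
        + "<parser class=\"org.apache.tika.parser.DefaultParser\">"
        + "<parser-exclude class=\"org.apache.tika.parser.ocr.TesseractOCRParser\"/>"
        + "</parser>"
        + "</parsers>"
        + "</properties>";

    private final Parser withOcr;
    private final Parser withoutOcr;

    public ParserChooser() throws Exception {
        // Default config: includes the Tesseract parser when it is available.
        withOcr = new AutoDetectParser(TikaConfig.getDefaultConfig());
        // Custom config: DefaultParser with the Tesseract parser excluded.
        InputStream in = new ByteArrayInputStream(
            NO_OCR_CONFIG.getBytes(StandardCharsets.UTF_8));
        withoutOcr = new AutoDetectParser(new TikaConfig(in));
    }

    // Returns the parser matching the caller's choice for this extraction.
    public Parser parserFor(boolean useOcr) {
        return useOcr ? withOcr : withoutOcr;
    }

    // Extracts plain text from the stream with or without OCR.
    public String extract(InputStream document, boolean useOcr) throws Exception {
        BodyContentHandler handler = new BodyContentHandler(-1); // -1 = no write limit
        parserFor(useOcr).parse(document, handler, new Metadata(), new ParseContext());
        return handler.toString();
    }
}
```

Calling extract(stream, false) would then be the everyday path, with extract(stream, true) used only for the extractions where you want Tesseract. Both parsers are built once up front, so switching costs nothing per call.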

Thanks
Nick
