Maybe open-ocr? https://github.com/tleyden/open-ocr

On Wed, Apr 24, 2019 at 9:58 AM Tim Allison <[email protected]> wrote:

> The goal of Tika is text and metadata extraction. Our basic output is
> .txt, xhtml or json. We don’t currently support generation of other
> formats. Could you use DropWizard or similar to wrap tesseract if you need
> it to be RESTful?
>
> On Wed, Apr 24, 2019 at 8:21 AM Ralph Soika <[email protected]> wrote:
>
>> Hi,
>>
>> I have a question about the Tesseract OCR Parser which is part of Tika:
>> Is it possible to set the output of tesseract to PDF format? I think
>> tesseract supports this option to convert an image file (e.g. tif) into a
>> searchable pdf file:
>>
>> $ tesseract --tessdata-dir ./ ./testing/eurotext.png
>> ./testing/eurotext-eng -l eng pdf
>>
>> I use the Tika REST API and I wonder how I can tell the Tika Server
>> to create a PDF output file?
>>
>> Thanks for any help
>>
>> Ralph
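
If wrapping tesseract yourself turns out to be the way to go, the command quoted above could be driven from a small helper along these lines. This is only a rough sketch in Python, not something Tika or open-ocr provides: the function names build_pdf_command and image_to_pdf are made up for illustration, and it assumes the tesseract binary is installed and on the PATH. A REST layer would then sit on top of image_to_pdf.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def build_pdf_command(image: str, output_base: str, lang: str = "eng",
                      tessdata_dir: Optional[str] = None) -> List[str]:
    """Build the argument list for a tesseract run that emits a searchable PDF.

    Hypothetical helper: it only mirrors the command line quoted in the thread.
    """
    cmd = ["tesseract"]
    if tessdata_dir is not None:
        cmd += ["--tessdata-dir", tessdata_dir]
    # The trailing "pdf" selects tesseract's PDF output config.
    cmd += [image, output_base, "-l", lang, "pdf"]
    return cmd


def image_to_pdf(image: str, output_base: str, lang: str = "eng",
                 tessdata_dir: Optional[str] = None) -> Path:
    """Run tesseract and return the path of the PDF it writes.

    Assumes tesseract is installed; it appends ".pdf" to output_base itself.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    subprocess.run(
        build_pdf_command(image, output_base, lang, tessdata_dir),
        check=True,  # raise CalledProcessError if tesseract fails
    )
    return Path(output_base + ".pdf")


if __name__ == "__main__":
    # Only print the command here, so the example runs without tesseract.
    print(" ".join(build_pdf_command("./testing/eurotext.png",
                                     "./testing/eurotext-eng",
                                     tessdata_dir="./")))
```

Keeping the command construction separate from the subprocess call makes it easy to check the arguments without having tesseract installed.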
