The other vaguely related project that comes to mind is pandoc (https://www.pandoc.org/index.html), but I don't know whether it has hooks to tesseract or a REST API... Sorry!
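For what it's worth, here is a rough sketch of the "wrap tesseract yourself" idea from the thread quoted below. It only builds and runs the same command line Ralph posted. The function names (build_tesseract_pdf_command, image_to_searchable_pdf) are made up for illustration, it assumes the tesseract binary is on the PATH, and it is not something Tika provides. A REST layer (DropWizard or anything else) would simply call the second function.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def build_tesseract_pdf_command(
    image_path: str,
    output_base: str,
    lang: str = "eng",
    tessdata_dir: Optional[str] = None,
) -> List[str]:
    """Build the argument list for a tesseract run that writes a searchable PDF.

    Hypothetical helper: it mirrors the command line from the original question.
    """
    cmd = ["tesseract"]
    if tessdata_dir is not None:
        cmd += ["--tessdata-dir", tessdata_dir]
    # The trailing "pdf" config makes tesseract write <output_base>.pdf.
    cmd += [image_path, output_base, "-l", lang, "pdf"]
    return cmd


def image_to_searchable_pdf(
    image_path: str,
    output_base: str,
    lang: str = "eng",
    tessdata_dir: Optional[str] = None,
) -> Path:
    """Run tesseract and return the path of the PDF it is expected to write.

    Assumes the tesseract binary is installed and on the PATH.
    """
    cmd = build_tesseract_pdf_command(image_path, output_base, lang, tessdata_dir)
    # check=True raises CalledProcessError if tesseract exits with an error.
    subprocess.run(cmd, check=True)
    return Path(output_base + ".pdf")
```

Calling image_to_searchable_pdf("./testing/eurotext.png", "./testing/eurotext-eng", tessdata_dir="./") should run the same command as in the original question.
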
On Wed, Apr 24, 2019 at 10:08 AM Tim Allison <[email protected]> wrote:
>
> Maybe?
>
> https://github.com/tleyden/open-ocr
>
> On Wed, Apr 24, 2019 at 9:58 AM Tim Allison <[email protected]> wrote:
>>
>> The goal of Tika is text and metadata extraction. Our basic output is .txt,
>> xhtml or json. We don’t currently support generation of other formats. Could
>> you use DropWizard or similar to wrap tesseract if you need it to be RESTful?
>>
>> On Wed, Apr 24, 2019 at 8:21 AM Ralph Soika <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> I have a question about the Tesseract OCR Parser which is part of Tika:
>>> Is it possible to set the output of tesseract to PDF format? I think
>>> tesseract supports this option to convert an image file (e.g. tif) into a
>>> searchable pdf file:
>>>
>>> $ tesseract --tessdata-dir ./ ./testing/eurotext.png ./testing/eurotext-eng -l eng pdf
>>>
>>> I use the Tika REST API and I wonder how I can tell the Tika Server to
>>> create a PDF output file?
>>>
>>> Thanks for any help
>>>
>>> Ralph
