The goal of Tika is text and metadata extraction. Our basic output formats are plain text (.txt), XHTML, and JSON; we don't currently support generating other formats such as PDF. If you need a RESTful service, could you use Dropwizard or something similar to wrap tesseract directly?
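As a rough illustration of the "wrap tesseract yourself" idea, here is a minimal sketch. It swaps Dropwizard for Python's standard-library `http.server`, and it assumes the `tesseract` binary is on the PATH with the requested language data installed. The function names, the PUT-to-any-path endpoint, and port 8080 are hypothetical choices for this example. The trailing `pdf` config argument is the same one used in the command quoted below, which tells tesseract to write `<outputbase>.pdf`. This is not production-ready: it has no request size limits, no authentication, and it handles one request at a time.

```python
import subprocess
import tempfile
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="eng", tessdata_dir=None):
    """Build the tesseract CLI call; the trailing 'pdf' config makes it write <output_base>.pdf."""
    cmd = ["tesseract"]
    if tessdata_dir is not None:
        cmd += ["--tessdata-dir", str(tessdata_dir)]
    cmd += [str(image_path), str(output_base), "-l", lang, "pdf"]
    return cmd


def ocr_to_pdf(image_bytes, lang="eng"):
    """Hypothetical helper: run tesseract on raw image bytes and return the searchable PDF bytes."""
    with tempfile.TemporaryDirectory() as workdir:
        image_path = Path(workdir) / "input"
        output_base = Path(workdir) / "output"
        image_path.write_bytes(image_bytes)
        # check=True raises CalledProcessError if tesseract exits non-zero.
        subprocess.run(build_tesseract_command(image_path, output_base, lang),
                       check=True, capture_output=True)
        return output_base.with_suffix(".pdf").read_bytes()


class OcrPdfHandler(BaseHTTPRequestHandler):
    """Accepts an image via HTTP PUT (any path) and responds with the OCRed PDF."""

    def do_PUT(self):
        length = int(self.headers.get("Content-Length", 0))
        image_bytes = self.rfile.read(length)
        try:
            pdf = ocr_to_pdf(image_bytes)
        except (subprocess.CalledProcessError, OSError) as exc:
            # OSError also covers a missing tesseract binary (FileNotFoundError).
            self.send_error(500, "tesseract failed", str(exc))
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/pdf")
        self.send_header("Content-Length", str(len(pdf)))
        self.end_headers()
        self.wfile.write(pdf)


def serve(port=8080):
    """Start the hypothetical OCR-to-PDF service; blocks until interrupted."""
    HTTPServer(("", port), OcrPdfHandler).serve_forever()
```

After calling `serve(8080)`, something like `curl -T eurotext.png http://localhost:8080/ -o eurotext.pdf` should return a searchable PDF. This mirrors the way files are PUT to Tika Server, but it is a separate service, not a Tika feature.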
On Wed, Apr 24, 2019 at 8:21 AM Ralph Soika <[email protected]> wrote:
> Hi,
>
> I have a question about the Tesseract OCR Parser which is part of Tika:
> Is it possible to set the output of tesseract to PDF format? I think
> tesseract supports this option to convert an image file (e.g. tif) into a
> searchable pdf file:
>
> $ tesseract --tessdata-dir ./ ./testing/eurotext.png ./testing/eurotext-eng -l eng pdf
>
> I use the Tika REST API and I wonder how I can tell the Tika Server
> to create a PDF output file?
>
> Thanks for any help
>
> Ralph
