The goal of Tika is text and metadata extraction.  Our basic output is
.txt, XHTML or JSON. We don’t currently support generation of other
formats, so Tika Server can't return a PDF. Could you use DropWizard or
similar to wrap tesseract if you need it to be RESTful?
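
As a rough sketch of what such a wrapper might look like, here is a
minimal version that uses the JDK's built-in HttpServer rather than
DropWizard, just to keep it self-contained. The class name, the /ocr/pdf
path, port 8080 and the hard-coded "eng" language are all made up for
illustration, and it assumes the tesseract binary is on the PATH. It
shells out with the same "outputbase ... pdf" arguments as the command
in your mail; treat it as a starting point, not a tested service.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

public class TesseractPdfService {

    // Builds: tesseract <input> <outputBase> -l <lang> pdf
    // (same argument order as the command quoted below, minus --tessdata-dir).
    static List<String> buildCommand(Path input, Path outputBase, String lang) {
        return List.of("tesseract", input.toString(), outputBase.toString(),
                "-l", lang, "pdf");
    }

    // Runs tesseract on the image and returns the path of the generated PDF.
    // tesseract appends ".pdf" to the output base name itself.
    static Path toSearchablePdf(Path input, String lang)
            throws IOException, InterruptedException {
        Path outputBase = Files.createTempFile("ocr-", "");
        Process process = new ProcessBuilder(buildCommand(input, outputBase, lang))
                .redirectErrorStream(true)
                .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                .start();
        int exitCode = process.waitFor();
        Files.deleteIfExists(outputBase); // only the base name was needed
        if (exitCode != 0) {
            throw new IOException("tesseract exited with code " + exitCode);
        }
        return Path.of(outputBase + ".pdf");
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical endpoint: send the raw image bytes as the request body.
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/ocr/pdf", exchange -> {
            Path input = Files.createTempFile("upload-", ".img");
            try (InputStream body = exchange.getRequestBody()) {
                Files.copy(body, input, StandardCopyOption.REPLACE_EXISTING);
                Path pdfPath = toSearchablePdf(input, "eng");
                byte[] pdf = Files.readAllBytes(pdfPath);
                Files.deleteIfExists(pdfPath);
                exchange.getResponseHeaders().add("Content-Type", "application/pdf");
                exchange.sendResponseHeaders(200, pdf.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(pdf);
                }
            } catch (Exception e) {
                // OCR failed or was interrupted: reply with an empty 500.
                exchange.sendResponseHeaders(500, -1);
            } finally {
                exchange.close();
                Files.deleteIfExists(input);
            }
        });
        server.start();
    }
}
```

With that running, something like
"curl --data-binary @eurotext.png http://localhost:8080/ocr/pdf -o out.pdf"
should hand back the searchable PDF, assuming tesseract and the eng
language data are installed on the server.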

On Wed, Apr 24, 2019 at 8:21 AM Ralph Soika <[email protected]> wrote:

> Hi,
>
> I have a question about the Tesseract OCR Parser which is part of Tika:
> Is it possible to set the output of tesseract to PDF format? I think
> tesseract supports this option to convert an image file (e.g. tif) into a
> searchable pdf file:
>
> $ tesseract --tessdata-dir ./ ./testing/eurotext.png \
>     ./testing/eurotext-eng -l eng pdf
>
> I use the Tika REST API and I wonder how I can tell the Tika Server
> to create a PDF output file?
>
>
> Thanks for any help
>
>
> Ralph
>
>
