Re: Tika-Server - Tesseract - Output to PDF

AJ Weber Wed, 24 Apr 2019 10:58:17 -0700

Noticed this a while back but have not had time to test it out: https://github.com/jbarlow83/OCRmyPDF


On 4/24/2019 12:15 PM, Tim Allison wrote:

The other, vaguely related project that comes to mind is
https://www.pandoc.org/index.html, but I don't know if it has hooks
into tesseract or a REST API...  Sorry!

On Wed, Apr 24, 2019 at 10:08 AM Tim Allison <[email protected]> wrote:

Maybe?

https://github.com/tleyden/open-ocr



On Wed, Apr 24, 2019 at 9:58 AM Tim Allison <[email protected]> wrote:

The goal of Tika is text and metadata extraction. Our basic output is .txt,
XHTML or JSON. We don’t currently support generating other formats. Could
you use DropWizard or similar to wrap tesseract if you need it to be RESTful?
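Wrapping tesseract in a small REST service can be sketched without any framework. The following is a minimal Python sketch, assuming the `tesseract` binary is on the PATH. Python's standard-library `http.server` stands in for DropWizard (a Java framework), and the PUT-only endpoint, port 9999, and the names `ocr_to_pdf_bytes`, `OcrPdfHandler` and `serve` are hypothetical. It writes the uploaded image to a temporary directory, runs `tesseract IMAGE OUTBASE -l eng pdf`, and returns `OUTBASE.pdf` as `application/pdf`.

```python
import subprocess
import tempfile
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path


def ocr_to_pdf_bytes(image_bytes, lang="eng"):
    """Save the upload to a temp dir, run `tesseract IMAGE OUTBASE -l LANG pdf`,
    and return the bytes of the OUTBASE.pdf file tesseract writes."""
    with tempfile.TemporaryDirectory() as tmp:
        image = Path(tmp) / "input"
        image.write_bytes(image_bytes)
        out_base = Path(tmp) / "output"
        # The trailing "pdf" config asks tesseract for a searchable PDF;
        # tesseract appends ".pdf" to the output base itself.
        subprocess.run(
            ["tesseract", str(image), str(out_base), "-l", lang, "pdf"],
            check=True,
            capture_output=True,
        )
        return out_base.with_suffix(".pdf").read_bytes()


class OcrPdfHandler(BaseHTTPRequestHandler):
    """Hypothetical endpoint: PUT an image body, receive a searchable PDF."""

    def do_PUT(self):
        length = int(self.headers.get("Content-Length", 0))
        image_bytes = self.rfile.read(length)
        try:
            pdf = ocr_to_pdf_bytes(image_bytes)
        except (subprocess.CalledProcessError, FileNotFoundError) as exc:
            # Non-zero exit from tesseract, or tesseract not installed.
            self.send_error(500, f"tesseract failed: {exc}")
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/pdf")
        self.send_header("Content-Length", str(len(pdf)))
        self.end_headers()
        self.wfile.write(pdf)


def serve(port=9999):
    """Block forever, serving OcrPdfHandler on localhost (hypothetical port)."""
    HTTPServer(("localhost", port), OcrPdfHandler).serve_forever()
```

After starting it with `serve()`, a client could PUT an image, e.g. `curl -T eurotext.png http://localhost:9999/ -o eurotext.pdf`. `http.server` is not intended for production use, so a real deployment would still want a proper framework along the lines suggested above.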

On Wed, Apr 24, 2019 at 8:21 AM Ralph Soika <[email protected]> wrote:

Hi,

I have a question about the Tesseract OCR Parser which is part of Tika:
Is it possible to set the output format of tesseract to PDF? I think
tesseract supports this option to convert an image file (e.g. TIFF) into a
searchable PDF file:

$ tesseract --tessdata-dir ./ ./testing/eurotext.png ./testing/eurotext-eng -l eng pdf
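The same invocation can be scripted. A minimal Python sketch, assuming the `tesseract` binary is on the PATH; the helper names `tesseract_pdf_cmd` and `image_to_searchable_pdf` are hypothetical. The trailing `pdf` argument selects tesseract's PDF output, and tesseract appends `.pdf` to the output base it is given.

```python
import subprocess
from pathlib import Path


def tesseract_pdf_cmd(image, out_base, lang="eng", tessdata_dir=None):
    """Build argv for: tesseract [--tessdata-dir DIR] IMAGE OUTBASE -l LANG pdf"""
    cmd = ["tesseract"]
    if tessdata_dir is not None:
        cmd += ["--tessdata-dir", str(tessdata_dir)]
    cmd += [str(image), str(out_base), "-l", lang, "pdf"]
    return cmd


def image_to_searchable_pdf(image, out_base, lang="eng", tessdata_dir=None):
    """Run tesseract (raising on a non-zero exit) and return the PDF path."""
    subprocess.run(tesseract_pdf_cmd(image, out_base, lang, tessdata_dir), check=True)
    return Path(f"{out_base}.pdf")
```

For the command above, `image_to_searchable_pdf("./testing/eurotext.png", "./testing/eurotext-eng", tessdata_dir="./")` would write `./testing/eurotext-eng.pdf`.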

I use the Tika REST API and wonder how I can tell the Tika Server to
create a PDF output file.
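Tika Server returns extracted text (or XHTML/JSON) rather than a PDF. For comparison, a request for the plain-text OCR result might look like the sketch below. It assumes a Tika Server on its default port 9998, with Tesseract installed so that OCR is applied to images. The helper names and the default `image/png` content type are illustrative assumptions.

```python
import urllib.request


def tika_text_request(image_bytes, server="http://localhost:9998",
                      content_type="image/png"):
    """Build (but do not send) a PUT /tika request asking for plain text."""
    return urllib.request.Request(
        f"{server}/tika",
        data=image_bytes,
        method="PUT",
        # Set Content-Type explicitly; otherwise urllib would send
        # application/x-www-form-urlencoded for a request with a body.
        headers={"Accept": "text/plain", "Content-Type": content_type},
    )


def extract_text(image_bytes, server="http://localhost:9998"):
    """Send the request and decode the extracted text from the response."""
    with urllib.request.urlopen(tika_text_request(image_bytes, server)) as resp:
        return resp.read().decode("utf-8")
```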


Thanks for any help


Ralph
