Re: Tika-Server - Tesseract - Output to PDF

Tim Allison Wed, 24 Apr 2019 07:09:39 -0700

Maybe this?

https://github.com/tleyden/open-ocr




On Wed, Apr 24, 2019 at 9:58 AM Tim Allison <[email protected]> wrote:

> The goal of Tika is text and metadata extraction.  Our basic output is
> .txt, xhtml or json. We don’t currently support generation of other
> formats. Could you use DropWizard or similar to wrap tesseract if you need
> it to be RESTful?
>
> On Wed, Apr 24, 2019 at 8:21 AM Ralph Soika <[email protected]> wrote:
>
>> Hi,
>>
>> I have a question about the Tesseract OCR Parser which is part of Tika:
>> Is it possible to set the output of tesseract to PDF format? I think
>> tesseract supports this option to convert an image file (e.g. tif) into a
>> searchable pdf file:
>>
>> $ tesseract --tessdata-dir ./ ./testing/eurotext.png ./testing/eurotext-eng -l eng pdf
>>
>> I use the Tika REST API and I wonder how I can tell the Tika Server
>> to create a PDF output file?
>>
>>
>> Thanks for any help
>>
>>
>> Ralph
>>
>>
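
For anyone following the "wrap tesseract yourself" suggestion in this thread, here is a minimal sketch in Python of the piece such a wrapper would need: building and running the same tesseract command quoted above. The function names (build_tesseract_pdf_command, image_to_searchable_pdf) are made up for illustration and are not part of Tika or Tesseract. It assumes the tesseract binary is on the PATH and that the requested language data is installed; any REST layer on top is left out.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def build_tesseract_pdf_command(
    image_path: str,
    output_base: str,
    lang: str = "eng",
    tessdata_dir: Optional[str] = None,
) -> List[str]:
    """Build the argument list for a tesseract call that writes a searchable PDF.

    Mirrors the command from the thread:
      tesseract --tessdata-dir ./ IMAGE OUTPUT_BASE -l eng pdf
    """
    cmd = ["tesseract"]
    if tessdata_dir is not None:
        cmd += ["--tessdata-dir", str(tessdata_dir)]
    # The trailing "pdf" selects tesseract's PDF output config.
    cmd += [str(image_path), str(output_base), "-l", lang, "pdf"]
    return cmd


def image_to_searchable_pdf(
    image_path: str,
    output_base: str,
    lang: str = "eng",
    tessdata_dir: Optional[str] = None,
) -> Path:
    """Run tesseract and return the path of the PDF it should have written.

    Assumes the tesseract executable is installed and on the PATH.
    Raises subprocess.CalledProcessError if tesseract exits with an error.
    """
    cmd = build_tesseract_pdf_command(image_path, output_base, lang, tessdata_dir)
    subprocess.run(cmd, check=True)
    # tesseract appends ".pdf" to the output base name.
    return Path(str(output_base) + ".pdf")
```

A small HTTP handler (DropWizard, or any other framework) could save the uploaded image to a temporary file, call something like image_to_searchable_pdf on it, and stream the resulting PDF back to the client.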
