The other vaguely related project that comes to mind is
https://www.pandoc.org/index.html, but I don't know whether it has hooks
to tesseract or a REST API... Sorry!

On Wed, Apr 24, 2019 at 10:08 AM Tim Allison <[email protected]> wrote:
>
> Maybe?
>
> https://github.com/tleyden/open-ocr
>
>
>
> On Wed, Apr 24, 2019 at 9:58 AM Tim Allison <[email protected]> wrote:
>>
>> The goal of Tika is text and metadata extraction. Our basic output is .txt,
>> XHTML or JSON. We don’t currently support generating other formats. Could
>> you use DropWizard or similar to wrap tesseract if you need it to be RESTful?
>>
>> On Wed, Apr 24, 2019 at 8:21 AM Ralph Soika <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> I have a question about the Tesseract OCR Parser which is part of Tika:
>>> Is it possible to set the output format of tesseract to PDF? I think
>>> tesseract supports an option to convert an image file (e.g. TIFF) into a
>>> searchable PDF file:
>>>
>>> $ tesseract --tessdata-dir ./ ./testing/eurotext.png ./testing/eurotext-eng -l eng pdf
>>>
>>> I use the Tika REST API, and I wonder how I can tell the Tika Server to
>>> create a PDF output file.
>>>
>>>
>>> Thanks for any help
>>>
>>>
>>> Ralph
>>>
>>>
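
Following up on the suggestion to wrap tesseract behind a REST endpoint, here is a minimal sketch, assuming Java 11+ and a `tesseract` binary on the PATH. To stay dependency-free it swaps in the JDK's built-in `com.sun.net.httpserver.HttpServer` instead of DropWizard; the class name `TesseractPdfServer`, the `/ocr/pdf` path, port 8080 and the hard-coded `eng` language are all hypothetical choices, not part of Tika. It shells out with the same `pdf` config shown in the question and returns the searchable PDF that tesseract writes to `<outputbase>.pdf`.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical standalone wrapper; not part of Tika or tika-server.
public class TesseractPdfServer {

    // Same shape as the command in the question:
    //   tesseract <image> <outputbase> -l <lang> pdf
    // The trailing "pdf" config makes tesseract write <outputbase>.pdf.
    static List<String> buildCommand(Path image, Path outputBase, String lang) {
        return List.of("tesseract", image.toString(), outputBase.toString(),
                "-l", lang, "pdf");
    }

    // Copies the uploaded image to a temp dir, runs tesseract, returns the PDF bytes.
    static byte[] ocrToPdf(InputStream image, String lang)
            throws IOException, InterruptedException {
        Path workDir = Files.createTempDirectory("ocr");
        Path input = workDir.resolve("input");
        Path outputBase = workDir.resolve("output");
        Path pdf = workDir.resolve("output.pdf");
        try {
            Files.copy(image, input);
            Process p = new ProcessBuilder(buildCommand(input, outputBase, lang))
                    .redirectErrorStream(true)                        // merge stderr into stdout
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD)  // and ignore both
                    .start();
            int exit = p.waitFor();
            if (exit != 0) {
                throw new IOException("tesseract exited with status " + exit);
            }
            return Files.readAllBytes(pdf);
        } finally {
            // Remove temp files even if OCR failed.
            Files.deleteIfExists(pdf);
            Files.deleteIfExists(input);
            Files.deleteIfExists(workDir);
        }
    }

    // Accepts an image via PUT and answers with application/pdf.
    static void handle(HttpExchange ex) throws IOException {
        if (!"PUT".equals(ex.getRequestMethod())) {
            ex.sendResponseHeaders(405, -1);  // method not allowed, no body
            ex.close();
            return;
        }
        byte[] pdf;
        try (InputStream body = ex.getRequestBody()) {
            pdf = ocrToPdf(body, "eng");      // hypothetical fixed language
        } catch (IOException | InterruptedException e) {
            if (e instanceof InterruptedException) {
                Thread.currentThread().interrupt();
            }
            ex.sendResponseHeaders(500, -1);
            ex.close();
            return;
        }
        ex.getResponseHeaders().set("Content-Type", "application/pdf");
        ex.sendResponseHeaders(200, pdf.length);
        try (OutputStream out = ex.getResponseBody()) {
            out.write(pdf);
        }
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/ocr/pdf", TesseractPdfServer::handle);
        server.start();
    }
}
```

With the server running, something like `curl -T eurotext.png http://localhost:8080/ocr/pdf -o eurotext.pdf` would upload the image via PUT and save the returned PDF. Because no executor is set, requests are handled one at a time; a real wrapper (DropWizard or otherwise) would also want upload size limits, a configurable language and `--tessdata-dir`, and proper error reporting.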
