[tesseract-ocr] Re: pdf -> searchable PDF

2017-01-15 Thread wikinaut
Andreas, we are now tracking your issue as a new issue: https://github.com/tesseract-ocr/tesseract/issues/660 . Please follow the discussion there. It looks as if the main developers are really interested in finding and implementing a solution (in which I am also very interested).

Re: [tesseract-ocr] Re: pdf -> searchable PDF

2017-01-13 Thread ShreeDevi Kumar
Please see https://github.com/tesseract-ocr/tesseract/issues/83 and other PDF-related issues in the GitHub repo with similar discussions. - excuse the brevity, sent from mobile On 13-Jan-2017 10:15 PM, "James R Barlow" wrote: > Tesseract cannot rasterize PDFs. It is fairly

[tesseract-ocr] Re: pdf -> searchable PDF

2017-01-13 Thread James R Barlow
Tesseract cannot rasterize PDFs. It is fairly straightforward to write a PDF, as Tesseract does, but very complex to rasterize one. Programs like OCRmyPDF (which I develop) use Ghostscript, Tesseract and other tools to handle PDF to searchable PDF conversion. On Tuesday, January 10, 2017 at 9:34:57 PM
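
For readers who want to see the division of labour described above, here is a rough sketch in Python. It is not how OCRmyPDF works internally. It only illustrates the idea that Ghostscript rasterizes the pages and Tesseract then writes a searchable PDF per page. The function names, the 300 dpi default, the `page-%04d.png` naming and the `eng` language are assumptions made for the example, and both `gs` and `tesseract` are assumed to be on the PATH.

```python
import subprocess
from pathlib import Path


def rasterize_cmd(pdf_path, out_dir, dpi=300):
    """Build a Ghostscript command that renders each PDF page to a PNG."""
    # Hypothetical naming scheme: page-0001.png, page-0002.png, ...
    pattern = str(Path(out_dir) / "page-%04d.png")
    return [
        "gs", "-dNOPAUSE", "-dBATCH", "-dSAFER",
        "-sDEVICE=png16m", f"-r{dpi}",
        f"-sOutputFile={pattern}",
        str(pdf_path),
    ]


def ocr_cmd(image_path, out_base, lang="eng"):
    """Build a Tesseract command that writes <out_base>.pdf with a text layer."""
    return ["tesseract", str(image_path), str(out_base), "-l", lang, "pdf"]


def pdf_to_searchable_pages(pdf_path, work_dir, lang="eng"):
    """Rasterize a PDF, OCR every page, and return the per-page PDF paths."""
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)

    # Step 1: Tesseract cannot read PDFs, so Ghostscript renders the pages.
    subprocess.run(rasterize_cmd(pdf_path, work_dir), check=True)

    # Step 2: Tesseract turns each page image into a searchable one-page PDF.
    outputs = []
    for png in sorted(work_dir.glob("page-*.png")):
        base = png.with_suffix("")  # Tesseract appends ".pdf" itself
        subprocess.run(ocr_cmd(png, base, lang), check=True)
        outputs.append(base.with_suffix(".pdf"))
    return outputs
```

Calling `pdf_to_searchable_pages("scan.pdf", "work")` would leave one searchable PDF per page in `work/`. Merging them back into a single file, and preserving the original page content instead of a re-rendered image, is the part that tools like OCRmyPDF take care of.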