On Fri, Jan 20, 2017 at 07:19:13PM +0100, Ingo Feinerer wrote:
> Hi,
>
> please find attached a port for pdfsandwich,
> a tool to make "sandwich" OCR pdf files.
>
> $ cat pkg/DESCR
> pdfsandwich generates "sandwich" OCR pdf files, i.e. pdf files which contain
> only images (no text) will be processed by optical character recognition (OCR)
> and the text will be added to each page invisibly "behind" the images.
>
> pdfsandwich is a command line tool which is supposed to be useful to OCR
> scanned books or journals. It is able to recognize the page layout even for
> multicolumn text.
>
> OK to import?
>
> Best regards,
> Ingo
Hi,

I haven't tested pdfsandwich, but I have a WIP port for ocrmypdf; at least Python is more readable for me than OCaml :)

https://github.com/jbarlow83/OCRmyPDF

j.