I make heavy use of the new "pdf" option for OCR-ing single PDF pages (or their image equivalents), and it works very well. Thanks for adding this option to Tesseract svn trunk.
While inspecting the code, I think I found some pieces that point to "multi-page" handling.

Question 1: Does Tesseract already support OCR-ing multi-page PDFs?
Question 2: If the answer is no, are there any initiatives to integrate this into Tesseract?

I would appreciate it if Tesseract's "pdf" option also worked for multi-page PDFs.

Remark: This is how I currently process multi-page PDFs. I have a script (using pdftk/PDFToolkit) that splits a PDF into single-page image files. I convert these one by one with Tesseract's "pdf" option, and a second script then collates the single-page outputs into the final mixed-mode output PDF.
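To make the workaround concrete, here is a rough Python sketch of the split, OCR, and collate steps. It is not my actual script, only an illustration under some assumptions: pages are rendered to TIFF with pdftoppm (from poppler-utils) at 300 dpi rather than with pdftk, the single-page results are merged with pdftk's "cat" operation, and the helper names (page_number, ocr_command, merge_command, ocr_multipage) are made up for this example.

```python
import re
import subprocess
from pathlib import Path


def page_number(path):
    """Return the trailing page number of names like 'page-07.tif' (0 if none)."""
    match = re.search(r"-(\d+)$", Path(path).stem)
    return int(match.group(1)) if match else 0


def ocr_command(image, out_base):
    """'tesseract <image> <outputbase> pdf' writes <outputbase>.pdf."""
    return ["tesseract", str(image), str(out_base), "pdf"]


def merge_command(page_pdfs, output_pdf):
    """pdftk concatenates the input PDFs in the order given."""
    return ["pdftk", *map(str, page_pdfs), "cat", "output", str(output_pdf)]


def ocr_multipage(input_pdf, output_pdf, workdir, dpi=300):
    """Split a PDF into page images, OCR each page, and merge the results."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)

    # Step 1 (assumption: pdftoppm is installed): render every page
    # as <workdir>/page-<n>.tif.
    subprocess.run(
        ["pdftoppm", "-r", str(dpi), "-tiff", str(input_pdf), str(workdir / "page")],
        check=True,
    )

    # Step 2: OCR each page image into a single-page searchable PDF.
    images = sorted(workdir.glob("page-*.tif"), key=page_number)
    page_pdfs = []
    for image in images:
        out_base = image.with_suffix("")  # e.g. page-07
        subprocess.run(ocr_command(image, out_base), check=True)
        page_pdfs.append(out_base.with_suffix(".pdf"))

    # Step 3: collate the single-page PDFs in page order.
    subprocess.run(merge_command(page_pdfs, output_pdf), check=True)
```

A call such as ocr_multipage("scan.pdf", "scan-ocr.pdf", "work") would run all three steps; the file names here are placeholders. Sorting by the numeric suffix keeps the pages in order even if the file names are not zero-padded.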

