I am heavily using the new "pdf" option for OCR-ing single PDF pages (or 
their image equivalents), and it works very well. Thanks for adding the new 
option to the Tesseract svn trunk.

When inspecting the code, I think I found some pieces indicating 
"multi-page" actions. 

   - Question 1: Does Tesseract already support OCR-ing multi-page 
   PDFs?
   - Question 2: If not, are there initiatives to integrate this into 
   Tesseract?

I would appreciate it if the Tesseract "pdf" option also worked for multi-page PDFs.


Remark:

This is how I currently process multi-page PDFs:

At the moment I have a script (using pdftk/PDFToolkit) that splits a PDF 
into single image files, which I then convert one by one with Tesseract's 
"pdf" option. A second script then collates the single-page outputs into 
the final single mixed-mode output PDF file. 
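For illustration, the split / OCR / collate workflow described above could be sketched in Python roughly as follows. This is not the poster's actual script, and several details are assumptions: pdftk's `burst` and `cat` operations do the splitting and collating; because pdftk itself only splits into single-page PDFs, the rendering of each page to an image is done here with poppler's `pdftoppm`, a tool swapped in for illustration; the 300 dpi resolution, the file names (`page_%04d.pdf`, the `_ocr` suffix) and all function names are hypothetical.

```python
import subprocess
from pathlib import Path

# Hypothetical helpers: each returns the argv list for one external tool call.

def burst_cmd(src, workdir):
    """pdftk: split the input PDF into page_0001.pdf, page_0002.pdf, ..."""
    return ["pdftk", str(src), "burst", "output",
            str(Path(workdir) / "page_%04d.pdf")]

def rasterize_cmd(page_pdf, dpi=300):
    """pdftoppm (assumed tool): render a single-page PDF to <name>.png."""
    page_pdf = Path(page_pdf)
    prefix = page_pdf.with_suffix("")  # -singlefile omits the page-number suffix
    return ["pdftoppm", "-r", str(dpi), "-png", "-singlefile",
            str(page_pdf), str(prefix)]

def ocr_cmd(image):
    """tesseract with the "pdf" config: writes <stem>_ocr.pdf beside the image."""
    image = Path(image)
    out_base = image.with_name(image.stem + "_ocr")  # tesseract appends .pdf
    return ["tesseract", str(image), str(out_base), "pdf"]

def collate_cmd(page_pdfs, dest):
    """pdftk: concatenate the per-page OCR PDFs in the given order."""
    return ["pdftk", *[str(p) for p in page_pdfs], "cat", "output", str(dest)]

def ocr_multipage(src, dest, workdir):
    """Run burst -> rasterize -> OCR for each page -> collate."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    subprocess.run(burst_cmd(src, workdir), check=True)
    # Match only the burst output, not *_ocr.pdf files written later;
    # zero-padded names sort in page order.
    pages = sorted(workdir.glob("page_[0-9][0-9][0-9][0-9].pdf"))
    ocr_pdfs = []
    for page in pages:
        subprocess.run(rasterize_cmd(page), check=True)
        image = page.with_suffix(".png")
        subprocess.run(ocr_cmd(image), check=True)
        ocr_pdfs.append(image.with_name(image.stem + "_ocr.pdf"))
    subprocess.run(collate_cmd(ocr_pdfs, dest), check=True)
```

Usage would be something like `ocr_multipage("scan.pdf", "scan_ocr.pdf", "ocr_work")`. Building the argument lists separately from running them keeps each step easy to inspect and avoids shell quoting problems with unusual file names.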


-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/f85d93e3-ea49-47bc-aab9-5af9b4a268b1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
