It's the parsing and manipulation of PDF that scares me. Thanks for pointing out PDFBox; it looks pretty amazing. It even includes the program I was speculating about.
https://pdfbox.apache.org/1.8/commandline.html#overlayPDF