When I scan a document for OCR, Tesseract needs a high-DPI image. I do not need such a high DPI in my target PDF, though, and keeping it there seems like a waste of disk space: I can read the page comfortably at a much lower resolution than Tesseract needs to recognize it. So I am imagining a workflow where I increase the resolution of the original image for Tesseract to run OCR on, and then apply the resulting hOCR information to the original (lower-resolution) image. However, I cannot find a way to do this. The hOCR coordinates refer to the upscaled image, so the text layer and the text in the original image no longer line up. It would be great to be able to apply the hOCR layer properly fitted to the original document. Is there some way to do this that I have missed, or do you think this would be a useful addition to the program?
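To make the idea concrete, here is a minimal sketch of the rescaling I have in mind. It assumes the hOCR output stores positions as `bbox x0 y0 x1 y1` properties with integer pixel values, and that the image was scaled by the same factor in both directions. The function name `rescale_hocr` is my own invention for illustration, not something Tesseract provides, and the sketch only touches `bbox` values. Any other pixel-based properties in the file (for example baseline offsets or font sizes) would presumably need the same treatment.

```python
import re

# Matches the four integer coordinates of an hOCR "bbox x0 y0 x1 y1" property.
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


def rescale_hocr(hocr_text: str, scale: float) -> str:
    """Return hocr_text with every bbox coordinate multiplied by scale.

    Hypothetical helper: scale is (original size / upscaled size),
    e.g. 0.5 if the image was doubled before OCR.
    """
    def _scale(match: re.Match) -> str:
        coords = [round(int(value) * scale) for value in match.groups()]
        return "bbox " + " ".join(str(c) for c in coords)

    return BBOX_RE.sub(_scale, hocr_text)


# Example: OCR ran on an image upscaled 2x, so map coordinates back by 0.5.
sample = '<span class="ocrx_word" title="bbox 200 100 400 150; x_wconf 93">Hello</span>'
print(rescale_hocr(sample, 0.5))
```

The rescaled hOCR could then be combined with the original low-resolution image by whatever tool builds the PDF, but I have not found a way to do that step within Tesseract itself.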
Sincerely, Proctor

