[tesseract-ocr] hocr and resizing image

Proctor MacBelle Fri, 15 Aug 2014 15:55:49 -0700

When I scan a document for OCR, Tesseract needs the image to be high DPI.
However, I do not need such a high DPI in my target PDF file, and keeping
it in the final PDF seems like a waste of disk space: I do not need the
same resolution to read the page that Tesseract needs to recognize it.
I am therefore imagining a workflow where I increase the resolution of the
original image for Tesseract to run OCR on, and then apply the resulting
hOCR information to the original (lower-resolution) image. However, I
cannot find a way to accomplish this. The hOCR coordinates refer to the
size of the upscaled image, so the text in the image and in the hOCR data
no longer line up.
It would be great to be able to apply the hOCR layer properly fitted to the
original document. Is there some way to do this that I have missed, or do
you think this would be a useful addition to the program?
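For illustration, the coordinate mapping described above might be sketched
as follows. This assumes the OCR image was produced by uniformly scaling the
original by a known factor (e.g. 2x), so the factor passed in is
original_dpi / ocr_dpi. The function name rescale_hocr_bboxes is
hypothetical and not part of Tesseract. It only rewrites the "bbox"
properties in the hOCR title attributes; other properties such as
"baseline" or "x_size" are left unchanged and would also need scaling in a
complete tool.

```python
import re

# Matches the hOCR "bbox x0 y0 x1 y1" property inside a title attribute.
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


def rescale_hocr_bboxes(hocr: str, factor: float) -> str:
    """Multiply every bbox coordinate in an hOCR document by `factor`.

    Hypothetical helper: `factor` is original_dpi / ocr_dpi, e.g. 0.5 if
    the image was upscaled 2x before OCR. Only bbox values are rewritten;
    other title properties (baseline, x_size, ...) are left untouched.
    """
    def scale(match: re.Match) -> str:
        # Scale each of the four coordinates and round to whole pixels.
        coords = (round(int(v) * factor) for v in match.groups())
        return "bbox " + " ".join(str(c) for c in coords)

    return BBOX_RE.sub(scale, hocr)


if __name__ == "__main__":
    sample = '<span class="ocrx_word" title="bbox 100 200 300 250; x_wconf 95">Hello</span>'
    print(rescale_hocr_bboxes(sample, 0.5))
```

With a factor of 0.5, the word box "bbox 100 200 300 250" becomes
"bbox 50 100 150 125", which matches the word's position on an image at
half the OCR resolution.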


Sincerely,
Proctor

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/02dbd11c-910a-4f8f-9b3f-d75e66228899%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
