Re: [tesseract-ocr] Re: Poor translations on first attempts

Robert Komar Thu, 10 Apr 2014 11:23:50 -0700

On Thu, 10 Apr 2014, Joel Wheeler wrote:

Thank you for your suggestion! I had read that suggestion
in the docs prior to my attempts but didn't believe that
taking the same image and simply bumping up the dpi on it
would fix the translation errors. It seems like this would
be something that tesseract would do on its own in the
image pre-processing phase.


It's a bit of a chicken and egg problem.  To know that
the resolution is too small, tesseract has to first
recognize the text size correctly.  If the characters
are too small, then the recognition can be bad, and the
estimated resolution wrong.  There may also be multiple
text sizes within an image, so which is used for rescaling?
I think automatic rescaling is too error-prone, so it's
not even attempted.  It's better left to the application
writer, who may have a better idea of what the text sizes
might be.

Also, font size, dpi, etc. are not in themselves the
important numbers.  What matters is that, within the
image, the characters fall within a given pixel range.
A 12 point font in a 300 dpi image is fine.  A 120 point
font in a 30 dpi image is fine.  A 12 point font in a
100 dpi image isn't.  So, just looking at the font size,
or just the dpi, doesn't tell you if you're in the
right range.
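
To make the arithmetic concrete, here is a rough sketch in Python. It
assumes the usual convention of 72 points per inch, and it uses the
12 point / 300 dpi case above as the reference for "fine". The function
names are made up for illustration and are not part of tesseract. Note
that this is the nominal font height, not the exact height of any
particular glyph.

```python
POINTS_PER_INCH = 72.0  # assumption: 1 typographic point = 1/72 inch


def font_height_px(point_size, dpi):
    """Nominal font height in pixels for a point size at a given dpi."""
    return point_size * dpi / POINTS_PER_INCH


def scale_to_match(point_size, dpi, ref_point_size=12, ref_dpi=300):
    """Factor by which to resize an image so its text has the same
    nominal pixel height as the reference case (12 pt at 300 dpi,
    which is only an illustrative target, not a tesseract constant)."""
    return font_height_px(ref_point_size, ref_dpi) / font_height_px(point_size, dpi)


if __name__ == "__main__":
    # The three cases from the message above.
    for pt, dpi in [(12, 300), (120, 30), (12, 100)]:
        print(pt, "pt at", dpi, "dpi:",
              round(font_height_px(pt, dpi), 1), "px, scale by",
              round(scale_to_match(pt, dpi), 2))
```

The first two cases come out at the same pixel height, while the third
is about a third of that, so it would need to be enlarged roughly
threefold before OCR.  The application still has to know (or guess)
the point size, which is the part tesseract can't reliably do itself.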

Cheers,
Rob Komar

--
--
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
