Hello all, I have an OCR scenario where we are trying to OCR text from screen images. I have a trained language that includes the one specific font in use.
I have noticed a couple of strange issues.

1.) unicharambigs and the dictionary seem to have no effect. A very common error I see is the character 'a' being read as 'e'. This happens despite a line in unicharambigs that tries to resolve that ambiguity, and despite the original word being a dictionary word while the result is not. Example: art -> ert. (A sketch of the entry I mean is below.)

2.) The size of the image greatly influences the quality of the OCR, and not only the size but also the location of the text within the image. My scenarios are really simple: black text on a white background with no other noise (like a standard text field). I get different OCR results depending on the amount of white space around the text; more white space on the right gives a different result than more white space on the left, and so on. Some results are horrendously bad and become miraculously accurate when the image is changed only slightly, but I can't find a one-size-fits-all setting. What are the ideal image specifications for OCR? (A sketch of the kind of padding/scaling I have been trying is at the end of this message.)
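In case it matters, the unicharambigs entry I mean looks roughly like this. This is the whitespace-separated v1 format as I understand it from the training wiki, so treat the field meanings as my reading of the docs rather than gospel: length of the first string, the unichars Tesseract produced, length of the second string, the unichars it should consider instead, and a type flag where 0 is an optional ambiguity and 1 forces the replacement.

    v1
    1 e 1 a 0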

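And for the second issue, this is roughly the kind of padding/scaling I have been experimenting with when I say the image is "slightly changed". It is only a minimal sketch using PIL/Pillow; the file name, border width, and target height are placeholder values I picked for illustration, not recommended settings.

    from PIL import Image, ImageOps

    def prep_for_ocr(path, target_height=60, border=20):
        """Upscale a small screen capture and pad it with an even white border."""
        img = Image.open(path).convert("L")  # grayscale, black text on white
        # Scale small captures up so the glyphs are a reasonable size.
        if img.height < target_height:
            scale = target_height / float(img.height)
            img = img.resize((int(img.width * scale), target_height), Image.BICUBIC)
        # Add the same amount of white space on every side, so the result
        # no longer depends on where the text sits inside the capture.
        return ImageOps.expand(img, border=border, fill=255)

    prep_for_ocr("field_capture.png").save("field_capture_prep.png")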
