Hello all, I have an OCR scenario where we are trying to OCR text from screen images. I have a trained language that includes the one specific font in use.
I have noticed a couple of strange issues.

1.) unicharambigs and the dictionary seem to have no effect. A very common error I see is the character 'a' being read as 'e'. This happens despite a line in unicharambigs that tries to resolve that ambiguity, and despite the original word being a dictionary word while the result is not. Example: art -> ert. (A sketch of the entry I mean is below.)

2.) The size of the image greatly influences the quality of the OCR, and not only the size but also the location of the text within the image. My scenarios are really simple: black text on a white background with no other noise (like a standard text field). I get different OCR results depending on the amount of white space around the text; more white space on the right gives a different result than more white space on the left, and so on. Some results are horrendously bad and become miraculously accurate when the image is changed only slightly, but I can't find a one-size-fits-all setting. What are the ideal image specifications for OCR? (A sketch of the kind of padding/scaling I have been trying is at the end of this message.)
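In case it matters, the unicharambigs entry I mean looks roughly like this. This is the whitespace-separated v1 format as I understand it from the training wiki, so treat the field meanings as my reading of the docs rather than gospel: length of the first string, the unichars Tesseract produced, length of the second string, the unichars it should consider instead, and a type flag where 0 is an optional ambiguity and 1 forces the replacement.

    v1
    1 e 1 a 0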

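And for the second issue, this is roughly the kind of padding/scaling I have been experimenting with when I say the image is "slightly changed". It is only a minimal sketch using PIL/Pillow; the file name, border width, and target height are placeholder values I picked for illustration, not recommended settings.

    from PIL import Image, ImageOps

    def prep_for_ocr(path, target_height=60, border=20):
        """Upscale a small screen capture and pad it with an even white border."""
        img = Image.open(path).convert("L")  # grayscale, black text on white
        # Scale small captures up so the glyphs are a reasonable size.
        if img.height < target_height:
            scale = target_height / float(img.height)
            img = img.resize((int(img.width * scale), target_height), Image.BICUBIC)
        # Add the same amount of white space on every side, so the result
        # no longer depends on where the text sits inside the capture.
        return ImageOps.expand(img, border=border, fill=255)

    prep_for_ocr("field_capture.png").save("field_capture_prep.png")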
