Although you've given some information, it's not enough. Please complete the following checklist:
>> Make sure you have read the Wiki at http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3 and searched the forum for questions similar to yours. If you'd like your question to be answered, please ensure your message contains the following:
- Sample image (or a set of such images) you are trying to recognize
- If you trained Tesseract yourself, attach all the source files you used to build your "traineddata" file and the "traineddata" file itself
- Provide all the command lines you used to train Tesseract and recognize images
- Attach all config files you used during training and recognition, whether they are "stock" or created manually
- If you are using a compiled Tesseract executable, report the web page you downloaded it from
- If you compile Tesseract yourself or call it from your own code, indicate the SVN revision you use
- If you call Tesseract from code, provide the entire code snippet you use to call it
The less information you provide, the lower the chance that your question will be answered. Providing the full information does not guarantee that your question will be answered, though. <<

Warm regards,
Dmitri Silaev
www.CustomOCR.com

On Thu, Sep 1, 2011 at 7:06 PM, Alan Willard wrote:
> Hello All,
> I have an OCR scenario where we are trying to OCR text from screen
> images. I have a trained language that includes the one specific font
> in use.
>
> I have noticed a couple of strange issues.
>
> 1.) unicharambigs and the dictionary seem to have no effect. For example,
> a very common error I see is the character 'a' being interpreted as an
> 'e'. This is despite having a line in unicharambigs that tries to
> resolve the ambiguity, AND the original word is a dictionary word, and
> the result is not. Example: art -> ert
>
> 2.) The size of the image seems to greatly influence the quality of
> OCR. Not only the size, but the location of the text within that
> image.
> My OCR scenarios are really simple: black text on a white
> background, no other noise (like a standard text field). I get
> different OCR results depending on the amount of white space around the
> text; having more white space on the right gives me a different result
> than having more white space on the left, and so on. Some of the
> results are horrendously bad, and are miraculously accurate when the
> image is slightly changed, but I can't find a one-size-fits-all
> solution. What are the ideal image specifications for OCR?
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
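While the details requested in the checklist are being gathered, one thing worth trying for the second symptom (results that change with the amount and placement of white space) is to make the margins deterministic before the image ever reaches Tesseract: crop to the text and add the same white border on every side, so two captures of the same text become identical inputs. The following is a hypothetical sketch, not part of Tesseract: the names `ink_bbox`, `normalize_margins` and `upscale` are made up for illustration, the image is a plain list of grayscale rows to keep it dependency-free, and it assumes dark text on a light background with a fixed threshold of 128. Whether a particular margin or upscale factor actually improves accuracy is an assumption to check against your own screenshots.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: rows of 8-bit grayscale values, 0 = black, 255 = white.
Image = List[List[int]]


def ink_bbox(img: Image, threshold: int = 128) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, bottom, left, right), inclusive, of pixels darker than
    `threshold`, or None if the image contains no such pixels."""
    rows = [y for y, row in enumerate(img) if any(p < threshold for p in row)]
    if not rows:
        return None
    cols = [x for x in range(len(img[0])) if any(row[x] < threshold for row in img)]
    return rows[0], rows[-1], cols[0], cols[-1]


def normalize_margins(img: Image, margin: int = 10, threshold: int = 128) -> Image:
    """Crop to the text bounding box, then add the same white margin on every
    side, so the result no longer depends on where the text sat in the capture."""
    bbox = ink_bbox(img, threshold)
    if bbox is None:
        return [row[:] for row in img]  # no text found: return an unchanged copy
    top, bottom, left, right = bbox
    cropped = [row[left:right + 1] for row in img[top:bottom + 1]]
    width = len(cropped[0]) + 2 * margin
    body = [[255] * margin + row + [255] * margin for row in cropped]
    pad = [[255] * width for _ in range(margin)]
    return [r[:] for r in pad] + body + [r[:] for r in pad]


def upscale(img: Image, factor: int) -> Image:
    """Nearest-neighbour integer upscaling: each pixel becomes a factor x factor block."""
    out: Image = []
    for row in img:
        wide = [p for p in row for _ in range(factor)]
        out.extend(wide[:] for _ in range(factor))
    return out
```

Running `normalize_margins` on two captures that differ only in where the text sits produces identical pixel arrays, which removes that particular source of variation. `upscale` is there for the case where the screen font is very small; it is an optional step to experiment with, not a known fix.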

