Re: OCR Problems (unicharambigs and image sizes)

Dmitri Silaev Thu, 01 Sep 2011 21:03:39 -0700

Although you've given some info, it's not enough. Please complete the
following checklist:


>>
Make sure you have read the Wiki at
http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3
and searched the forum for questions similar to yours.

If you'd like your question to be answered, please ensure your message
contains the following:
- Sample image (or a set of such images) you are trying to recognize
- If you trained Tesseract yourself, attach all the source files you
used to build your "traineddata" file and the "traineddata" file itself
- Provide all the command lines you used to train Tesseract and recognize images
- Attach all config files you used during training and recognition, no
matter if they are "stock" or created manually
- If you are using a compiled Tesseract executable, report the web page
from which you downloaded it
- If you compile Tesseract yourself or call it from your own code, indicate
the SVN revision you use
- If you call Tesseract from code, provide the entire code snippet you
use for calling

The less info you provide, the lower the chances that your question will be
answered. Providing full info does not guarantee an answer, though.
<<

Warm regards,
Dmitri Silaev
www.CustomOCR.com





On Thu, Sep 1, 2011 at 7:06 PM, Alan Willard <[email protected]> wrote:
> Hello All,
> I have an OCR scenario where we are trying to OCR text from screen
> images. I have a trained language that includes the one specific font
> in use.
>
> I have noticed a couple of strange issues.
>
> 1.) unicharambigs and dictionary seem to have no effect. For example,
> a very common error I see is the character 'a' being interpreted as an
> 'e'. This is despite having a line in unicharambigs that tries to
> resolve the ambiguity, AND the original word is a dictionary word, and
> the result is not. Example: art -> ert
>
> 2.) The size of the image seems to greatly influence the quality of
> OCR. Not only the size, but the location of the text within that
> image. My OCR scenarios are really simple, black text on a white
> background, no other noise (like a standard text field). I will get
> different OCR results based on the amount of white space around the
> text, having more white space on the right gives me a different result
> than having more white space on the left, and so on. Some of the
> results are horrendously bad, and are miraculously accurate when the
> image is slightly changed, but I can't find a one-size-fits-all
> solution. What are the ideal image specifications to OCR?
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>
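
For point 2 in the quoted message (results changing with the amount and
position of white space), one way to make test inputs comparable is to crop
each image to its text and re-pad it with the same white border on every side.
The following is a minimal sketch under stated assumptions, not something the
thread confirms as a fix: the image is assumed to be already loaded as a list
of rows of 8-bit grayscale values, and the function name, the 10-pixel border,
and the 128 threshold are all hypothetical choices for illustration.

```python
def normalize_margins(pixels, border=10, white=255, threshold=128):
    """Crop a grayscale image (list of rows) to its dark content and
    re-pad it with a uniform white border on every side.

    All parameter defaults are illustrative assumptions, not values
    recommended by Tesseract.
    """
    # Rows and columns that contain at least one "dark" (text) pixel.
    rows = [y for y, row in enumerate(pixels)
            if any(v < threshold for v in row)]
    if not rows:
        # Blank image: nothing to crop, return an unchanged copy.
        return [row[:] for row in pixels]
    cols = [x for x in range(len(pixels[0]))
            if any(row[x] < threshold for row in pixels)]

    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]

    # Width of the output: cropped content plus a border on both sides.
    width = (right - left + 1) + 2 * border
    blank = [white] * width

    out = [blank[:] for _ in range(border)]
    for y in range(top, bottom + 1):
        side = [white] * border
        out.append(side + pixels[y][left:right + 1] + side)
    out.extend(blank[:] for _ in range(border))
    return out
```

With margins normalized this way, two screenshots of the same text field that
differ only in surrounding white space produce identical input, which helps
separate layout effects from problems in the trained data itself.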

