Re: Tesseract cannot read correct my image even it's very simple

Nick White Mon, 24 Jun 2013 10:11:18 -0700

Hi,

On Mon, Jun 17, 2013 at 04:09:02AM -0700, [email protected] wrote:
> Please help me solve this problem. My text is very simple, but Tesseract reads
> it as A7V33â€˜! instead of A798D7. Can you tell me why, and how I can make
> Tesseract read it correctly?


You're using Tesseract to try to crack captchas? Interesting...
There are other projects around that are focused on this; I don't
know how they work, but it might be worth you checking them out as
well.

There are two obvious issues here.

First is that the text has noise around it, which is hampering
Tesseract's recognition. If you can, pre-process the image to
remove as much of that noise as possible before running OCR.
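
To illustrate the kind of pre-processing meant here, below is a
minimal sketch in Python. It assumes the image has already been
loaded as a grayscale list of rows (0 = black, 255 = white); the
function names median_filter and binarize are made up for this
example and are not part of Tesseract. A 3x3 median filter is only
one possible way to remove isolated specks, and a real captcha may
well need something stronger.

```python
from statistics import median


def median_filter(pixels):
    """Apply a 3x3 median filter to a grayscale image (list of rows)."""
    height, width = len(pixels), len(pixels[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            # Collect the neighbourhood, clipped at the image borders.
            window = [
                pixels[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            row.append(int(median(window)))
        result.append(row)
    return result


def binarize(pixels, threshold=128):
    """Map every pixel to pure black (0) or pure white (255)."""
    return [[0 if value < threshold else 255 for value in row] for row in pixels]


if __name__ == "__main__":
    # A white 5x5 image with one black speck in the middle.
    noisy = [[255] * 5 for _ in range(5)]
    noisy[2][2] = 0
    for row in binarize(median_filter(noisy)):
        print(row)
```

Single-pixel specks disappear because they are outvoted by their
neighbours, while strokes at least three pixels wide survive. Thin
strokes can be eroded, so the filter size is a trade-off.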

Second is that it looks like you only expect to see ASCII
characters. If that is the case, set a character whitelist (the
tessedit_char_whitelist variable) so that characters like the euro
sign and a-circumflex are never considered. It is explained in this
FAQ entry:
http://code.google.com/p/tesseract-ocr/wiki/FAQ#How_do_I_recognize_only_digits?
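
As a rough sketch of how the whitelist could be passed on the
command line, the Python snippet below builds a tesseract
invocation that sets tessedit_char_whitelist through the -c option.
The helper name build_tesseract_command, the file names, and the
choice of uppercase letters plus digits as the allowed set are
assumptions made for this example (the expected text A798D7 fits
that set, but your captchas may use other characters). The -c option
is not available in every Tesseract version; older releases need a
config file instead, as the FAQ entry describes.

```python
import string

# Assumed character set: uppercase ASCII letters and digits.
DEFAULT_WHITELIST = string.ascii_uppercase + string.digits


def build_tesseract_command(image_path, output_base, whitelist=DEFAULT_WHITELIST):
    """Return the argument list for a tesseract run restricted to `whitelist`."""
    return [
        "tesseract",
        image_path,
        output_base,
        "-c",
        "tessedit_char_whitelist=" + whitelist,
    ]


if __name__ == "__main__":
    # Print the command rather than running it, so this works
    # even on a machine without Tesseract installed.
    print(" ".join(build_tesseract_command("captcha.png", "out")))
```

The resulting list can be handed to subprocess.run if Tesseract is
installed; the recognised text is then written to out.txt.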

Nick

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en

