Hi,

On Mon, Jun 17, 2013 at 04:09:02AM -0700, [email protected] wrote:
> Please help me solve the problem. My text is very simple but Tesseract show it
> as A7V33‘!, not A798D7. Please tell me why? I how to make Tesseract read it
> correct?

You're using Tesseract to try to crack captchas? Interesting... There are
other projects around that are focused on this; I don't know how they work,
but they might be worth checking out as well.

There are two obvious issues here.

First, the text has noise around it, which is hampering Tesseract's
recognition. If possible, you should pre-process the image to remove as much
of the noise as you can.

Second, it looks like you only expect to see ASCII characters. If that is the
case, use the whitelist function to ensure that characters like the euro sign
and a-circumflex are never considered. It is explained in this FAQ entry:
http://code.google.com/p/tesseract-ocr/wiki/FAQ#How_do_I_recognize_only_digits?

Nick

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
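
For illustration, the two suggestions above (remove noise, then restrict the
character set) could be sketched in Python roughly as follows. This is only a
sketch under several assumptions: the image has already been binarised into a
grid of 0/1 values, the noise consists of stray single pixels, the captcha
uses only uppercase letters and digits, and the installed Tesseract accepts
`-c tessedit_char_whitelist=...` on the command line (older releases may need
a config file instead, as the FAQ entry describes). The function names are
made up for this example and are not part of Tesseract.

```python
import subprocess

# Assumption: the captcha only contains uppercase letters and digits.
WHITELIST = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"


def remove_isolated_pixels(grid):
    """Return a copy of a binary image (1 = ink, 0 = background) in which
    ink pixels with no ink among their 8 neighbours are cleared."""
    height = len(grid)
    width = len(grid[0]) if height else 0
    cleaned = [row[:] for row in grid]
    for y in range(height):
        for x in range(width):
            if not grid[y][x]:
                continue
            # Look at the surrounding 3x3 window, clipped to the image edges.
            has_neighbour = any(
                grid[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            )
            if not has_neighbour:
                cleaned[y][x] = 0
    return cleaned


def build_tesseract_command(image_path, output_base, whitelist=WHITELIST):
    """Build the argument list for a Tesseract run limited to `whitelist`."""
    return [
        "tesseract",
        image_path,
        output_base,
        "-c",
        "tessedit_char_whitelist=" + whitelist,
    ]


def run_tesseract(image_path, output_base):
    """Run Tesseract; it writes the recognised text to output_base + '.txt'."""
    subprocess.run(build_tesseract_command(image_path, output_base), check=True)
    with open(output_base + ".txt", encoding="utf-8") as handle:
        return handle.read().strip()
```

Real captcha noise is usually worse than single stray pixels (lines, blobs,
textured backgrounds), so the cleaning step here is a placeholder for whatever
pre-processing suits the actual images. The whitelist does not help Tesseract
see the glyphs any better; it only stops it from answering with characters
that cannot occur.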

