When the trained alphabet is small, Tesseract's classifier sometimes
finds no suitable match and outputs a null character, which is then
converted to a space. In your case, however, the image contains Chinese
characters with many strokes and outlines, and parts of these can
match characters from your whitelist. So expect a fair number of false
detections even when your alphabet is small, e.g. when you train Tess
to recognize only digits.

The best approach would be to first locate the regions of interest
(ROIs) in the image and then run recognition over each of them with
an appropriate whitelist.
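
The ROI idea above could be sketched roughly as follows. This is only an
illustration under several assumptions: the ROI boxes are already known
(finding them is a separate problem), the image object offers a PIL-style
crop(box) method, and the helper names whitelist_config, recognize_rois and
pytesseract_ocr are made up for this example. tessedit_char_whitelist is the
Tesseract variable behind the whitelist; the --psm flag and the pytesseract
wrapper are assumptions about your setup (older builds spell the flag -psm).

```python
from typing import Callable, Dict, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def whitelist_config(chars: str, psm: int = 7) -> str:
    """Build a Tesseract config string that restricts output to `chars`.

    psm=7 (treat the patch as a single text line) is an assumed default.
    """
    return f"--psm {psm} -c tessedit_char_whitelist={chars}"


def recognize_rois(
    image,
    rois: Dict[str, Tuple[Box, str]],
    ocr: Callable[[object, str], str],
) -> Dict[str, str]:
    """Crop each named ROI and run `ocr` on it with that ROI's whitelist.

    `rois` maps a label to (box, allowed_characters).
    `ocr` takes (patch, config_string) and returns the recognized text.
    """
    results = {}
    for name, (box, chars) in rois.items():
        patch = image.crop(box)  # only the ROI is passed to the recognizer
        results[name] = ocr(patch, whitelist_config(chars)).strip()
    return results


def pytesseract_ocr(patch, config: str) -> str:
    """One possible `ocr` backend, assuming pytesseract is installed."""
    import pytesseract  # imported lazily so the sketch loads without it

    return pytesseract.image_to_string(patch, config=config)
```

With Pillow and pytesseract available, a call might look like
recognize_rois(Image.open("scan.png"), {"serial": ((10, 20, 200, 60),
"0123456789")}, pytesseract_ocr), where the file name and the box
coordinates are hypothetical. Since the Chinese text never reaches the
classifier, it cannot be misread as digits.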

Warm regards,
Dmitri Silaev





On Sat, Mar 26, 2011 at 8:44 AM, liuguanqiang <[email protected]> wrote:
> hi:
> I use tesseract recognize digital(setwhitelist"0123456789") using
> eng.traineddata.
> There is some other character set(Chinese) in the test image, but the
> tesseract recognize the chinese char to digital.
> Is there some tess variables to control this situation? Is this problem
> equals " improve the reject rate "?
> The following picture(binary) is recognized as "5221555255", how to let the
> tesseract output null?
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en.
>

