When you have a small trained alphabet, Tesseract's classifier sometimes fails to find a suitable match, in which case it outputs a null character that is later converted to a space. In your case, however, the Chinese characters have many strokes and outlines, and parts of them happen to match characters from your whitelist. So expect a fair number of false detections even when your alphabet is small, e.g. when you train Tess to recognize only digits.
The best approach would be to first locate the regions of interest (ROIs) and then run recognition over each of them, using an appropriate whitelist.

Warm regards,
Dmitri Silaev

On Sat, Mar 26, 2011 at 8:44 AM, liuguanqiang <[email protected]> wrote:
> hi:
> I use tesseract recognize digital(setwhitelist"0123456789") using
> eng.traineddata.
> There is some other character set(Chinese) in the test image, but the
> tesseract recognize the chinese char to digital.
> Is there some tess variables to control this situation? Is this problem
> equals " improve the reject rate "?
> The following picture(binary) is recognized as "5221555255", how to let the
> tesseract output null?

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en.
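As a rough illustration of the ROI approach suggested in the reply, the sketch below crops each region and runs recognition on it with its own whitelist. It is only a sketch under several assumptions that are not part of the thread: it uses the pytesseract wrapper and a PIL-style image object with a `crop(box)` method, the ROI boxes are already known (finding them is a separate problem), each ROI holds a single text line (hence `--psm 7`), and the names `recognize_rois` and `tesseract_ocr` are hypothetical.

```python
from typing import Callable, Dict, Tuple

# (left, upper, right, lower) in pixels, the box layout PIL's Image.crop expects.
Box = Tuple[int, int, int, int]


def tesseract_ocr(image, whitelist: str) -> str:
    """OCR one cropped region, restricting the output to `whitelist`."""
    # Imported lazily so the rest of the sketch has no hard dependency.
    import pytesseract

    # Assumption: every ROI contains a single line of text (--psm 7).
    config = f"--psm 7 -c tessedit_char_whitelist={whitelist}"
    return pytesseract.image_to_string(image, config=config).strip()


def recognize_rois(
    image,
    rois: Dict[str, Tuple[Box, str]],
    ocr: Callable = tesseract_ocr,
) -> Dict[str, str]:
    """Run `ocr` on each named ROI with that ROI's own whitelist.

    `rois` maps a label to (box, whitelist). Text outside the boxes,
    such as the Chinese characters in the question, is never passed
    to the recognizer, so it cannot be misread as digits.
    """
    results = {}
    for name, (box, whitelist) in rois.items():
        crop = image.crop(box)
        results[name] = ocr(crop, whitelist)
    return results
```

With a PIL image loaded from a scan, a call might look like `recognize_rois(page, {"serial": ((40, 120, 400, 160), "0123456789")})`, where the box coordinates are made up for illustration. The `ocr` parameter is injectable so that a different recognizer, or a stub, can be swapped in.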

