Thanks for your reply. In another case, I use Tesseract to recognize Chinese characters. Some Chinese characters are recognized as other, wrong Chinese characters, even though they look very different. Is the reason that these characters have many dense strokes? In this case, detecting ROIs does not help. My question is: which Tesseract variables control the classifier's match metrics? I want to tune these variables to solve this problem or improve the reject rate. Best regards 2011-03-29
liuguanqiang

From: Dmitri Silaev
Sent: 2011-03-27 05:36:01
To: tesseract-ocr
Cc: liuguanqiang
Subject: Re: tesseract improve the reject rate ?

When you have a small trained alphabet, Tesseract's classifier sometimes does not find a suitable match, and in that case it outputs a null character, which is then converted to a space. However, in your case there are Chinese characters that have many strokes and outlines, many of which somehow (partially) match the characters from your whitelist. So be ready for a number of false detections even when your alphabet is small, e.g. when you train Tess to get only digits.

The best approach would be to determine where the regions of interest (ROIs) are located, and then run the recognition over them, using appropriate whitelists.

Warm regards,
Dmitri Silaev

On Sat, Mar 26, 2011 at 8:44 AM, liuguanqiang <[email protected]> wrote:
> hi:
> I use tesseract to recognize digits (setwhitelist "0123456789") using
> eng.traineddata.
> There is another character set (Chinese) in the test image, but
> tesseract recognizes the Chinese characters as digits.
> Are there tess variables to control this situation? Is this problem
> the same as "improving the reject rate"?
> The following picture (binary) is recognized as "5221555255"; how can I
> make tesseract output null?
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en.
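
To make the "reject rate" idea in this thread concrete, here is a minimal sketch of one possible approach: post-filtering results by confidence, rather than tuning the classifier itself. It assumes you can already obtain a confidence value (0 to 100) for each recognized word, for example from the per-word confidences that TessBaseAPI::AllWordConfidences() reports. The function name reject_low_confidence, the threshold of 70, and the sample values are all hypothetical and only for illustration; this is not a built-in Tesseract variable or feature.

```python
from typing import List, Tuple


def reject_low_confidence(words: List[Tuple[str, int]], min_conf: int = 70) -> str:
    """Keep only the words whose confidence is at least min_conf.

    `words` is a list of (text, confidence) pairs, with confidence on a
    0-100 scale. Rejected words are dropped, so a result made up only of
    low-confidence words comes back as an empty string.
    """
    kept = [text for text, conf in words if conf >= min_conf]
    return " ".join(kept)


# Made-up values for illustration only: a Chinese region misread as digits
# would be expected to score low, a genuine digit string to score high.
sample = [("5221555255", 31), ("2011", 92)]
print(reject_low_confidence(sample))
```

A suitable threshold would have to be found empirically on your own images; if the false detections also get high confidences, this filter will not help and locating the ROIs first remains the better option.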

