That's simple: use "0123456789" as the whitelist, then write code on top of it that converts the unwanted numbers to null. Your own code can handle this instead of Tesseract.
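A minimal sketch of such a post-filter in Python, assuming the OCR step (run with the full "0123456789" whitelist) has already returned its text as a string. The names filter_digits and allowed are made up for illustration and are not part of Tesseract. The sketch also assumes that "null" means rejecting the whole result as soon as one unexpected digit appears.

```python
def filter_digits(ocr_text, allowed="5678"):
    """Return the recognized digits if all of them are in `allowed`, else None.

    `ocr_text` is assumed to be the raw string produced by an OCR run that used
    the full "0123456789" whitelist; `allowed` is the set actually expected.
    """
    # OCR output usually carries spaces and a trailing newline; drop them.
    digits = "".join(ocr_text.split())
    if not digits:
        return None
    # Reject the whole result if any character is outside the expected set.
    if any(ch not in allowed for ch in digits):
        return None
    return digits


print(filter_digits("24013091\n"))  # None: contains digits outside "5678"
print(filter_digits("5678\n"))      # 5678
```

With this in place, the image that reads "24013091" yields None instead of a forced match against "5678".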
--
Regards,
Saurabh Gandhi

2011/3/31 liuguanqiang <[email protected]>

> For example, I use eng.traineddata (whitelist set to "0123456789") to
> recognize the digits in the following picture:
> Tesseract outputs the correct result: "24013091"
> Now suppose I know that only "5678" can appear in the input image, so I
> set the whitelist to "5678".
> On the above image, Tesseract outputs a wrong result (including the
> length): "866685".
> In this case, how can I make Tesseract output empty or null?
> My question is: which Tesseract variables control the classifier match criterion?
>
> I want to tune these variables so that Tesseract outputs null in the
> above case.
> 2011-03-31
> ------------------------------
> liuguanqiang
> ------------------------------
> *From:* Dmitri Silaev
> *Sent:* 2011-03-29 14:59:22
> *To:* tesseract-ocr
> *Cc:* liuguanqiang
> *Subject:* Re: Re: tesseract improve the reject rate ?
> As I always say, send the sample image(s) and describe what you need
> exactly. Maybe you're looking in the wrong direction.
> Warm regards,
> Dmitri Silaev
> On Tue, Mar 29, 2011 at 7:34 AM, liuguanqiang <[email protected]> wrote:
> > Thanks for your reply.
> > In another case, I use Tesseract to recognize Chinese characters.
> > Some Chinese characters are recognized as other, wrong Chinese characters,
> > even though they are very different in appearance.
> > Is the reason that the Chinese characters have many (dense) strokes?
> > In this case, detecting ROIs does not help.
> >
> > My question is: which Tesseract variables control the classifier match metrics?
> > I want to tune these variables to solve this problem or
> > improve the reject rate.
> > Best regards
> > 2011-03-29
> > ________________________________
> > liuguanqiang
> > ________________________________
> > From: Dmitri Silaev
> > Sent: 2011-03-27 05:36:01
> > To: tesseract-ocr
> > Cc: liuguanqiang
> > Subject: Re: tesseract improve the reject rate ?
> > When you have a small trained alphabet, Tesseract's classifier
> > sometimes might not find suitable matches, and in that case it will
> > output a null character, which is then converted to a space. However, in your
> > case, there are Chinese characters that have many strokes and
> > outlines, many of which somehow (partially) match the characters from
> > your whitelist. So be ready for a number of false detections even
> > when your alphabet is small, e.g. when you train Tess to recognize only digits.
> > The best approach would be to determine where the regions of
> > interest (ROIs) are located, and then run the recognition over them,
> > using appropriate whitelists.
> > Warm regards,
> > Dmitri Silaev
> > On Sat, Mar 26, 2011 at 8:44 AM, liuguanqiang <[email protected]> wrote:
> >> Hi,
> >> I use Tesseract to recognize digits (whitelist set to "0123456789") using
> >> eng.traineddata.
> >> There are characters from another set (Chinese) in the test image, but
> >> Tesseract recognizes the Chinese characters as digits.
> >> Are there Tesseract variables to control this situation? Is this problem
> >> equivalent to "improving the reject rate"?
> >>
> >> The following picture (binary) is recognized as "5221555255"; how can I make
> >> Tesseract output null?
> >>
> >>
> >> --
> >> You received this message because you are subscribed to the Google Groups
> >> "tesseract-ocr" group.
> >> To post to this group, send email to [email protected].
> >> To unsubscribe from this group, send email to
> >> [email protected].
> >> For more options, visit this group at
> >> http://groups.google.com/group/tesseract-ocr?hl=en.
<<testImage(03-31-11-51-18).jpg>>

