Re: Re: Re: tesseract improve the reject rate ?

Saurabh Gandhi Wed, 30 Mar 2011 20:54:46 -0700

That's simple: use "0123456789" as the whitelist, and then write code on
top of it to convert the unwanted digits to null. Your code can handle this
instead of Tesseract.
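
A minimal sketch of that post-filtering step, in Python. The function name
filter_digits and the choice to reject the whole string (rather than drop
individual digits) are assumptions for illustration only; ocr_text stands for
whatever string Tesseract returned with the "0123456789" whitelist.

```python
def filter_digits(ocr_text, allowed="5678"):
    """Return the recognized digits only if all of them are allowed.

    ocr_text: raw OCR output obtained with the full digit whitelist.
    allowed:  the digits expected in the image (hypothetical default).
    Returns an empty string when any unexpected character is present.
    """
    # Drop whitespace such as the trailing newline OCR output often carries.
    digits = "".join(ch for ch in ocr_text if not ch.isspace())
    if digits and all(ch in allowed for ch in digits):
        return digits
    # At least one character falls outside the allowed set: reject everything.
    return ""
```

With this, an output such as "24013091" is rejected when only "5678" is
expected, while "5678" passes through unchanged.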


--
Regards,
Saurabh Gandhi




2011/3/31 liuguanqiang <[email protected]>

> For example, I use eng.traineddata (whitelist set to "0123456789") to
> recognize the digits in the following picture:
> Tesseract outputs the correct result: "24013091"
> Now suppose I know that only "5678" can appear in the input image, so I
> set the whitelist to "5678".
> On the above image, Tesseract outputs a wrong result (the length is wrong
> too): "866685".
> In this case, how can I make Tesseract output empty or null?
> My question is: which Tesseract variables control the classifier match
> criterion?
>
> I want to tune these variables so that Tesseract outputs null in the
> above case.
> 2011-03-31
> ------------------------------
>  liuguanqiang
> ------------------------------
> *From:* Dmitri Silaev
> *Sent:* 2011-03-29  14:59:22
> *To:* tesseract-ocr
> *Cc:* liuguanqiang
> *Subject:* Re: Re: tesseract improve the reject rate ?
>  As I always say, send the sample image(s) and describe what you need
> exactly. Maybe you're looking in the wrong direction.
>  Warm regards,
> Dmitri Silaev
>    On Tue, Mar 29, 2011 at 7:34 AM, liuguanqiang <[email protected]
> > wrote:
> > Thanks for your reply.
> > In another case, I use Tesseract to recognize Chinese characters.
> > Some Chinese characters are recognized as other, wrong Chinese characters,
> > even though they are very different in appearance.
> > Is this because Chinese characters have many (dense) strokes?
> > In this case, detecting ROIs does not help.
>
> > My question is: which Tesseract variables control the classifier match
> > metrics? I want to tune these variables to solve this problem or
> > improve the reject rate.
> > Best regards
> > 2011-03-29
> > ________________________________
> > liuguanqiang
> > ________________________________
> > From: Dmitri Silaev
> > Sent: 2011-03-27  05:36:01
> > To: tesseract-ocr
> > Cc: liuguanqiang
> > Subject: Re: tesseract improve the reject rate ?
> > When you have a small trained alphabet, Tesseract's classifier
> > sometimes might not find suitable matches and in that way it will
> > output a null character further converted to a space. However in your
> > case, there are Chinese characters that have many strokes and
> > outlines, many of which somehow (partially) match the characters from
> > your whitelist. So be ready for a quantity of false detections even
> > when your alphabet is small, i.e. you train Tess to get only digits.
> > The best approach would be to determine locations where regions of
> > interest (ROIs) are located, and then run the recognition over them,
> > using appropriate whitelists.
> > Warm regards,
> > Dmitri Silaev
> > On Sat, Mar 26, 2011 at 8:44 AM, liuguanqiang <[email protected]
> > wrote:
> >> hi:
> >> I use Tesseract to recognize digits (whitelist set to "0123456789")
> >> using eng.traineddata.
> >> There are characters from another script (Chinese) in the test image,
> >> but Tesseract recognizes the Chinese characters as digits.
> >> Are there Tesseract variables to control this? Is this problem
> >> the same as "improving the reject rate"?
>
> >> The following (binary) picture is recognized as "5221555255". How can I
> >> make Tesseract output null?
> >>
> >>
> >> --
>
> >> You received this message because you are subscribed to the Google Groups
> >> "tesseract-ocr" group.
> >> To post to this group, send email to [email protected].
> >> To unsubscribe from this group, send email to
> >> [email protected].
> >> For more options, visit this group at
> >> http://groups.google.com/group/tesseract-ocr?hl=en.
> >>
>

<<testImage(03-31-11-51-18).jpg>>
