Re: Re: Re: tesseract improve the reject rate ?

liuguanqiang Wed, 30 Mar 2011 20:52:18 -0700

For example, I use eng.traineddata (with the whitelist set to "0123456789") to 
recognize the digits in the following picture:


Tesseract outputs the correct result: "24013091".
Now suppose I know that only the characters "5678" can appear in the input 
image, so I set the whitelist to "5678".
On the above image, Tesseract then outputs a wrong result (even the length is 
wrong): "866685".
In this case, how can I make Tesseract output an empty or null result?
My question is: which Tesseract variables control the classifier's match 
criterion? I want to tune these variables so that Tesseract outputs null in the 
above case.
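
One way to get an empty result without touching the classifier internals is to reject low-confidence output after recognition. The following is only a minimal sketch in Python, assuming that a confidence value (0 to 100) is already available for each recognized character. The function name `reject_low_confidence`, the threshold of 60, and the sample confidences are hypothetical illustrations, not Tesseract variables or measured values.

```python
from typing import List, Tuple


def reject_low_confidence(chars: List[Tuple[str, float]],
                          min_conf: float = 60.0) -> str:
    """Return the recognized string, or "" when the mean confidence is too low.

    `chars` is a list of (character, confidence) pairs; the confidences are
    assumed to be on a 0-100 scale. `min_conf` is a hypothetical threshold
    that would have to be tuned on real data.
    """
    if not chars:
        return ""
    mean_conf = sum(conf for _, conf in chars) / len(chars)
    if mean_conf < min_conf:
        # Treat the whole result as a reject instead of returning wrong text.
        return ""
    return "".join(ch for ch, _ in chars)


# Hypothetical confidences, for illustration only.
good = [("2", 92.0), ("4", 88.5), ("0", 90.0)]
bad = [("8", 41.0), ("6", 35.5), ("6", 38.0)]

print(repr(reject_low_confidence(good)))
print(repr(reject_low_confidence(bad)))
```

Tesseract's C++ API reports confidence values through `TessBaseAPI::MeanTextConf()` and `TessBaseAPI::AllWordConfidences()`. Those are per-page and per-word values rather than per-character ones, so the input to a filter like this would need to be adapted accordingly.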
2011-03-31 



liuguanqiang 



From: Dmitri Silaev 
Sent: 2011-03-29  14:59:22 
To: tesseract-ocr 
Cc: liuguanqiang 
Subject: Re: Re: tesseract improve the reject rate ? 
As I always say, send the sample image(s) and describe what you need
exactly. Maybe you're looking in the wrong direction.
Warm regards,
Dmitri Silaev
On Tue, Mar 29, 2011 at 7:34 AM, liuguanqiang <[email protected]> wrote:
> Thanks for your reply.
> In another case, I use Tesseract to recognize Chinese characters.
> Some Chinese characters are recognized as other, wrong Chinese characters,
> even though they are very different in appearance.
> Is the reason that these Chinese characters have many (dense) strokes?
> In this case, detecting ROIs does not help.
> My question is: which Tesseract variables control the classifier's match metrics?
> I want to tune these variables to solve this problem or
> improve the reject rate.
> Best regards
> 2011-03-29
> ________________________________
> liuguanqiang
> ________________________________
> From: Dmitri Silaev
> Sent: 2011-03-27  05:36:01
> To: tesseract-ocr
> Cc: liuguanqiang
> Subject: Re: tesseract improve the reject rate ?
> When you have a small trained alphabet, Tesseract's classifier
> sometimes might not find suitable matches and in that way it will
> output a null character further converted to a space. However in your
> case, there are Chinese characters that have many strokes and
> outlines, many of which somehow (partially) match the characters from
> your whitelist. So be ready for a quantity of false detections even
> when your alphabet is small, i.e. you train Tess to get only digits.
> The best approach would be to determine locations where regions of
> interest (ROIs) are located, and then run the recognition over them,
> using appropriate whitelists.
> Warm regards,
> Dmitri Silaev
> On Sat, Mar 26, 2011 at 8:44 AM, liuguanqiang <[email protected]> wrote:
>> Hi,
>> I use Tesseract to recognize digits (whitelist set to "0123456789") using
>> eng.traineddata.
>> The test image also contains characters from another set (Chinese), but
>> Tesseract recognizes the Chinese characters as digits.
>> Are there Tesseract variables to control this situation? Is this problem
>> equivalent to "improving the reject rate"?
>> The following (binary) picture is recognized as "5221555255". How can I make
>> Tesseract output null?
>>
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To post to this group, send email to [email protected].
>> To unsubscribe from this group, send email to
>> [email protected].
>> For more options, visit this group at
>> http://groups.google.com/group/tesseract-ocr?hl=en.
>>


<<testImage(03-31-11-51-18).jpg>>

testbinary.tif
Description: Binary data
