On 9 July 2010 16:55, patrickq <[email protected]> wrote:
> TesseractExtractResult() returns a confidence number for every
> character it returns. A high number means low confidence. Caveats:
> 1. The confidence number is the same for all letters in a word (even
> though Tesseract does compute a confidence number for each letter, it
> just doesn't pass them back through the API).
> 2. From personal experience, these numbers are not very reliable and
> we decided not to use them, but feel free to test them yourself; we
> gave up fairly quickly.
>
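Because the quoted note says every letter in a word carries the same cost, it can be handy to collapse the per-character numbers into one cost per word before looking at them. The sketch below assumes you already have the recognized text and a parallel list of per-character costs with one entry per character, whitespace included. That layout is an assumption for illustration, not a description of what the API actually hands back. The function name `word_costs` and the sample values are hypothetical.

```python
from typing import List, Tuple


def word_costs(text: str, costs: List[float]) -> List[Tuple[str, float]]:
    """Collapse per-character costs into one (word, cost) pair per word.

    Assumes `costs` has exactly one entry per character of `text`,
    whitespace included. Lower cost means higher confidence.
    """
    if len(text) != len(costs):
        raise ValueError("text and costs must have the same length")
    words: List[Tuple[str, float]] = []
    chars: List[str] = []
    char_costs: List[float] = []
    for ch, cost in zip(text, costs):
        if ch.isspace():
            # Whitespace ends the current word; its own cost is ignored.
            if chars:
                words.append(("".join(chars), max(char_costs)))
                chars, char_costs = [], []
            continue
        chars.append(ch)
        char_costs.append(cost)
    if chars:  # flush the last word
        words.append(("".join(chars), max(char_costs)))
    return words


# Hypothetical input: every letter of a word shares one cost.
text = "TEST OK"
costs = [1.5, 1.5, 1.5, 1.5, 0.0, 3.0, 3.0]
for word, cost in sorted(word_costs(text, costs), key=lambda wc: wc[1], reverse=True):
    print(f"{word}: cost {cost}")  # least confident (highest cost) word first
```

Taking the maximum cost within a word is a defensive choice: if the per-letter values really are identical it changes nothing, and if they are not, the word is judged by its weakest letter.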

Right; if I could sketch this on paper it would be clearer, but since I
can't, I'll try to describe it...

R to K is the easiest to describe: cover the top of the R and it looks
like a K. Smudges, glare from the scanner's light, boxing errors,
etc., can all cause this kind of degradation. Thresholding can
contribute to the problem, because it converts greyscale to binary: if
a stroke is too light, it is effectively wiped out. Access to the
character probabilities won't actually help. If threshold 1 gives you
an R with a broken top, that R will have a relatively low confidence
value, whereas threshold 2, which has removed the top completely, will
give a higher confidence value for the character as 'K'. Going purely
by character probabilities can just as easily pick the worse of the
two results as the better one.
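The "too light, wiped out" effect can be seen on a toy image. The sketch below applies a plain global threshold (a simplification; it is not meant to reproduce how Tesseract binarizes internally) to a hypothetical 5x4 greyscale R whose top stroke is faint, as if washed out by scanner glare. The names `binarize` and `R_FAINT_TOP` and all pixel values are made up for illustration.

```python
from typing import List

Grey = List[List[int]]  # 0 = black ink, 255 = white paper


def binarize(img: Grey, threshold: int) -> List[str]:
    """Global threshold: pixels darker than `threshold` become ink ('#')."""
    return ["".join("#" if px < threshold else "." for px in row) for row in img]


# Hypothetical glyph: the top stroke is faint (150), the rest is dark (20).
R_FAINT_TOP: Grey = [
    [20, 150, 150, 255],
    [20, 255, 255, 20],
    [20, 20, 20, 255],
    [20, 255, 20, 255],
    [20, 255, 255, 20],
]

for t in (128, 200):
    print(f"threshold={t}")
    print("\n".join(binarize(R_FAINT_TOP, t)))
```

At the lower threshold the faint top stroke falls on the paper side and disappears, leaving an open-topped shape, while the higher threshold keeps it; everything below the top row is identical in both cases. The recognizer only ever sees the binary result, so it has no way of knowing a stroke was lost.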

> Patrick
>
> On Jul 9, 5:01 am, caro <[email protected]> wrote:
>> I am working with Tesseract OCR and I would like to get, at the end
>> of the algorithm, a confidence value that expresses whether the
>> recognition seems OK or not.
>>
>> For example, I have an image with the text: TEST RESULTS ARE OK.
>> Depending on a threshold value, I can get different output of the OCR:
>>  - TEST RESSUTTS AKE OC
>>  - TEST TELLUTTS ARE OB
>> ....
>> The best threshold can differ from image to image.
>> So if I can get this confidence value, maybe it can tell me the best
>> threshold to choose for the OCR?
>>
>> Thank you for your help,
>> Caroline
>
> --
> You received this message because you are subscribed to the Google Groups 
> "tesseract-ocr" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to 
> [email protected].
> For more options, visit this group at 
> http://groups.google.com/group/tesseract-ocr?hl=en.
>
>
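Caroline's idea of choosing the threshold by confidence could be sketched as a simple sweep that keeps the result with the lowest mean cost. The `ocr` callable here is a hypothetical hook you would supply yourself (binarize at the given threshold, run OCR, return the text and per-character costs); it is not a real Tesseract function, and `pick_threshold` is likewise an illustrative name.

```python
from statistics import mean
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical OCR hook: binarize the image at `threshold`, run OCR, and
# return (text, per-character costs). Not a real Tesseract function.
OcrFn = Callable[[int], Tuple[str, List[float]]]


def pick_threshold(ocr: OcrFn, thresholds: Iterable[int]) -> Tuple[int, str, float]:
    """Return (threshold, text, mean_cost) for the lowest mean cost seen.

    Lower cost means higher reported confidence, which is NOT the same
    thing as a correct reading.
    """
    best: Optional[Tuple[int, str, float]] = None
    for t in thresholds:
        text, costs = ocr(t)
        score = mean(costs) if costs else float("inf")  # empty output never wins
        if best is None or score < best[2]:
            best = (t, text, score)
    if best is None:
        raise ValueError("no thresholds given")
    return best
```

Nothing in this selection step looks at whether the winning text is right. Per the reply above, a threshold that erases the top of an R entirely can yield a confidently recognized 'K', and that reading would win the comparison.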



-- 
<Leftmost> jimregan, that's because deep inside you, you are evil.
<Leftmost> Also not-so-deep inside you.
