On 9 July 2010 10:01, caro wrote:
> I am working with Tesseract OCR and I would like to get, at the end of
> the algorithm, a confidence value that indicates whether the recognition
> seems OK or not.
>
> For example, I have an image with the text: TEST RESULTS ARE OK.
> Depending on the threshold value, I get different OCR output:
>  - TEST RESSUTTS AKE OC
>  - TEST TELLUTTS ARE OB
> ....
> The best threshold can differ from image to image.
> So if I can get this confidence value, maybe it can tell me the best
> threshold to choose for the OCR?
>

If you want to delve into the guts of Tesseract, you can get at the
character choices and the confidence values the classifier attaches to
them, but that information on its own won't be much help -- see my
other mail.

You've got the start of a good idea here, but you need something
external to get you the rest of the way. One way to get that external
information is to pass the recognised words through a spellchecker, or
to use Tesseract's DAWG (directed acyclic word graph) dictionary
facilities: the better threshold value should produce a higher number
of recognised dictionary words.
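A minimal Python sketch of the word-counting idea, assuming the OCR output for each candidate threshold has already been collected. The small in-memory word set is a hypothetical stand-in for a real spellchecker or Tesseract's DAWG word lists, and the threshold values and the names `count_dictionary_words` and `best_threshold` are made up for illustration.

```python
import re

# Hypothetical stand-in for a real dictionary (a spellchecker's word list
# or Tesseract's DAWG files would be used in practice).
DICTIONARY = {"test", "results", "are", "ok"}


def count_dictionary_words(text: str, dictionary: set) -> int:
    """Count how many words in `text` appear in `dictionary` (case-insensitive)."""
    words = re.findall(r"[A-Za-z']+", text)
    return sum(1 for w in words if w.lower() in dictionary)


def best_threshold(outputs: dict, dictionary: set) -> int:
    """Return the threshold whose OCR output contains the most dictionary words."""
    return max(outputs, key=lambda t: count_dictionary_words(outputs[t], dictionary))


# Hypothetical OCR results for the same image at three threshold values;
# in a real pipeline each string would come from running the OCR engine
# on the image binarised at that threshold.
ocr_outputs = {
    100: "TEST RESSUTTS AKE OC",
    128: "TEST RESULTS ARE OK",
    160: "TEST TELLUTTS ARE OB",
}

print(best_threshold(ocr_outputs, DICTIONARY))  # prints 128
```

Here the three outputs score 1, 4 and 2 recognised words respectively, so the sketch picks 128. Counting raw hits follows the "higher number of recognised words" heuristic directly; dividing by the total word count is an alternative if different thresholds split the text into different numbers of words.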

-- 
<Leftmost> jimregan, that's because deep inside you, you are evil.
<Leftmost> Also not-so-deep inside you.

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
