Re: Is it possible to get a confidence value for the tesseract OCR result?

Patrick Questembert Tue, 13 Jul 2010 10:40:53 -0700

Here is a code snippet:

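    // First recognition pass over the page's blocks (block_list set up
    // beforehand).
    PAGE_RES* page_res_pass1 = myTess->RecognitionPass1(block_list);

    // Output arrays, allocated by TesseractExtractResult();
    // the caller must free them afterwards.
    char *textOCR = NULL;  // UTF-8 bytes, NOT null-terminated
    int matchedChars = 0;  // number of letters found
    int *lengths = NULL;   // byte length of each letter
    int *x0 = NULL;        // per-letter bounding box
    int *y0 = NULL;
    int *x1 = NULL;
    int *y1 = NULL;
    float *costs = NULL;   // per-letter cost (same for all letters of a word)

    matchedChars = myTess->TesseractExtractResult(&textOCR, &lengths,
                                                  &costs, &x0, &y0, &x1, &y1,
                                                  page_res_pass1);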

Comments:
- textOCR is a series of multibyte UTF-8 characters. The lengths array
  gives the number of bytes in each letter, so the total length of textOCR
  is the sum of lengths[i] for i from 0 to (matchedChars - 1).
- Note: you will need to null-terminate textOCR yourself.
- matchedChars is the number of letters found.
- costs has one float value per letter found. As mentioned, these values
  are identical for all letters in a given word.
- No newlines are returned: both spaces and newlines come back as spaces,
  and your code needs to decide whether each one is a newline or a space
  based on the x0, y0, x1, y1 coordinates.
- All arrays must be freed by the caller.

Let me know if you need more help.

On Tue, Jul 13, 2010 at 4:26 AM, caro <[email protected]> wrote:

> OK, thank you for your help.
> Could you explain how to use the function
> TesseractExtractResult()?
>
> Thank you,
> Caroline
>
>
> On Jul 9, 6:35 pm, "Jimmy O'Regan" <[email protected]> wrote:
> > On 9 July 2010 10:01, caro <[email protected]> wrote:
> >
> > > I am working with Tesseract OCR, and at the end of recognition I would
> > > like to get a confidence value that indicates whether the result looks
> > > reliable or not.
> >
> > > For example, I have an image with the text: TEST RESULTS ARE OK.
> > > Depending on the threshold value, I get different OCR outputs:
> > >  - TEST RESSUTTS AKE OC
> > >  - TEST TELLUTTS ARE OB
> > > ....
> > > The best threshold differs from image to image.
> > > If I could get this confidence value, could it tell me the best
> > > threshold to use for the OCR?
> >
> > If you want to delve into the guts of Tesseract, you can get at the
> > character choices and the confidence values attached by the
> > classifier, but that information by itself won't be much help -- see
> > my other mail.
> >
> > You've got the start of a good idea here, but you need something
> > external to get you the rest of the way. One way that you can get
> > external information is to pass the words through a spellchecker or
> > use the DAWG (dictionary) facilities: the better threshold value will
> > produce a higher number of recognised words.
> >
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en.
>
>

