On 5 October 2010 23:43, haratron <[email protected]> wrote:
> I'm using tesseract 3.00 with hOCR output and I get the xocr_word
> among other things.
> Example:
> <span class='xocr_word' id='xword_1_5' title="x_wconf -4">testing</span>
>
> The x_wconf attribute is for certainty of the result. Which is
> calculated through a certainty() function, from what I saw in
> tesseract's source.
> The problem is that I can't find the function's definition anywhere.
> How does it work? What are the boundaries (lower and upper limit) of
> the certainty() return value?

There is no single 'certainty' function that calculates certainty. The
certainty() member of the WERD_CHOICE class is only an accessor method;
several functions may be involved in calculating the overall certainty
of a particular word. TessBaseAPI::AllWordConfidences() will give you
an array of candidates, but through the hOCR output you'll only get
the one that was finally selected. The fragment AllWordConfidences()
uses to convert the value to one between 0 and 100 is:
    int w_conf = static_cast<int>(100 + 5 * choice->certainty());
                 // This is the eq for converting Tesseract confidence to 1..100
    if (w_conf < 0) w_conf = 0;
    if (w_conf > 100) w_conf = 100;
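
As for the boundaries: the fragment above does not say what range
certainty() itself can take, but it does show which part of that range
survives the conversion. Anything at or below -20 is clamped to 0, and
anything at or above 0 is clamped to 100. Here is a minimal standalone
sketch of the same mapping. The names CertaintyToConfidence and
ClampTo0To100 are mine, not Tesseract's, and I am assuming the
certainty is a float:

```cpp
// Hypothetical helper names; only the arithmetic mirrors the fragment
// quoted above. The float parameter type is an assumption.

// Clamp an integer into the inclusive range 0..100.
constexpr int ClampTo0To100(int value) {
  return value < 0 ? 0 : (value > 100 ? 100 : value);
}

// Map a raw certainty to a 0..100 confidence: 100 + 5 * certainty,
// truncated toward zero by the cast, then clamped.
constexpr int CertaintyToConfidence(float certainty) {
  return ClampTo0To100(static_cast<int>(100 + 5 * certainty));
}
```

If the x_wconf value in your hOCR output is the raw certainty, then
x_wconf -4 would come out as 100 + 5 * (-4) = 80 on this scale.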


-- 
<Leftmost> jimregan, that's because deep inside you, you are evil.
<Leftmost> Also not-so-deep inside you.
