Thanks for your quick response on this issue.

You are right that the confidence values can sometimes look incorrect.
Unfortunately, in my application it is very difficult to find the
misrecognized characters by other means, since all I have are digits
in a different font.

However, you can actually get the values for each character. At the
API level the program applies an algorithm and computes a confidence
level for each word, but you can print the traces and read off the
confidence for each character blob as it is computed. This is what I
understood from the previous post I referred to earlier.
The traces are printed by the function "tprintf" (tesseract project,
"ccutil" folder, "tprintf.cpp"), which is called from the code in the
"wordrec" folder, "wordclass.cpp", lines 132-139.
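To make the idea concrete, here is a minimal, self-contained sketch of a
printf-style trace helper and a per-blob trace line that also prints the
blob's index and horizontal extent, so each confidence can be tied to a
position in the word. The names trace_format and blob_trace_line, the
line layout and the example values are hypothetical; this is NOT
tesseract's tprintf or its real trace format, only an illustration of
the kind of location information that would answer the question below.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical printf-style helper (format string plus variadic
// arguments), loosely in the spirit of tprintf. Not tesseract's code:
// it simply formats into a std::string instead of writing to a file.
std::string trace_format(const char* format, ...) {
  char buffer[256];
  va_list args;
  va_start(args, format);
  std::vsnprintf(buffer, sizeof(buffer), format, args);
  va_end(args);
  return std::string(buffer);
}

// Hypothetical per-blob trace line: the blob index and its left/right
// x coordinates are printed next to the chosen character and its
// certainty, so every value can be matched to a position in the word.
std::string blob_trace_line(int blob_index, int left, int right,
                            char best_choice, float certainty) {
  return trace_format("blob %d x=[%d,%d] '%c' certainty=%.2f",
                      blob_index, left, right, best_choice, certainty);
}
```

With a position in every line, a low value such as
blob_trace_line(0, 10, 22, '7', -1.5f) can be traced back to the
leftmost character instead of only to the word as a whole.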

The output I was able to get is also in my first post. The wiki page
on debugging helped me understand the format of each line. The only
thing I do not understand is the order in which the iterations take
place: the 'chop-word' phase comes first, then 'improve', 'pieces' and
'rebuild'. I do not fully understand what these phases mean, or where
the trace gives the location of the character being processed. Without
some reference to the location you cannot tell which character it is
trying to rebuild or rematch.

Nik

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/tesseract-ocr?hl=en.