Thanks for your quick response on this issue. You are right that the confidence values can sometimes look incorrect. Unfortunately, in my application it is very difficult to find the mismatched characters by other methods, since all I have is digits in a different font.
However, you can actually get a number for each character. At the API level, the program applies an algorithm and computes a confidence level for each word, but you can print the traces and see the confidence for each character blob as it is computed. This is what I understood from the previous post that I referred to before. The traces are printed by the function "tprintf" (in the Tesseract project, "ccutil" folder, "tprintf.cpp"), which is invoked by a piece of code in the "wordrec" folder, "wordclass.cpp", lines 132 - 139. The output I was able to get is in my first post, and the wiki page on debugging helped with the format of each line.

The only thing I do not understand is the order in which the iterations take place. The 'chop-word' phase comes first, followed by 'improve', 'pieces' and 'rebuild'. I do not fully understand what these mean, or where the location of the character is represented in the trace. Without any reference to the location, you cannot tell which character it is trying to rebuild/rematch.

Nik

