Hi,
I am using Tesseract version 2.04 and trying to extract the
confidence level for each character. There has been a previous
discussion about this issue, but it hasn't been active for the past
two and a half years, so I wanted to get some new input.
The previous thread was:
http://groups.google.com/group/tesseract-ocr/browse_thread/thread/1cdb99045c77d04/f34d76199b8b8fea?hl=en&lnk=gst&q=confidence+character
Tesseract works fine for the most part. However, when a
character is not recognized, it chooses the most likely option from
the character set and prints it. In this case I would like to output
an error or a special character whenever a character in the input
file cannot be recognized above a given confidence level.
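To make the goal concrete, here is a rough Python sketch of the behavior I am after. It is not Tesseract code: the function name, the threshold of -3.0, and the "?" placeholder are made up for illustration. I am also assuming that the second number on each trace line in the example output further down is a confidence value where numbers closer to zero are better, which may be wrong.

```python
def mask_low_confidence(chars, scores, threshold, placeholder="?"):
    """Replace every character whose score is below `threshold`.

    Assumption (not confirmed by the Tesseract sources): a higher score,
    i.e. one closer to zero, means a more confident match.
    """
    if len(chars) != len(scores):
        raise ValueError("need exactly one score per character")
    return "".join(
        ch if score >= threshold else placeholder
        for ch, score in zip(chars, scores)
    )


# Second column of the first chop_word pass in the example output below.
digits = "09063"
scores = [-2.03, -1.49, -1.52, -3.94, -1.12]

print(mask_low_confidence(digits, scores, threshold=-3.0))  # prints 090?3
```

With this arbitrary threshold only the "6" (score -3.94) would be replaced. What I do not know is which of the printed numbers, if any, is the right one to threshold on.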
I have been able to follow the previous thread (thanks to all the
members) and can now print a final file containing the
probability of each character. But I don't know how to make sense of
the different iterations that take place to correct an image and improve
its clarity and matching characteristics.
If someone could explain the format in which the traces are printed by
the tprintf function, it would be greatly appreciated.
Example output for an image containing "09063" as input:
Tesseract Open Source OCR Engine
chop_word:
10.79 -2.03 : 0 [30 ]0
chop_word:
6.03 -1.49 : 9 [39 ]0
chop_word:
8.08 -1.52 : 0 [30 ]0
chop_word:
16.86 -3.94 : 6 [36 ]0
chop_word:
5.20 -1.12 : 3 [33 ]0
improve 1:
20.42 -5.92 : 6 [36 ]0
improve 2:
16.65 -12.33 : : [3a ] 17.86 -13.23 : 0 [30 ]0
pieces:
80.98 -9.23 : 0 [30 ]0
pieces:
58.07 -9.68 : 3 [33 ]0
rebuild
16.86 -3.94 : 6 [36 ]0
chop_word:
0.42 -0.08 : 0 [30 ]0
chop_word:
6.03 -1.49 : 9 [39 ]0
chop_word:
6.14 -1.15 : 0 [30 ]0
chop_word:
16.86 -3.94 : 6 [36 ]0
chop_word:
5.20 -1.12 : 3 [33 ]0
improve 1:
20.42 -5.92 : 6 [36 ]0
improve 2:
16.65 -12.33 : : [3a ] 17.86 -13.23 : 0 [30 ]0
pieces:
80.98 -9.23 : 0 [30 ]0
pieces:
58.07 -9.68 : 3 [33 ]0
rebuild
16.86 -3.94 : 6 [36 ]0
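In case it helps whoever answers, this is how I am currently reading the trace lines, written as a small Python sketch. It is only my guess at the format: each choice seems to be two numbers, a colon, the character, and then a hex value in brackets that matches the character's ASCII code (0x30 for "0", 0x3a for ":"). I do not know what the two numbers mean, so the field names `first` and `second` are placeholders, and the trailing "0" at the end of each line is ignored because I cannot tell what it is.

```python
import re
from typing import NamedTuple


class Choice(NamedTuple):
    first: float    # meaning unknown (placeholder name)
    second: float   # meaning unknown (placeholder name)
    char: str       # character printed after the colon
    hex_code: str   # hex value printed inside the brackets


# One choice looks like "16.65 -12.33 : : [3a ]"; a line may hold several.
CHOICE_RE = re.compile(
    r"(-?\d+\.\d+)\s+(-?\d+\.\d+)\s+:\s+(\S+)\s+\[([0-9a-fA-F]+)\s*\]"
)


def parse_trace_line(line):
    """Return every choice found on one trace line (possibly none)."""
    return [
        Choice(float(a), float(b), ch, hx.lower())
        for a, b, ch, hx in CHOICE_RE.findall(line)
    ]


def parse_trace(text):
    """Pair each data line with the label line that precedes it."""
    steps = []
    label = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        choices = parse_trace_line(line)
        if choices:
            steps.append((label, choices))
        else:
            # Lines without choices ("chop_word:", "rebuild") act as labels.
            label = line.rstrip(":").strip()
    return steps


sample = """improve 2:
16.65 -12.33 : : [3a ] 17.86 -13.23 : 0 [30 ]0
rebuild
16.86 -3.94 : 6 [36 ]0
"""

for label, choices in parse_trace(sample):
    for c in choices:
        print(label, c.first, c.second, c.char, c.hex_code)
```

If this reading is right, the "improve 2" step carries two candidate characters on one line, but I still cannot tell how the steps relate to each other or which numbers end up in the final result.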
Thanks,
Nik