Hi Caroline. I'm thinking of using a dictionary approach coupled with varying thresholds to come up with votes for correct sentence parts. A rough sketch (for recognising English Heritage Plaques) is here: http://aicookbook.com/wiki/Automatic_plaque_transcription
Basically:
- Try many thresholds, extract OCR results for each
- Use a dictionary to vote on how English each sentence is
- Choose the highest voted sentence to build a composite result

The dictionary step will include problem-specific rules. For plaque recognition it'll include rules about date formats (they're usually something like "1863-1845", i.e. 4 digits, minus, 4 digits). The dictionary will also include proper names for people and locations that are associated with the geo tags for the plaque.

HTH,
Ian.

On 9 July 2010 10:01, caro <[email protected]> wrote:
> I am working with tesseract OCR and I would like to get at the end of
> the algorithm a confidence value which may express if the recognition
> seems OK or not really.
>
> For example, I have an image with the text: TEST RESULTS ARE OK.
> Depending on a threshold value, I can get different output of the OCR:
> - TEST RESSUTTS AKE OC
> - TEST TELLUTTS ARE OB
> ....
> The best threshold can be different for different images.
> So if I can get this confidence value, maybe it can give me the best
> threshold to choose for the OCR?
>
> Thank you for your help,
> Caroline
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en.

--
Ian Ozsvald (A.I. researcher, screencaster)
[email protected]
http://IanOzsvald.com
http://MorConsulting.com/
http://blog.AICookbook.com/
http://TheScreencastingHandbook.com
http://FivePoundApp.com/
http://twitter.com/IanOzsvald
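A rough sketch of the dictionary-voting idea from the reply above, in Python. It assumes the OCR output for each threshold has already been collected into a dict; the tiny word list, the threshold values, and the function names `score` and `best_candidate` are all made up for illustration. It also simplifies things by scoring each whole string rather than building a composite from per-sentence votes.

```python
import re

# Hypothetical word list; a real one would be a full English dictionary plus
# the names and places associated with the plaque's geo tags.
DICTIONARY = {"test", "results", "are", "ok"}

# Problem-specific rule from the reply: 4 digits, minus, 4 digits.
DATE_RANGE = re.compile(r"^\d{4}-\d{4}$")


def score(text, dictionary=DICTIONARY):
    """Return the fraction of tokens that are dictionary words or date ranges."""
    tokens = text.split()
    if not tokens:
        return 0.0
    hits = 0
    for token in tokens:
        # Drop surrounding punctuation and ignore case before the lookup.
        word = token.strip(".,;:!?\"'()").lower()
        if word in dictionary or DATE_RANGE.match(word):
            hits += 1
    return hits / len(tokens)


def best_candidate(candidates, dictionary=DICTIONARY):
    """Pick the (threshold, text) pair whose text scores highest."""
    return max(candidates.items(), key=lambda item: score(item[1], dictionary))


# Assumed threshold values mapped to OCR outputs. The two garbled strings and
# the correct text come from the quoted question; the thresholds are invented.
candidates = {
    100: "TEST RESSUTTS AKE OC",
    128: "TEST TELLUTTS ARE OB",
    160: "TEST RESULTS ARE OK",
}

threshold, text = best_candidate(candidates)
print(threshold, text, score(text))
```

The score doubles as a crude confidence value: a result where few tokens match the dictionary is probably a bad threshold for that image.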

