Hi Caroline. I'm thinking of using a dictionary approach coupled with
varying thresholds to come up with votes for correct sentence parts. A
rough sketch (for recognising English Heritage Plaques) is here:
http://aicookbook.com/wiki/Automatic_plaque_transcription

Basically:
 - Try many thresholds, extract OCR results for each
 - Use a dictionary to vote on how English each sentence is
 - Choose the highest-voted sentence to build a composite result
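
As a rough illustration of the steps above, here is a minimal Python sketch. It assumes the OCR step has already been run once per threshold and the text collected into a dict; it does not call Tesseract itself. The word list, the function names, the threshold values and the third OCR string are all made up for the example; only the first two OCR strings come from the question below.

```python
from collections import Counter

# Hypothetical stand-in for a real word list (e.g. a system dictionary file).
ENGLISH_WORDS = {"test", "results", "are", "ok"}


def score_sentence(sentence, dictionary=ENGLISH_WORDS):
    """Return the fraction of words found in the dictionary (0.0 to 1.0)."""
    words = sentence.lower().split()
    if not words:
        return 0.0
    return sum(w in dictionary for w in words) / len(words)


def best_sentence(ocr_by_threshold, dictionary=ENGLISH_WORDS):
    """Pick the (threshold, sentence) pair with the highest dictionary score."""
    return max(ocr_by_threshold.items(),
               key=lambda item: score_sentence(item[1], dictionary))


def composite(ocr_by_threshold, dictionary=ENGLISH_WORDS):
    """Build a composite by voting per word position across all thresholds.

    Assumes every OCR output splits into the same number of words; zip()
    silently truncates to the shortest output otherwise.
    """
    rows = [text.split() for text in ocr_by_threshold.values()]
    result = []
    for candidates in zip(*rows):
        votes = Counter(candidates)
        # Prefer dictionary words, then the most frequently seen candidate.
        result.append(
            max(votes, key=lambda w: (w.lower() in dictionary, votes[w])))
    return " ".join(result)


# OCR text per threshold (threshold values and the last string are invented).
outputs = {
    100: "TEST RESSUTTS AKE OC",
    128: "TEST TELLUTTS ARE OB",
    160: "TEST RESULTS AKE OK",
}
print(best_sentence(outputs))
print(composite(outputs))
```

With this toy data no single threshold gets every word right, but the per-word vote can still recover the full sentence, which is the point of building a composite rather than just picking one threshold.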

The dictionary step will include problem-specific rules. For plaque
recognition it'll include rules about date formats (they're usually
something like "1845-1863", i.e. four digits, a hyphen, four digits). The
dictionary will also include proper names for people and locations that
are associated with the geo tags for the plaque.
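
A sketch of how such rules might be scored, again in Python. Everything here is an assumption for illustration: the regular expression is one possible reading of "four digits, a hyphen, four digits", and the word list, the name list and the sample plaque text are invented (in practice the names would be looked up from the plaque's geo tags).

```python
import re

# Year range rule: four digits, a hyphen, four digits.
YEAR_RANGE = re.compile(r"\d{4}-\d{4}")

# Hypothetical data: a tiny general word list, plus proper names that would
# be gathered from the plaque's geo tags.
DICTIONARY = {"novelist", "lived", "here"}
LOCAL_NAMES = {"dickens", "london"}


def token_score(token, dictionary, local_names):
    """Return 1 if the token passes any rule, else 0."""
    cleaned = token.strip(".,;:()")
    if YEAR_RANGE.fullmatch(cleaned):
        return 1
    lowered = cleaned.lower()
    return int(lowered in dictionary or lowered in local_names)


def sentence_score(sentence, dictionary, local_names):
    """Count how many tokens in the sentence pass a rule."""
    return sum(token_score(t, dictionary, local_names)
               for t in sentence.split())


# Two invented OCR readings of the same plaque line.
good = "DICKENS 1812-1870 NOVELIST LIVED HERE"
bad = "DICKENS I8I2-I870 NOVEUST LIVED HERE"
print(sentence_score(good, DICTIONARY, LOCAL_NAMES))
print(sentence_score(bad, DICTIONARY, LOCAL_NAMES))
```

The date rule helps because OCR often confuses digits with letters (1 and I, for example), so a token that almost looks like a year range but fails the pattern is a useful signal that the threshold was poor.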

HTH,
Ian.


On 9 July 2010 10:01, caro <[email protected]> wrote:
> I am working with Tesseract OCR and I would like the algorithm to
> return a confidence value at the end that indicates whether the
> recognition seems OK or not.
>
> For example, I have an image with the text: TEST RESULTS ARE OK.
> Depending on the threshold value, I get different OCR output:
>  - TEST RESSUTTS AKE OC
>  - TEST TELLUTTS ARE OB
> ....
> The best threshold can be different for different images.
> So if I can get this confidence value, maybe it can give me the best
> threshold to choose for the OCR?
>
> Thank you for your help,
> Caroline
>
> --
> You received this message because you are subscribed to the Google Groups 
> "tesseract-ocr" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to 
> [email protected].
> For more options, visit this group at 
> http://groups.google.com/group/tesseract-ocr?hl=en.
>
>



-- 
Ian Ozsvald (A.I. researcher, screencaster)
[email protected]

http://IanOzsvald.com
http://MorConsulting.com/
http://blog.AICookbook.com/
http://TheScreencastingHandbook.com
http://FivePoundApp.com/
http://twitter.com/IanOzsvald

