Hello

I need to make an OCR application that can recognise birth dates, for
example:
"John, Smith"
"02/01/2011"
"John, Smith, 34 YO DOB 06Jan1981"
Each text is in a clean, cropped image with a solid background, and
the image contains nothing but this text.
All of the text is set in Arial, at either 11px or 20px font size.
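
Once the text has been recognised, the two date shapes in the examples
above could be picked out of it roughly like this. This is only a
sketch: find_birth_date and _safe_date are hypothetical helper names,
and reading 02/01/2011 as day/month/year is an assumption (it could
just as well be month/day/year).

```python
import re
from datetime import date

# English three-letter month abbreviations, as in "06Jan1981".
MONTHS = {
    "jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
    "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12,
}

SLASH_DATE = re.compile(r"\b(\d{2})/(\d{2})/(\d{4})\b")       # 02/01/2011
NAMED_DATE = re.compile(r"\b(\d{2})([A-Za-z]{3})(\d{4})\b")   # 06Jan1981


def _safe_date(year, month, day):
    """Return a date, or None if the numbers do not form a real date."""
    try:
        return date(year, month, day)
    except ValueError:
        return None


def find_birth_date(text):
    """Return the first date found in OCR output, or None."""
    match = SLASH_DATE.search(text)
    if match:
        # ASSUMPTION: the first field is the day, the second the month.
        day, month, year = (int(group) for group in match.groups())
        return _safe_date(year, month, day)
    match = NAMED_DATE.search(text)
    if match:
        month = MONTHS.get(match.group(2).lower())
        if month is not None:
            return _safe_date(int(match.group(3)), month, int(match.group(1)))
    return None


if __name__ == "__main__":
    for sample in ["John, Smith", "02/01/2011",
                   "John, Smith, 34 YO DOB 06Jan1981"]:
        print(repr(sample), "->", find_birth_date(sample))
```

Rejecting impossible dates (month 13, day 32 and so on) also gives a
cheap check for OCR misreads.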

I have two questions about the OCR:
1. I've read that Tesseract is bad at recognising small text. What
can I do to make the recognition better? Enlarge the images? Use some
filters on them? (I've tried enlarging, but the results were not very
good: roughly a 30% failure rate.)
2. What would help the recognition process? Should I train Tesseract
for a custom language? (This is the only kind of 'setting' I read
about in the docs.) Or is there some kind of supervised learning
procedure for Tesseract? I figured out that I can whitelist characters
with the config variable tessedit_char_whitelist, but I can't find a
list of the other config options or their meanings.
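
For reference, this is roughly how the whitelist could be passed on
the command line from a script. It is a sketch under assumptions, not
a tested setup: it assumes a Tesseract build recent enough to accept
"-c name=value", "stdout" as the output base and "--psm" (older 3.x
builds spell it "-psm"). build_tesseract_command and run_ocr are
hypothetical helper names, and page segmentation mode 7 (treat the
image as a single text line) is a guess based on the cropped images
described above. On such builds, "tesseract --print-parameters" prints
the available config variables with short descriptions.

```python
import string
import subprocess

# Characters that occur in the sample texts: letters, digits, "/" and ",".
DEFAULT_WHITELIST = string.ascii_letters + string.digits + "/,"


def build_tesseract_command(image_path, whitelist=DEFAULT_WHITELIST, psm=7):
    """Build the argument list for one tesseract call (does not run it)."""
    return [
        "tesseract",
        image_path,
        "stdout",                  # print the recognised text to stdout
        "--psm", str(psm),         # ASSUMPTION: 7 = single line of text
        "-c", f"tessedit_char_whitelist={whitelist}",
    ]


def run_ocr(image_path, whitelist=DEFAULT_WHITELIST):
    """Run tesseract on one image and return the recognised text."""
    result = subprocess.run(
        build_tesseract_command(image_path, whitelist),
        capture_output=True,
        text=True,
        check=True,                # raise if tesseract exits with an error
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # "dob.png" is a placeholder file name.
    print(" ".join(build_tesseract_command("dob.png")))
```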

Thanks for the help. I am really clueless about how to get started
with this project because of the lack of documentation.
