Hello, first of all, please provide an example image. Then people can suggest some improvements.
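Until a sample image is available, one thing that can be tried independently of tesseract is checking the OCR output against the two date formats from the question ("02/01/2011" and "06Jan1981"), so that obvious misreads are rejected instead of silently accepted. Below is a minimal sketch in Python using only the standard library. `find_dates` is a hypothetical helper name, not part of tesseract, and the numeric format is assumed to be day/month/year, which the question does not state.

```python
import re
from datetime import date

# Month abbreviations as they appear in strings like "06Jan1981".
MONTHS = {
    "jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
    "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12,
}

# ASSUMPTION: "02/01/2011" means day/month/year.
NUMERIC = re.compile(r"\b(\d{2})/(\d{2})/(\d{4})\b")
NAMED = re.compile(r"\b(\d{2})([A-Za-z]{3})(\d{4})\b")


def find_dates(ocr_text):
    """Return every valid calendar date found in a line of OCR output."""
    dates = []

    for match in NUMERIC.finditer(ocr_text):
        day, month, year = (int(g) for g in match.groups())
        try:
            dates.append(date(year, month, day))
        except ValueError:
            pass  # e.g. day 32 or month 13: most likely a misread digit

    for match in NAMED.finditer(ocr_text):
        day, month_name, year = match.groups()
        month = MONTHS.get(month_name.lower())
        if month is None:
            continue  # three letters that are not a month abbreviation
        try:
            dates.append(date(int(year), month, int(day)))
        except ValueError:
            pass

    return dates


if __name__ == "__main__":
    for line in ['"John, Smith"', '"02/01/2011"',
                 '"John, Smith, 34 YO DOB 06Jan1981"']:
        print(line, find_dates(line))
```

A line that should contain a date but yields an empty list can then be flagged for a second OCR pass or manual review. This only catches misreads that produce an invalid date; a wrong digit that still forms a valid date passes through.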
Zdenko

On Thu, Aug 4, 2011 at 10:44 PM, sydd <[email protected]> wrote:
> Hello
>
> I need to make an OCR application that can recognise birth dates, for
> example:
> "John, Smith"
> "02/01/2011"
> "John, Smith, 34 YO DOB 06Jan1981"
> I have these texts in a clean, cropped image (solid background), and
> the image contains only this text.
> All of the text is written in Arial, at either 11px or 20px font size.
>
> I have two questions about the OCR:
> 1. I've read that tesseract is bad at recognising small text. What
> can I do to make the recognition better? Enlarge the images? Use some
> filters on them? (I've tried enlarging, but the results were not so
> good, about a 30% failure rate.)
> 2. What would help the recognition process? Should I train tesseract
> for a custom language? (This is the only kind of 'setting' I read
> about in the docs.) Or is there some kind of supervised learning
> procedure for tesseract? I figured out that I can whitelist characters
> with the config variable tessedit_char_whitelist, but I can't find a
> list of other config options or their meanings.
>
> Thanks for the help. I am really clueless about how to get started
> with this project because of the lack of documentation.
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en

