OK, thanks for the help!
On Aug 6, 8:27 am, zdenko podobny <[email protected]> wrote:
> DPI is 96 -> make images with 300 DPI
> What I did: I changed image DPI to 300, then I resized it by 200%. It was
> not enough. So I resized it once again by 200% and converted to
> grayscale [1].
> Then tesseract (3.01) recognized it very well: only "/" in date was
> interpreted as 1.
>
> Zdenko
>
> [1] https://docs.google.com/leaf?id=0B6RGB-MIlOIfOWI1ZGNkOTgtYTc3MC00YjE4...
>
> On Fri, Aug 5, 2011 at 3:46 PM, sydd <[email protected]> wrote:
> > Here are 2 sample images:
> > http://imgur.com/UxNjf
> >
> > (i just pasted 2 sample images into a white canvas, the 2 images will
> > be OCRed separately)
> >
> > On Aug 5, 8:17 am, zdenko podobny <[email protected]> wrote:
> > > Hello,
> > >
> > > first of all - please provide example image. Then people can
> > > suggest some improvements.
> > >
> > > Zdenko
> > >
> > > On Thu, Aug 4, 2011 at 10:44 PM, sydd <[email protected]> wrote:
> > > > Hello
> > > >
> > > > I need to make an OCR application, that can recognise birth dates, for
> > > > example:
> > > > "John, Smith"
> > > > "02/01/2011"
> > > > "John, Smith, 34 YO DOB 06Jan1981"
> > > > I have these texts in a clean, cropped image (solid background), and
> > > > on the image is just this text.
> > > > All of the text is written in Arial, either 11px or 20px font size.
> > > >
> > > > I have got 2 questions about the OCR:
> > > > 1. I've read that tesseract is bad for recognising small text. What
> > > > can i do to make the recognition better? Enlarge the images? use some
> > > > filters on it? (I've tried enlarging, but the results were not so
> > > > good, like 30% fail rate)
> > > > 2. What would help the recognising process? Should i train tesseract
> > > > for a custom language? (this is the only kind of 'setting' i read in
> > > > the docs) Or is there some kind of supervised learning procedure for
> > > > tesseract? I figured out, that i can whitelist characters with the
> > > > command tessedit_char_whitelist , but i cant find a list of other
> > > > config options or their meanings
> > > >
> > > > Thanks for help, i am really clueless how to get started with this
> > > > project because of the lack of documentation.
> > > >
> > > > --
> > > > You received this message because you are subscribed to the Google
> > > > Groups "tesseract-ocr" group.
> > > > To post to this group, send email to [email protected]
> > > > To unsubscribe from this group, send email to
> > > > [email protected]
> > > > For more options, visit this group at
> > > > http://groups.google.com/group/tesseract-ocr?hl=en
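
For anyone finding this thread later: the steps Zdenko describes above (tag the image as 300 DPI, enlarge it by 200% twice, convert to grayscale) together with the tessedit_char_whitelist setting from the original question could be sketched roughly as below. This is only a sketch, not what Zdenko actually ran. It assumes the Pillow imaging library is installed; the function names, the LANCZOS resampling filter and the whitelist string "0123456789/" are all made up for illustration.

```python
"""Sketch: enlarge small text before OCR and restrict Tesseract's character set."""


def upscaled_size(width, height, factor=2, passes=2):
    """Return the pixel size after enlarging by `factor`, `passes` times."""
    scale = factor ** passes
    return width * scale, height * scale


def whitelist_config_line(chars):
    """Return one Tesseract config-file line that limits recognised characters."""
    return f"tessedit_char_whitelist {chars}\n"


def preprocess(src_path, dst_path, factor=2, passes=2):
    """Enlarge the image, convert it to grayscale, and save it tagged as 300 DPI."""
    # Assumption: Pillow is available. It is imported lazily so the two
    # helpers above can be used without it.
    from PIL import Image

    img = Image.open(src_path)
    for _ in range(passes):
        # One enlargement pass (200% when factor is 2).
        size = upscaled_size(img.width, img.height, factor, passes=1)
        img = img.resize(size, Image.LANCZOS)  # filter choice is an assumption
    img = img.convert("L")  # "L" is Pillow's 8-bit grayscale mode
    # The DPI value is only metadata; the resize above is what adds pixels.
    img.save(dst_path, dpi=(300, 300))


if __name__ == "__main__":
    print(upscaled_size(100, 40))
    print(whitelist_config_line("0123456789/"), end="")
```

The whitelist line is meant to be saved into a small config file whose name is passed to tesseract after the output base (the "digits" config shipped with tesseract uses the same "variable value" layout). Note that a digits-and-slash whitelist would only suit dates like "02/01/2011"; a date like "06Jan1981" would need the month letters added as well.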

