The image DPI is 96; make the images 300 DPI instead.
What I did: I changed the image DPI to 300, then resized it by 200%. That was
not enough, so I resized it by 200% once more and converted it to
grayscale [1].
Then tesseract (3.01) recognized it very well: only the "/" in the date was
interpreted as "1".
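
The preprocessing steps above (two 200% upscales, grayscale, 300 DPI) could
be scripted roughly as follows. This is only a sketch that assumes the
Pillow library; the function name prepare_for_ocr, the choice of Lanczos
resampling, and the file names in the usage note are illustrative
assumptions, not what was actually used for [1].

```python
from PIL import Image


def prepare_for_ocr(img, scale=2, passes=2):
    """Upscale `img` by `scale` for `passes` rounds, then convert to grayscale.

    The defaults mirror the steps described above: 200% applied twice.
    The resampling filter (Lanczos) is an assumption; other filters may
    work equally well.
    """
    for _ in range(passes):
        new_size = (img.width * scale, img.height * scale)
        img = img.resize(new_size, Image.LANCZOS)
    # "L" is Pillow's 8-bit grayscale mode.
    return img.convert("L")
```

For example, prepare_for_ocr(Image.open("in.png")).save("out.png", dpi=(300, 300))
writes the result with 300 DPI metadata, after which "tesseract out.png out"
can be run on it (in.png and out.png are placeholder names).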

Zdenko

[1]
https://docs.google.com/leaf?id=0B6RGB-MIlOIfOWI1ZGNkOTgtYTc3MC00YjE4LTgxYmYtZWQyMTUxMjMzN2Q0&hl=en_US
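
For the one remaining error mentioned above (the "/" in a date read as "1"),
a small post-processing step might be enough. The sketch below assumes the
dates are always in the DD/MM/YYYY layout from the samples; repair_date is a
hypothetical helper, not part of tesseract.

```python
import re

# Two digits, a separator read as "/" or "1", two digits, another such
# separator, then four digits. Assumes a fixed DD/MM/YYYY layout.
_DATE = re.compile(r"^(\d{2})[/1](\d{2})[/1](\d{4})$")


def repair_date(text):
    """Return `text` with misread date separators restored to "/".

    Strings that do not look like a date in the assumed layout are
    returned unchanged.
    """
    match = _DATE.match(text.strip())
    if not match:
        return text
    return "/".join(match.groups())
```

Because the pattern is anchored and fixed-width, a line such as
"John, Smith, 34 YO DOB 06Jan1981" is left untouched.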

On Fri, Aug 5, 2011 at 3:46 PM, sydd <[email protected]> wrote:

> Here are 2 sample images:
> http://imgur.com/UxNjf
>
> (I just pasted 2 sample images onto a white canvas; the 2 images will
> be OCRed separately)
>
> On Aug 5, 8:17 am, zdenko podobny <[email protected]> wrote:
> > Hello,
> >
> > First of all, please provide an example image. Then people can
> > suggest some improvements.
> >
> > Zdenko
> >
> >
> >
> > On Thu, Aug 4, 2011 at 10:44 PM, sydd <[email protected]> wrote:
> > > Hello
> >
> > > I need to make an OCR application that can recognise birth dates, for
> > > example:
> > > "John, Smith"
> > > "02/01/2011"
> > > "John, Smith, 34 YO DOB 06Jan1981"
> > > I have these texts in a clean, cropped image (solid background), and
> > > on the image is just this text.
> > > All of the text is written in Arial, either 11px or 20px font size.
> >
> > > I have got 2 questions about the OCR:
> > > 1. I've read that tesseract is bad at recognising small text. What
> > > can I do to make the recognition better? Enlarge the images? Use some
> > > filters on them? (I've tried enlarging, but the results were not so
> > > good: about a 30% fail rate.)
> > > 2. What would help the recognition process? Should I train tesseract
> > > for a custom language? (This is the only kind of 'setting' I read
> > > about in the docs.) Or is there some kind of supervised learning
> > > procedure for tesseract? I figured out that I can whitelist characters
> > > with the config variable tessedit_char_whitelist, but I can't find a
> > > list of other config options or their meanings.
> >
> > > Thanks for any help; I am really clueless about how to get started
> > > with this project because of the lack of documentation.
> >
> > > --
> > > You received this message because you are subscribed to the Google
> > > Groups "tesseract-ocr" group.
> > > To post to this group, send email to [email protected]
> > > To unsubscribe from this group, send email to
> > > [email protected]
> > > For more options, visit this group at
> > > http://groups.google.com/group/tesseract-ocr?hl=en
