I can't remember whether this has already been discussed on the forum,
but when taking screenshots, be sure to turn off any font smoothing
technology, such as Windows' ClearType, because it can significantly
degrade recognition quality. To see the effect of font smoothing and
understand why it is detrimental to OCR, turn ClearType on, type, say,
the letter "a" in Windows Paint, and zoom in on the image (by pressing
Ctrl+PgDn).
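
For anyone who prefers to see the arithmetic from earlier in this thread
spelled out, here is a minimal Python sketch of the formula quoted below
(pixels = DPI * pts / 72) together with the "scale them down" advice. The
function names are made up for illustration, the 12-64 pixel range is the
one quoted below, and clamping to the nearest bound of that range is only
an assumption, not something Tesseract prescribes. Note also that the
formula gives the nominal font size in pixels; the x-height of a real font
is smaller.

```python
def font_pixel_height(dpi, pts):
    """Rough font height in pixels for a scan: pixels = DPI * pts / 72."""
    return dpi * pts / 72.0


def rescale_factor(pixel_height, low=12.0, high=64.0):
    """Resize factor that brings a text height into [low, high] pixels.

    Returns 1.0 when the height is already inside the range. Clamping to
    the nearest bound is an illustrative assumption.
    """
    if pixel_height <= 0:
        raise ValueError("pixel_height must be positive")
    if pixel_height > high:
        return high / pixel_height
    if pixel_height < low:
        return low / pixel_height
    return 1.0


if __name__ == "__main__":
    # Print a small table for a few common resolutions and font sizes.
    for dpi in (72, 96, 300):
        for pts in (10, 20):
            px = font_pixel_height(dpi, pts)
            print(f"{dpi:>3} DPI, {pts:>2} pt -> {px:6.1f} px, "
                  f"resize by {rescale_factor(px):.2f}")
```

This is only an estimate; counting pixels on a real test scan remains the
more reliable check.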

Warm regards,
Dmitri Silaev
www.CustomOCR.com





On Sat, Aug 20, 2011 at 4:45 PM, Sriranga(78yrsold)
<[email protected]> wrote:
> Dmitri,
> Thanks for the valuable suggestion
> With regards,
> -sriranga(78yrs)
>
> On Sat, Aug 20, 2011 at 5:49 PM, Dmitri Silaev <[email protected]>
> wrote:
>>
>> As a rule of thumb, one can usually obtain good recognition
>> results for all standard regular fonts of 11-16pt size, be it a
>> screenshot or a 300 DPI scanned image. Should font size, resolution,
>> etc. differ significantly from these numbers, recognition quality
>> becomes a matter of experimentation.
>>
>> Warm regards,
>> Dmitri Silaev
>> www.CustomOCR.com
>>
>>
>>
>>
>>
>> On Sat, Aug 20, 2011 at 2:14 PM, Sriranga(78yrsold)
>> <[email protected]> wrote:
>> > Dmitri,
>> > This issue is really too complex for a lay user to understand.
>> > For training Tesseract-OCR, what guidance would you give to users who
>> > generally depend on a scanner and the computer's "Print Screen" key?
>> > 1) For scanning typed text: (a) what font size should be used in the
>> > text, and (b) what resolution should be set in the scanner?
>> > 2) For a screenshot of a typed text file: should the resolution be
>> > increased from 96 to 300 dpi with the help of IrfanView, ImageMagick,
>> > etc., for any image format such as tif or png?
>> > With regards,
>> > -sriranga(78yrs)
>> >
>> >
>> > On Sat, Aug 20, 2011 at 1:33 PM, Dmitri Silaev <[email protected]>
>> > wrote:
>> >>
>> >> There are different cases of how the pixel height of a font's
>> >> characters should be calculated. If you're trying to recognize a
>> >> screenshot, you may deem one pt to be equal to one pixel when typing
>> >> in Windows Paint. However, this might not be true for more complex
>> >> editors like Photoshop. It also depends on the physical size of the
>> >> screen's pixels and the current video mode resolution. Another case
>> >> is a scanned image; here pixel height depends on the scanning
>> >> resolution. Still another case, where imho trying to relate pixel
>> >> height to a font's point size makes absolutely no sense (although it
>> >> is possible via some multi-parameter formulas), is a photographic or
>> >> video frame image; here pixel height varies depending on the camera
>> >> position and can even vary within a single line of text.
>> >>
>> >> All in all, Tesseract does not bother itself with DPIs, pt sizes,
>> >> etc.; only pixel size is important for recognition. You can use this
>> >> formula for scanned images to roughly determine font pixel height:
>> >>
>> >> pixels = DPI * pts / 72
>> >>
>> >> where pixels is the pixel height to be found, DPI is the scanning
>> >> resolution, and pts is the font size in typographic points.
>> >>
>> >> However, the most reliable approach is to scan a test page and count
>> >> the pixels manually.
>> >>
>> >> For those willing to understand everything, here are the links:
>> >> http://en.wikipedia.org/wiki/Dots_per_inch
>> >> http://en.wikipedia.org/wiki/Point_%28typography%29
>> >> http://en.wikipedia.org/wiki/X-height
>> >>
>> >> Warm regards,
>> >> Dmitri Silaev
>> >> www.CustomOCR.com
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> On Sat, Aug 20, 2011 at 7:48 AM, Sriranga(78yrsold)
>> >> <[email protected]> wrote:
>> >> > Dmitri,
>> >> > Thanks for the valuable guidance. I seek some clarification as
>> >> > follows:
>> >> > (1) "Tesseract, trained with ordinary fonts, proved good with fonts
>> >> > of 12-64 pixel height": it would be nice if you indicated the
>> >> > equivalent font size for a pixel height of 12-64. For a 10 or 20 pt
>> >> > regular (ordinary) font, what is the pixel height used in Notepad?
>> >> > I am neither a programmer nor a developer, so I am seeking guidance
>> >> > as a user.
>> >> > BTW, is it possible to count the pixels of a regular font of any
>> >> > size, say 20 pt, in Paintbrush, which has a grid (graph-like)? I just
>> >> > tested this in Paintbrush (see the attached screenshot): letters
>> >> > were typed in Arial 20, and I counted 20 pixels.
>> >> >
>> >> > Thus I presume that a 12-64 pixel height is equivalent to a 12-64
>> >> > point size of an ordinary font. Kindly confirm.
>> >> > With warmest regards,
>> >> > -sriranga(78yrs)
>> >> >
>> >> >
>> >> > On Sat, Aug 20, 2011 at 1:00 AM, Dmitri Silaev
>> >> > <[email protected]>
>> >> > wrote:
>> >> >>
>> >> >> The DPI measure is confusing for Tesseract's OCR, forget about it.
>> >> >> The big thing is within-image font's x-height, measured in pixels.
>> >> >> Tesseract, trained with ordinary fonts, proved good with fonts of
>> >> >> 12-64 pixel height. If you have bigger characters, scale them down.
>> >> >> If you have a font that's bold, use morphology and erode characters
>> >> >> after binarization. Experiment. Removing "greyness" won't help as
>> >> >> it's not a generic way of getting rid of uneven illumination; you
>> >> >> need to use more sophisticated algorithms. Just using Photoshop
>> >> >> won't let you achieve much.
>> >> >>
>> >> >> Warm regards,
>> >> >> Dmitri Silaev
>> >> >> www.CustomOCR.com
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >> On Fri, Aug 19, 2011 at 8:18 PM, Andriy Malovanyy
>> >> >> <[email protected]>
>> >> >> wrote:
>> >> >> > To Zdenko:
>> >> >> > I think I have version 3.0 installed, so maybe I should install
>> >> >> > the new version and try it. Thanks for the description of psm.
>> >> >> > Did you try to recognize the other unedited images that I
>> >> >> > attached to the first post?
>> >> >> >
>> >> >> > To Rob:
>> >> >> > Initially I had a 640x480 image at 72 dpi, with the number
>> >> >> > occupying almost all of the image. What I did was just open the
>> >> >> > image in Photoshop, go to the image size menu, change the
>> >> >> > resolution to 300 dpi (the image increased in size), and set the
>> >> >> > image size back to 640x480. So with that I got a 640x480 image
>> >> >> > with 300 dpi resolution.
>> >> >> >
>> >> >> > On 19 Aug, 17:56, Robert Komar <[email protected]> wrote:
>> >> >> >> On Fri, 19 Aug 2011, Andriy Malovanyy wrote:
>> >> >> >> > To sriranga:
>> >> >> >> > I tried changing dpi (check the previous post). It doesn't work.
>> >> >> >>
>> >> >> >> Did you rescale the image from 72 dpi to 300 dpi, or just change
>> >> >> >> the tag on the original image to say 300 dpi?  The latter won't
>> >> >> >> work.
>> >> >> >> Tesseract seems to be tuned to work best for scans at 300 dpi
>> >> >> >> (although I've often successfully used 600 dpi).  Scans done at
>> >> >> >> 72 dpi usually get very poor results from tesseract.
>> >> >> >>
>> >> >> >> Cheers,
>> >> >> >> Rob Komar
>> >> >> >
>> >> >> > --
>> >> >> > You received this message because you are subscribed to the Google
>> >> >> > Groups "tesseract-ocr" group.
>> >> >> > To post to this group, send email to
>> >> >> > [email protected]
>> >> >> > To unsubscribe from this group, send email to
>> >> >> > [email protected]
>> >> >> > For more options, visit this group at
>> >> >> > http://groups.google.com/group/tesseract-ocr?hl=en