How the pixel height of a font's characters should be calculated
depends on the case. If you're trying to recognize a screenshot, you
may treat one pt as equal to one pixel when the text was typed in
Windows Paint. However, this might not be true for more complex editors
like Photoshop. It also depends on the physical size of the screen's
pixels and the current video mode resolution. Another case is a scanned
image; here pixel height depends on the scanning resolution. Still
another case is a photograph or video frame, where IMHO trying to
relate pixel height to the font's point size makes no sense at all
(although it is possible via some multi-parameter formulas); here pixel
height varies with the camera position and can even vary within a
single line of text.

All in all, Tesseract does not concern itself with DPI, pt sizes,
etc.; only the pixel size matters for recognition. For scanned images
you can use this formula to roughly determine the font's pixel height:

pixels = DPI * pts / 72

where pixels is the pixel height to be found, DPI is the scanning
resolution, and pts is the font size in typographic points.

However, the most reliable approach is to scan a test page and count
the pixels manually.
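
For a quick check of the numbers, here is a minimal Python sketch of
the formula above and its inverse. The function names are my own,
chosen for illustration; they are not part of Tesseract. Keep in mind
that the result is the nominal font size in pixels, so the x-height
will be smaller by a factor that depends on the typeface.

```python
def font_pixel_height(dpi: float, pts: float) -> float:
    """Approximate font height in pixels: pixels = DPI * pts / 72."""
    return dpi * pts / 72.0


def point_size_for_pixels(dpi: float, pixels: float) -> float:
    """Inverse of the formula: pts = pixels * 72 / DPI."""
    return pixels * 72.0 / dpi


if __name__ == "__main__":
    # A 12 pt font at a few common scanning resolutions.
    for dpi in (72, 150, 300):
        print(dpi, "DPI ->", font_pixel_height(dpi, 12), "px")
```

At 72 DPI a 12 pt font comes out at 12 pixels, while at 300 DPI the
same font comes out at 50 pixels, which is why a screenshot and a scan
of the same text can behave so differently.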

For those willing to understand everything, here are the links:
http://en.wikipedia.org/wiki/Dots_per_inch
http://en.wikipedia.org/wiki/Point_%28typography%29
http://en.wikipedia.org/wiki/X-height

Warm regards,
Dmitri Silaev
www.CustomOCR.com





On Sat, Aug 20, 2011 at 7:48 AM, Sriranga(78yrsold)
<[email protected]> wrote:
> Dmitri,
> Thanks for the valuable guidance. I seek some clarification as follows:
> (1)"Tesseract, trained with ordinary fonts, proved good with fonts of 12-64
> pixel height" it would be nice, if indicated equivalent font size for pixel
> of 12-64? For 10 or 20 pt size of the regular(ordinary) font what is the
> pixel height used in the Notepad?
> I am not programmer nor developer - as such I am seeking valuable guidance
> as user.
> BTW, is it possible to count the pixels of any size, say 20 pt regular, in
> Paintbrush, which has a grid (graph-like)? Just
> now I tested in Paintbrush; see the attached screenshot. The letters were typed
> using Arial 20 and I counted the pixels: the height is 20 pixels.
>
> Thus it is presumed that 12-64 pixel height is equivalent to 12-64 point
> size of the ordinary font - kindly confirm.
> With warmest regards,
> -sriranga(78yrs)
>
>
> On Sat, Aug 20, 2011 at 1:00 AM, Dmitri Silaev <[email protected]>
> wrote:
>>
>> The DPI measure is confusing for Tesseract's OCR, forget about it. The
>> big thing is within-image font's x-height, measured in pixels.
>> Tesseract, trained with ordinary fonts, proved good with fonts of
>> 12-64 pixel height. If you have bigger characters, scale them down. If
>> you have a font that's bold, use morphology and erode characters after
>> binarization. Experiment. Removing "greyness" won't help as it's not a
>> generic way of getting rid of uneven illumination; you need to use
>> more sophisticated algorithms. Just using Photoshop won't let you
>> achieve much.
>>
>> Warm regards,
>> Dmitri Silaev
>> www.CustomOCR.com
>>
>>
>>
>>
>>
>> On Fri, Aug 19, 2011 at 8:18 PM, Andriy Malovanyy <[email protected]>
>> wrote:
>> > To Zdenko:
>> > I think I have 3.0 version installed, so maybe I should reinstall the
>> > new version and try it. Thanks for the description of psm. Did you try
>> > to recognize other unedited images which I attached to
>> > the first post??
>> >
>> > To Rob:
>> > Initially I had 640x480 image with 72dpi with number occupying almost
>> > all the image. What I did is just opened the image in Photoshop, went
>> > to size of image menu, changed the resolution to 300 dpi (image
>> > increased in size) and set the image size back to 640x480. So, with
>> > that I got 640x480 image with 300dpi resolution.
>> >
>> > On 19 Aug, 17:56, Robert Komar <[email protected]> wrote:
>> >> On Fri, 19 Aug 2011, Andriy Malovanyy wrote:
>> >> > To sriranga:
>> >> > I tried changing dpi (check the previous post). It doesn't work.
>> >>
>> >> Did you rescale the image from 72 dpi to 300 dpi, or just change
>> >> the tag on the original image to say 300 dpi?  The latter won't work.
>> >> Tesseract seems to be tuned to work best for scans at 300 dpi
>> >> (although I've often successfully used 600 dpi).  Scans done at
>> >> 72 dpi usually get very poor results from tesseract.
>> >>
>> >> Cheers,
>> >> Rob Komar
>> >
>> > --
>> > You received this message because you are subscribed to the Google
>> > Groups "tesseract-ocr" group.
>> > To post to this group, send email to [email protected]
>> > To unsubscribe from this group, send email to
>> > [email protected]
>> > For more options, visit this group at
>> > http://groups.google.com/group/tesseract-ocr?hl=en
>> >
