Cheers,
That was easy! Many thanks.
I wonder if Zdenko will now change the FAQ to tell people to use an image
program to do the measuring.
Cheers
[email protected]
32 Hawera Rd
Kohimarama 1071
Auckland, New Zealand
+64 (0)9 528 1174 home
+64 (0)226 710 335 cell
http://kmccready.wordpress.com
On 13/11/12 08:20, Sven Pedersen wrote:
Measure the height of a lower-case 'x' in your image using an image
program such as GIMP, or the standard image tool on your platform
(e.g. Windows Paint or Mac Preview).
If the height of a lower-case 'x' in your text is less than 20 pixels,
you need to resize it or rescan your documents.
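For anyone who prefers to check this programmatically, here is a minimal
Python sketch of the same measurement. It assumes you have already cropped
a region containing a single lower-case 'x' and loaded it as rows of
grayscale values (0 = black, 255 = white) with an image library of your
choice. The function names `x_height_px` and `scale_needed` and the ink
threshold of 128 are hypothetical choices for illustration, not part of
Tesseract.

```python
from typing import Sequence


def x_height_px(crop: Sequence[Sequence[int]], threshold: int = 128) -> int:
    """Height in pixels of the dark ink in a grayscale crop.

    Assumes (hypothetically) the crop holds one lower-case 'x' in dark ink
    on a light background; pixels below `threshold` count as ink.
    """
    # Indices of every row that contains at least one ink pixel.
    ink_rows = [y for y, row in enumerate(crop) if any(v < threshold for v in row)]
    if not ink_rows:
        return 0  # no ink found in the crop
    return ink_rows[-1] - ink_rows[0] + 1


def scale_needed(x_height: int, target: int = 20) -> float:
    """Factor to enlarge the image so the x-height reaches `target` pixels.

    Returns 1.0 when the x-height is already at or above the target.
    """
    if x_height <= 0:
        raise ValueError("x-height must be positive")
    return max(1.0, target / x_height)


# Toy 14x6 crop: white background (255) with ink (0) on rows 2..11.
crop = [[255] * 6 for _ in range(14)]
for y in range(2, 12):
    crop[y][y % 6] = 0

h = x_height_px(crop)
print(h, scale_needed(h))
```

If you rescan rather than resize, the same factor applies to the scan
resolution: an x-height of 10 px at 150 dpi, for example, suggests
rescanning at about 300 dpi.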
--Sven
On Mon, Nov 12, 2012 at 10:40 AM, chikev <[email protected]> wrote:
I'd be grateful if someone could help me here.
Here is my request to Zdenko and the reply.
Could you perhaps help me understand the meaning of the following,
and then change the page:
"A quick check is to count the pixels of the x-height of your
characters. (X-height is the height of the lower case x.)"
I have no idea what this means or how to do it.
Well, then it would be better if you found something other than
Tesseract. Honestly. You will be lost and disappointed with
Tesseract, because Tesseract requires some knowledge (e.g. of
image processing). It is like university: if you got there, it is
expected that you finished high school. Nobody there will bother
to explain the basics to you...
IMO there cannot be a clearer definition of x-height and what to do
with it. BTW, it is in the FAQ, and you complained about wrong
information in the Compilation wiki ;-)
Here is what the FAQ says:
There is a minimum text size for reasonable accuracy. You have to
consider resolution as well as point size. Accuracy drops off
below 10pt x 300dpi, rapidly below 8pt x 300dpi. A quick check is
to count the pixels of the x-height of your characters. (X-height
is the height of the lower case x.) At 10pt x 300dpi x-heights are
typically about 20 pixels, although this can vary dramatically
from font to font. Below an x-height of 10 pixels, you have very
little chance of accurate results, and below about 8 pixels, most
of the text will be "noise removed".
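The FAQ's numbers follow from simple arithmetic: a point is 1/72 inch, so
a font's em size in pixels is roughly point size x dpi / 72, and the
x-height is some fraction of that. The Python sketch below assumes an
x-height ratio of 0.48 of the em, which is a hypothetical typical value
(as the FAQ says, it varies dramatically from font to font). The 20 px
cutoff in `assess` comes from the rule of thumb in this thread; the 8 and
10 px bands come from the FAQ text above. All function names are
illustrative only.

```python
# Hypothetical typical x-height as a fraction of the em (point size).
# Real fonts vary widely, so treat results as rough estimates.
ASSUMED_X_HEIGHT_RATIO = 0.48


def expected_x_height(point_size: float, dpi: float,
                      ratio: float = ASSUMED_X_HEIGHT_RATIO) -> float:
    """Estimated x-height in pixels: em = point_size * dpi / 72 pixels."""
    return point_size * dpi * ratio / 72


def dpi_for_target(point_size: float, target_px: float = 20,
                   ratio: float = ASSUMED_X_HEIGHT_RATIO) -> float:
    """Scan resolution needed for the x-height to reach `target_px` pixels."""
    return target_px * 72 / (point_size * ratio)


def assess(x_height_px: float) -> str:
    """Map an x-height in pixels to the rough accuracy bands discussed above."""
    if x_height_px < 8:
        return "most of the text will likely be removed as noise"
    if x_height_px < 10:
        return "very little chance of accurate results"
    if x_height_px < 20:
        return "accuracy drops off; consider rescanning or resizing"
    return "probably large enough"


for pt in (6, 8, 10, 12):
    px = expected_x_height(pt, 300)
    print(f"{pt}pt at 300 dpi: ~{px:.1f} px -> {assess(px)}")
```

Under this assumed ratio, 10pt at 300 dpi works out to about 20 px, which
matches the FAQ's "typically about 20 pixels"; a font with a smaller
x-height would need a higher resolution to reach the same size.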
So if someone could help me, I'm sure I wouldn't be the only one
to benefit.
--
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to
[email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
--
"All that is gold does not glitter,
not all those who wander are lost;
the old that is strong does not wither,
deep roots are not reached by the frost.
From the ashes a fire shall be woken,
a light from the shadows shall spring;
renewed shall be blade that was broken,
the crownless again shall be king."