Hello,

Can anyone show me how to keep Tesseract from misidentifying pairs of 
vertically aligned chars (from adjacent rows) as single super-tall 
chars?  I'm training with precisely the two fonts that I need to recognize 
(a tall font and a short font exactly half the size of the tall one). 
Neither font has descenders (nothing drops below the baseline the way y, p, 
etc. do), and both look basically like the chars of a 7-segment display.

Possibly helpful facts:  The vertically joined super-tall chars exceed 
the known maximum (tall font) height.  The misidentification occurs only at 
the end of a line containing only tall font chars, where the line below it 
(short font chars only) begins at the point where the misidentifications 
start.  I.e., from Tesseract's point of view while moving from left to 
right, the line is of more or less constant height (with the exception of 
special keyboard chars), and then towards the right the line height 
suddenly increases (or the baseline suddenly drops below its current median).
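
Since the merged chars exceed the known maximum height, they can at least be 
detected after the fact.  The following is only a minimal sketch, assuming 
the recognized character boxes are available as lines in Tesseract's box-file 
layout (symbol, left, bottom, right, top, page).  The function names and the 
max_height value are hypothetical and chosen purely for illustration.

```python
def parse_box_line(line):
    """Parse one box-file line: '<symbol> <left> <bottom> <right> <top> <page>'."""
    parts = line.split()
    symbol = parts[0]
    left, bottom, right, top = (int(v) for v in parts[1:5])
    return symbol, left, bottom, right, top


def find_overtall_boxes(box_lines, max_height):
    """Return every box whose height exceeds the known tall-font height.

    max_height is an assumed, user-supplied pixel height of the tall font.
    Box-file coordinates have their origin at the bottom-left, so the
    height of a box is top - bottom.
    """
    flagged = []
    for line in box_lines:
        if not line.strip():
            continue  # skip blank lines
        symbol, left, bottom, right, top = parse_box_line(line)
        if top - bottom > max_height:
            flagged.append((symbol, left, bottom, right, top))
    return flagged
```

Any box flagged this way is a candidate for being two stacked chars, and the 
region it covers could be cropped and recognized again on its own.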

Alternatively, is there any way to tell Tesseract that only those two 
font sizes will appear on any given page (i.e., to strictly limit any 
deviation from those two sizes so that the chars are identified 
properly)?  Or to tell it that any sub-block of text will be either the 
tall font or the short font, but never both mixed together?
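
One possible workaround, independent of any Tesseract setting, would be to 
split the page into row bands before recognition so that each band holds 
only one font size.  The sketch below is an assumption-laden illustration, 
not a Tesseract feature: it expects a binarized, deskewed image given as a 
list of pixel rows (1 = ink, 0 = background) with at least one blank pixel 
row between text lines.  The function names and the tall_height and 
short_height parameters are hypothetical.

```python
def split_into_row_bands(image):
    """Return (top, bottom) inclusive row indices for each run of inked rows.

    image is a list of rows, each a list of 0/1 values (1 = ink).
    """
    bands = []
    start = None
    for y, row in enumerate(image):
        has_ink = any(row)
        if has_ink and start is None:
            start = y  # a new band begins
        elif not has_ink and start is not None:
            bands.append((start, y - 1))  # a blank row closes the band
            start = None
    if start is not None:
        bands.append((start, len(image) - 1))  # band runs to the last row
    return bands


def classify_band(band, tall_height, short_height):
    """Label a band 'tall' or 'short' by whichever known height is closer."""
    top, bottom = band
    height = bottom - top + 1
    if abs(height - tall_height) <= abs(height - short_height):
        return "tall"
    return "short"
```

Each band could then be cropped and passed to Tesseract separately, for 
example in single-text-line page segmentation mode (psm 7), so that a tall 
line and the short line below it are never seen in the same image.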

Thanks,
  Ted

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
