Hello,

Can anyone show me how to keep Tesseract from misidentifying pairs of vertically aligned chars (from adjacent rows) as single super-tall chars? I'm training with precisely the two fonts that I want to recognize: a tall font and a short font exactly half the size of the first. Neither font has chars that drop below the baseline (as y, p, etc. would), and both look basically like the chars in a 7-segment display.
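To make the size constraint concrete, here is a minimal Python sketch of the kind of height check I have in mind. It is not Tesseract API: `Box`, `classify_height`, and the 15% tolerance are hypothetical names and values chosen only for illustration, and the box coordinates are assumed to come from whatever per-character bounding boxes the OCR output provides.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Hypothetical per-character bounding box (image coordinates, y grows downward)."""
    char: str
    top: int
    bottom: int


def classify_height(box: Box, tall_height: float, tolerance: float = 0.15) -> str:
    """Label a box as 'tall', 'short', 'merged', or 'unknown' by its height.

    Assumes the short font is exactly half the tall font, as described above.
    The tolerance is an illustrative guess, not a measured value.
    """
    height = box.bottom - box.top
    short_height = tall_height / 2

    # Anything clearly taller than the tall font cannot be a single char,
    # so it is probably two vertically adjacent chars joined into one.
    if height > tall_height * (1 + tolerance):
        return "merged"
    if abs(height - tall_height) <= tall_height * tolerance:
        return "tall"
    if abs(height - short_height) <= short_height * tolerance:
        return "short"
    return "unknown"


def find_merged(boxes: list[Box], tall_height: float) -> list[Box]:
    """Return only the boxes that exceed the tall font height."""
    return [b for b in boxes if classify_height(b, tall_height) == "merged"]
```

For example, with a tall font of 40 px, `find_merged([Box("8", 0, 40), Box("1", 0, 66)], 40)` would return only the second box. This only flags the suspicious boxes after the fact; it does not stop Tesseract from producing them, which is what I am really after.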
Possibly helpful knowns:

Those vertically joined super-tall chars exceed the known maximum (tall font) height.

The misidentification occurs only at the end of a line made up solely of tall-font chars, where the line below it (short-font chars only) begins at the point of the misidentifications. I.e., from Tesseract's point of view while moving from left to right, the line is of more or less constant height (with the exception of special keyboard chars), and then, towards the right, the line height suddenly increases (or the baseline suddenly drops below its current median).

Alternatively, is there any way to instruct Tesseract that only the two font sizes will be found on any given page to be recognized (i.e. strictly limit any deviation from those two sizes and thus identify the chars properly)? Is there any way to effectively tell Tesseract that any sub-block of text will be either the tall font or the short font, but never both mixed together?

Thanks,
Ted

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en

