[tesseract-ocr] Tesseract does not recognize monospaced font

Albrecht Hilker Wed, 27 Sep 2017 13:54:55 -0700


Hello

I have the word "CONFIGURATION" in an image,
but Tesseract recognizes it as "CONF I GURATION".

Whenever a word contains the letter "I" or the digit "1", Tesseract splits
it into two or even three words.

This happens with both Tesseract 3.03 and 3.05.
I use page segmentation mode 6 (-psm 6).

I trained the traineddata myself.
I told Tesseract that the font is monospaced, but that did not help.

My font_properties file contains:
FontName@M 0 0 1 0 0
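
For reference, a font_properties line is read as
<fontname> <italic> <bold> <fixed> <serif> <fraktur>, so the entry above sets
only the fixed-pitch flag. A minimal sketch that decodes such a line follows;
the FontProps type and the parse_font_properties_line helper are hypothetical
names used for illustration and are not part of Tesseract:

```python
from typing import NamedTuple


class FontProps(NamedTuple):
    """Flags from one font_properties line (illustrative type, not a Tesseract API)."""
    name: str
    italic: bool
    bold: bool
    fixed: bool
    serif: bool
    fraktur: bool


def parse_font_properties_line(line: str) -> FontProps:
    """Decode '<fontname> <italic> <bold> <fixed> <serif> <fraktur>'."""
    parts = line.split()
    if len(parts) != 6:
        raise ValueError(f"expected 6 fields, got {len(parts)}: {line!r}")
    name, *flags = parts
    if any(flag not in ("0", "1") for flag in flags):
        raise ValueError(f"flags must be 0 or 1: {line!r}")
    return FontProps(name, *(flag == "1" for flag in flags))


props = parse_font_properties_line("FontName@M 0 0 1 0 0")
print(props.fixed)  # True: the font is declared as fixed pitch
```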

The Tesseract documentation says that Tesseract handles monospaced fonts,
but in my case it does not.
The gaps around the "I" are wider than the gaps between the other
characters, and Tesseract misinterprets them as the separation between
two words.
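
To illustrate the effect I mean, here is a small sketch. It is NOT Tesseract's
actual segmentation code; the box coordinates, glyph widths, thresholds and
function names are all made up for illustration. It compares splitting words
by the gap between glyph edges with splitting by the distance between glyph
centres, which stays constant in a fixed-pitch font:

```python
from typing import List, Tuple

Box = Tuple[str, int, int]  # (character, left edge, right edge) in pixels


def make_boxes(word: str, pitch: int = 10) -> List[Box]:
    """Centre each glyph in a fixed-width cell; 'I' and '1' are narrow glyphs."""
    boxes = []
    for i, ch in enumerate(word):
        width = 2 if ch in "I1" else 8  # assumed glyph widths
        left = i * pitch + (pitch - width) // 2
        boxes.append((ch, left, left + width))
    return boxes


def split_by_gap(boxes: List[Box], max_gap: int) -> List[str]:
    """Start a new word whenever the edge-to-edge gap exceeds max_gap."""
    if not boxes:
        return []
    words = [boxes[0][0]]
    for prev, cur in zip(boxes, boxes[1:]):
        if cur[1] - prev[2] > max_gap:
            words.append(cur[0])
        else:
            words[-1] += cur[0]
    return words


def split_by_pitch(boxes: List[Box], pitch: int) -> List[str]:
    """Start a new word only when glyph centres are more than 1.5 cells apart."""
    if not boxes:
        return []
    words = [boxes[0][0]]
    for prev, cur in zip(boxes, boxes[1:]):
        prev_centre = (prev[1] + prev[2]) / 2
        cur_centre = (cur[1] + cur[2]) / 2
        if cur_centre - prev_centre > 1.5 * pitch:
            words.append(cur[0])
        else:
            words[-1] += cur[0]
    return words


boxes = make_boxes("CONFIG")
print(split_by_gap(boxes, max_gap=4))   # ['CONF', 'I', 'G']
print(split_by_pitch(boxes, pitch=10))  # ['CONFIG']
```

With these made-up numbers the normal gap is 2 pixels and the gap next to the
"I" is 5 pixels, so a gap threshold of 4 breaks the word, while the
centre-to-centre distance is 10 pixels everywhere.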

Can anybody please point me in the right direction where to search?

Is there a setting that I must change at recognition time?
Or is there something that I must change when building the traineddata?

What is this problem called, so that I can search for further discussion
of the topic?

