Hello


I have the word "CONFIGURATION" in an image,
but Tesseract recognizes it as "CONF I GURATION".


Whenever a word contains the letter "I" or the digit "1", Tesseract splits 
it into 2 or even 3 words.


This happens with both Tesseract 3.03 and 3.05.
I use page segmentation mode 6 (-psm 6).
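
For reference, a minimal sketch of such an invocation, written as a Python helper that only builds the command line. The file names and the language name "myfont" are placeholders (assumptions, not taken from the actual setup); the single-dash -psm spelling is the one used by Tesseract 3.x.

```python
import shlex


def build_tesseract_cmd(image_path, output_base, lang="eng", psm=6):
    """Build a Tesseract 3.x command line.

    Tesseract 3.x spells the page segmentation flag as -psm
    (later major versions use --psm instead).
    """
    return ["tesseract", image_path, output_base, "-l", lang, "-psm", str(psm)]


# Placeholder names: "sample.png", "sample_out" and "myfont" are hypothetical.
cmd = build_tesseract_cmd("sample.png", "sample_out", lang="myfont")
print(" ".join(shlex.quote(part) for part in cmd))
```

The list form can be passed directly to subprocess.run() if Tesseract is installed and on the PATH.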


I trained the traineddata myself.
I told Tesseract that the font is monospaced, but that does not help.


My font_properties file contains:
FontName@M 0 0 1 0 0
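
To double-check which flag is set, here is a small sketch that parses such a line. It assumes the Tesseract 3.x field order of font name followed by the italic, bold, fixed, serif and fraktur flags; the class and function names are made up for this illustration.

```python
from typing import NamedTuple


class FontProperties(NamedTuple):
    name: str
    italic: bool
    bold: bool
    fixed: bool
    serif: bool
    fraktur: bool


def parse_font_properties_line(line: str) -> FontProperties:
    """Parse one font_properties line: <name> <italic> <bold> <fixed> <serif> <fraktur>."""
    parts = line.split()
    if len(parts) != 6:
        raise ValueError("expected a font name followed by 5 flags: %r" % line)
    name, flags = parts[0], parts[1:]
    if any(flag not in ("0", "1") for flag in flags):
        raise ValueError("flags must be 0 or 1: %r" % line)
    italic, bold, fixed, serif, fraktur = (flag == "1" for flag in flags)
    return FontProperties(name, italic, bold, fixed, serif, fraktur)


props = parse_font_properties_line("FontName@M 0 0 1 0 0")
print(props.name, "fixed pitch:", props.fixed)
```

With the line above, only the third flag (fixed) is set, so the font is declared as monospaced and nothing else.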


The Tesseract documentation says that Tesseract recognizes monospaced fonts,
but it does NOT.
The gaps around the "I" are wider than the gaps between the other 
characters,
and Tesseract misinterprets them as the separation between two words.
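
The effect can be illustrated with a toy model. This is NOT Tesseract's actual spacing algorithm, only a sketch under assumed numbers: every character sits in a cell 10 px wide, ordinary glyphs have 8 px of ink, and "I" and "1" have 2 px. A rule that looks only at the ink gap splits at every narrow glyph, while a rule that compares the distance between character centers with the fixed pitch does not.

```python
def make_boxes(text, pitch=10):
    """Return (char, left, right) ink extents for text set in a toy monospaced font."""
    boxes = []
    for i, ch in enumerate(text):
        if ch == " ":
            continue  # a space is just an empty cell
        ink = 2 if ch in "I1" else 8  # assumed ink widths in pixels
        left = i * pitch + (pitch - ink) // 2
        boxes.append((ch, left, left + ink))
    return boxes


def split_by_gap(boxes, max_gap):
    """Naive rule: start a new word when the ink gap exceeds max_gap."""
    words = [boxes[0][0]]
    for (_, _, prev_right), (ch, left, _) in zip(boxes, boxes[1:]):
        if left - prev_right > max_gap:
            words.append(ch)
        else:
            words[-1] += ch
    return words


def split_by_pitch(boxes, pitch):
    """Pitch-aware rule: start a new word when centers are more than 1.5 cells apart."""
    centers = [(left + right) / 2 for _, left, right in boxes]
    words = [boxes[0][0]]
    for i in range(1, len(boxes)):
        if centers[i] - centers[i - 1] > 1.5 * pitch:
            words.append(boxes[i][0])
        else:
            words[-1] += boxes[i][0]
    return words


boxes = make_boxes("CONFIGURATION")
print(split_by_gap(boxes, max_gap=4))
print(split_by_pitch(boxes, pitch=10))
```

In this toy model the gap rule breaks the word at both "I" characters, whereas the pitch rule keeps it whole and still splits at a genuinely empty cell.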

Can anybody please point me in the right direction?


Is there a configuration setting that I must change for recognition?
Or is there something that I must change when building the traineddata?


What is this problem called, so that I can search for further discussion 
of it?
