Dear all,

I trained Tesseract for my font (FE-Schrift: http://de.wikipedia.org/wiki/FE-Schrift) and I’m getting very bad results. I am using Tesseract 3.01 under Windows.
In this image: https://docs.google.com/file/d/0BxkuvS_LuBAzeFNZUVA1cThLMG8/edit?usp=sharing
the text is SAA5298, but I’m getting SM529B. The recognition runs from inside a program, so I know that the “M” in the result comes from the “AA” in the source. Tesseract is therefore segmenting these two characters very badly, even though they are clearly separated, as you can see. Do you have any idea why this is happening? Also, is there a way to give Tesseract a hint for this (e.g., telling it the character width)?

The other problem is with this image: https://docs.google.com/file/d/0BxkuvS_LuBAzbFk3OXNjaDR1Q1E/edit?usp=sharing
The text is LDA6244, but Tesseract recognizes a “5” instead of the “6”, even though the image is very good.

Here is my font training file: https://docs.google.com/file/d/0BxkuvS_LuBAzczZhd21IcVlNSTQ/edit?usp=sharing
Here is my box file: https://docs.google.com/file/d/0BxkuvS_LuBAzQV94NWdLT1VUcjQ/edit?usp=sharing
Here is my .traineddata file: https://docs.google.com/file/d/0BxkuvS_LuBAzbkNzUmtDcE8zbjA/edit?usp=sharing
And here is a .cmd file for testing these two images: https://docs.google.com/file/d/0BxkuvS_LuBAzUVVfSDhVdEUtRjA/edit?usp=sharing

Thanks,
Andres

