Dear people,

I trained Tesseract on my font (FE-Schrift:
http://de.wikipedia.org/wiki/FE-Schrift ) and I’m getting very poor results.
I am using Tesseract 3.01 under Windows.

In this image:

https://docs.google.com/file/d/0BxkuvS_LuBAzeFNZUVA1cThLMG8/edit?usp=sharing

Where the text is SAA5298, I’m getting SM529B. This is being done from inside
a program, and I know that the “M” in the result comes from the “AA” in the
source. So Tesseract is segmenting these two characters very badly, even
though they are clearly separated, as you can see. Do you have any idea why
this is happening? On the other hand, is there a way to give Tesseract a hint
about this (e.g., telling it the character width)?

The other problem is with this one:

https://docs.google.com/file/d/0BxkuvS_LuBAzbFk3OXNjaDR1Q1E/edit?usp=sharing

Where the text is LDA6244, Tesseract recognizes a “5” instead of the “6”,
even though the image is very clean.



Here is my font training file:

https://docs.google.com/file/d/0BxkuvS_LuBAzczZhd21IcVlNSTQ/edit?usp=sharing

Here is my box file:

https://docs.google.com/file/d/0BxkuvS_LuBAzQV94NWdLT1VUcjQ/edit?usp=sharing

Here is my .traineddata file:

https://docs.google.com/file/d/0BxkuvS_LuBAzbkNzUmtDcE8zbjA/edit?usp=sharing

And here is a .cmd file for testing these two images:

https://docs.google.com/file/d/0BxkuvS_LuBAzUVVfSDhVdEUtRjA/edit?usp=sharing


Thanks,

Andres

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
