Andres,

First of all, your first link seems to point to a "traineddata" file
instead of an image. Second, without diving deep into your problem, I
can suggest specifying the single-line page segmentation mode (psm) on
the command line. And finally, you can use the user patterns feature to
restrict the possible output of Tesseract (for the format, see the
comments on read_pattern_list() in dict/trie.h). Another way of
achieving the latter, which we use in CustomOCR and which seems to be
more reliable, is to use the API to get a number of character variants
for each blob along with confidence levels and match them against a set
of possible patterns. You can find out how to do this by searching this
forum.
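
To make the last idea more concrete, here is a minimal sketch of the
matching step only. It does not call Tesseract: it assumes you have
already pulled the per-blob character variants and confidences out of
the API into plain lists. The pattern notation ('L' for an uppercase
letter, 'D' for a digit), the function name best_match and the
confidence values are all made up for illustration; this is not the
user-patterns syntax from trie.h and not necessarily how CustomOCR
does it.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical pattern notation for this sketch only (NOT Tesseract's
# user-patterns syntax): 'L' = uppercase letter, 'D' = digit.
CLASSES: Dict[str, Callable[[str], bool]] = {
    "L": str.isupper,
    "D": str.isdigit,
}

Candidate = Tuple[str, float]  # (character, confidence in 0..1)


def best_match(
    blobs: Sequence[Sequence[Candidate]],
    patterns: Sequence[str],
) -> Optional[Tuple[str, float]]:
    """Return (text, mean confidence) of the best pattern-conforming reading.

    For every pattern with one class per blob, keep at each position only
    the variants allowed by that class and take the most confident one.
    Returns None if no pattern can be satisfied.
    """
    best: Optional[Tuple[str, float]] = None
    for pattern in patterns:
        if not pattern or len(pattern) != len(blobs):
            continue  # pattern must have exactly one class per blob
        chars: List[str] = []
        total = 0.0
        for cls, variants in zip(pattern, blobs):
            allowed = [v for v in variants if CLASSES[cls](v[0])]
            if not allowed:
                break  # this pattern cannot be satisfied at this position
            ch, conf = max(allowed, key=lambda v: v[1])
            chars.append(ch)
            total += conf
        else:
            score = total / len(pattern)
            if best is None or score > best[1]:
                best = ("".join(chars), score)
    return best


# Made-up variants for a 7-character plate. The top-ranked choice for
# the last blob is "B", but the pattern only allows a digit there.
example_blobs = [
    [("S", 0.9), ("5", 0.4)],
    [("A", 0.8)],
    [("A", 0.8)],
    [("5", 0.9), ("S", 0.5)],
    [("2", 0.9), ("Z", 0.3)],
    [("9", 0.9)],
    [("B", 0.6), ("8", 0.5)],
]
example_result = best_match(example_blobs, ["LLLDDDD"])
```

With these invented numbers the last position resolves to "8" rather
than "B", because "B" is not a digit. Note that this only helps with
confusions between character classes; it cannot repair a segmentation
error such as two blobs being merged into one, since the number of
blobs no longer matches the pattern length.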

HTH and good luck with Tesseract!

Warm regards,
Dmitri Silaev
www.CustomOCR.com


On Fri, May 3, 2013 at 8:24 PM, Andres <[email protected]> wrote:
> Dear people,
>
> I trained Tesseract for my font (FE-Schrift:
> http://de.wikipedia.org/wiki/FE-Schrift ) and I’m getting very bad results.
> I am using Tesseract 3.01 under Windows.
>
> In this image:
>
> https://docs.google.com/file/d/0BxkuvS_LuBAzeFNZUVA1cThLMG8/edit?usp=sharing
>
> Where the text is SAA5298 I’m getting SM529B. This is being done from
> inside a program, and I know that the “M” in the result comes from the
> “AA” in the source. So Tesseract is segmenting these two characters very
> badly, even though they are very well separated, as you can see. Do you
> have an idea why this is happening? On the other hand, is there a way
> to give Tesseract a hint for this (e.g., telling it the character width)?
>
> The other problem is with this one:
>
> https://docs.google.com/file/d/0BxkuvS_LuBAzbFk3OXNjaDR1Q1E/edit?usp=sharing
>
> Where the text is LDA6244, Tesseract recognizes a “5” instead of a “6”,
> even though the image is very good.
>
>
>
> Here is my fonts training file:
>
> https://docs.google.com/file/d/0BxkuvS_LuBAzczZhd21IcVlNSTQ/edit?usp=sharing
>
> Here is my box file:
>
> https://docs.google.com/file/d/0BxkuvS_LuBAzQV94NWdLT1VUcjQ/edit?usp=sharing
>
> Here is my .traineddata file:
>
> https://docs.google.com/file/d/0BxkuvS_LuBAzbkNzUmtDcE8zbjA/edit?usp=sharing
>
> And here is a .cmd file for testing these 2 images:
>
> https://docs.google.com/file/d/0BxkuvS_LuBAzUVVfSDhVdEUtRjA/edit?usp=sharing
>
>
>
> Thanks,
>
> Andres
>
> --
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>
> ---
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/groups/opt_out.
>
>
