This might help:

Use "combine_tessdata" to extract the individual .traineddata components
and check whether the DAWG (dictionary) files are reasonably large. Refer
to the Wiki for details on how to use the "combine_tessdata" utility.
Examining the extracted unicharset can also be useful. The size of the
whole .traineddata file gives a rough clue to how many samples were used
for training: compare it with the size of "eng.traineddata", which was
built from several dozen images with a couple of hundred characters each.
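
A minimal Python sketch of that inspection step, assuming the components
have already been extracted with combine_tessdata into one directory and
are named "<prefix>.<component>" (e.g. deu-frak.word-dawg,
deu-frak.unicharset). The script name inspect_tessdata.py and all helper
functions are hypothetical. It also assumes the first line of the
unicharset holds the number of entries. It only reports sizes; it does not
decide what counts as "big enough".

```python
import os
import sys


def component_sizes(directory, prefix):
    """Map each extracted component suffix (e.g. 'word-dawg') to its size in bytes.

    Assumes (hypothetically) that extracted files are named '<prefix>.<suffix>'.
    """
    sizes = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if name.startswith(prefix + ".") and os.path.isfile(path):
            sizes[name[len(prefix) + 1:]] = os.path.getsize(path)
    return sizes


def dawg_sizes(sizes):
    """Keep only the DAWG (dictionary) components."""
    return {suffix: size for suffix, size in sizes.items() if "dawg" in suffix}


def unicharset_count(path):
    """Read the entry count, assumed to be the first token of the first line."""
    with open(path, encoding="utf-8") as f:
        return int(f.readline().split()[0])


def size_ratio(path_a, path_b):
    """Size of file A relative to file B, e.g. deu-frak vs. eng .traineddata."""
    return os.path.getsize(path_a) / os.path.getsize(path_b)


def main(args):
    if len(args) != 2:
        print("usage: inspect_tessdata.py EXTRACTED_DIR PREFIX")
        return
    directory, prefix = args
    sizes = component_sizes(directory, prefix)
    for suffix, size in sizes.items():
        print(f"{suffix:24}{size:>12} bytes")
    print("DAWG total:", sum(dawg_sizes(sizes).values()), "bytes")
    unicharset = os.path.join(directory, prefix + ".unicharset")
    if os.path.isfile(unicharset):
        print("unicharset entries:", unicharset_count(unicharset))


if __name__ == "__main__":
    main(sys.argv[1:])
```

For example, "python inspect_tessdata.py extracted deu-frak" lists each
component's size. size_ratio("deu-frak.traineddata", "eng.traineddata")
gives the rough size comparison; file size is only an indirect hint about
the amount of training data.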

All of this should help you judge whether you could train for Fraktur
better than the dev team did. The training process is described
thoroughly in the Wiki.

Warm regards,
Dmitri Silaev

On Wed, Apr 13, 2011 at 12:09 AM, stinguin <[email protected]> wrote:
> Hi list,
>
> I'm new to Tesseract and hope that one of you can help me. I want
> to OCR some German texts that are typeset in Fraktur. The results
> using the existing language "deu-frak" are good, but not good
> enough. Is it possible to improve this language by training? If so,
> can someone explain it step by step?
> I only know how to create a new language. Do you think I can improve
> the results by creating my own? I think the deu-frak language is
> more than just a few box files, isn't it?
>
> Thanks in advance
>
> --
> You received this message because you are subscribed to the Google Groups 
> "tesseract-ocr" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to 
> [email protected].
> For more options, visit this group at 
> http://groups.google.com/group/tesseract-ocr?hl=en.
>
>