This might help: use the "combine_tessdata" utility to extract the individual components of the .traineddata file, and check whether the DAWG (dictionary) files are large enough. See the Wiki for details on how to use "combine_tessdata". Examining the extracted unicharset can also be useful. The size of the whole .traineddata file gives a rough idea of how many samples were used for training (compare it with the size of "eng.traineddata", which was built from several tens of images with a couple of hundred characters each).
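A minimal Python sketch of these size checks, assuming the components have already been unpacked with something like `combine_tessdata -u deu-frak.traineddata extracted/deu-frak.` (Tesseract 3.x; check the Wiki for the exact flags your version supports). The `extracted` directory, the `deu-frak.` prefix and all function names are hypothetical. The script only lists component sizes, marks the DAWG files, reads the entry count from the first line of the unicharset, and computes a size ratio against a reference file such as "eng.traineddata". File sizes are a rough clue, not a measure of recognition quality.

```python
from pathlib import Path


def read_unicharset_size(path):
    """Return the entry count from the first line of a unicharset file.

    Assumes the Tesseract 3.x text format: line 1 holds the number of
    entries, and each following line describes one character.
    """
    with open(path, encoding="utf-8") as f:
        return int(f.readline().split()[0])


def summarize_components(directory, prefix):
    """Map each extracted component suffix (e.g. 'word-dawg') to its size in bytes.

    `prefix` is the hypothetical output prefix given to combine_tessdata,
    e.g. 'deu-frak.'; files without that prefix are ignored.
    """
    sizes = {}
    for entry in sorted(Path(directory).iterdir()):
        if entry.is_file() and entry.name.startswith(prefix):
            sizes[entry.name[len(prefix):]] = entry.stat().st_size
    return sizes


def report(directory, prefix):
    """Print component sizes, flag DAWG (dictionary) files, show unicharset count."""
    sizes = summarize_components(directory, prefix)
    for name, size in sizes.items():
        marker = "  <- dictionary (DAWG)" if "dawg" in name else ""
        print(f"{name:20s} {size:>10d} bytes{marker}")
    if "unicharset" in sizes:
        count = read_unicharset_size(Path(directory) / (prefix + "unicharset"))
        print(f"unicharset declares {count} entries")
    return sizes


def compare_sizes(candidate, reference):
    """Return size(candidate) / size(reference), e.g. deu-frak vs. eng .traineddata."""
    return Path(candidate).stat().st_size / Path(reference).stat().st_size
```

For example, `report("extracted", "deu-frak.")` followed by `compare_sizes("deu-frak.traineddata", "eng.traineddata")`. A very small word-dawg or a ratio far below 1 would hint that less data went into the language file, which is exactly the kind of clue worth checking before deciding to retrain.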
Together, these checks can help you judge whether you could train for Fraktur better than the dev team did. The training process is described thoroughly in the Wiki.

Warm regards,
Dmitri Silaev

On Wed, Apr 13, 2011 at 12:09 AM, stinguin <[email protected]> wrote:
> Hi list,
>
> I'm new to tesseract and hope that one of you can help me. I want
> to OCR some German texts which are typeset in Fraktur. The results
> using the existing language "deu-frak" are good, but not good
> enough. Is it possible to improve this language by training? If so,
> can someone explain that step by step?
> I only know how to create a new language. Do you think I can improve
> the results by creating my own? I suppose the deu-frak language is
> more than just a few box files, isn't it?
>
> Thanks in advance
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en.

