Yep, this is funny, indeed ))) It's way easier to ask the creator directly than to investigate on your own. You can forget my oh-so-valuable advice )))
Dmitri

On Wed, Apr 13, 2011 at 9:03 PM, Peter Alberti <[email protected]> wrote:
> Hi
>
> deu-frak.traineddata is a file I created, so I'm happy to hear that
> someone might want to improve it.
> Actually, I've continued to work a little bit on it myself, and you
> can get the files I'm using from
>
> https://github.com/paalberti/tesseract-dan-fraktur
>
> The files you find there ought to be a little bit better than the
> deu-frak.traineddata available under downloads, but I haven't done any
> proper testing yet, so your mileage may vary. Also, the tif/box in the
> dan-frak/ subdirectory might work slightly better than those under
> deu-frak/ (Danish is the language I'm most interested in), so if you want
> to retrain yourself, you might want to work with those.
>
> The two most obvious improvements I can think of are to add some
> tif/box that look more like the texts you're ocr-ing, if possible, and
> maybe to build a better wordlist (if I remember correctly, the German
> one was a little bit of a quick hack.)
>
> Best regards, Peter.
>
> On 12 Apr., 22:09, stinguin <[email protected]> wrote:
>> Hi list,
>>
>> I'm new to tesseract and hope that one of you can help me. I want
>> to OCR some German texts which are typeset in Fraktur. The results
>> from using the existing language "deu-frak" are good, but not good
>> enough. Is it possible to improve this language by training? If so,
>> can someone explain that step by step?
>> I just know how to create a new language. Do you think I can improve
>> the results by creating my own one? I think the deu-frak language is
>> more than just a few box files, isn't it?
>>
>> Thanks in advance
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en.

