Hi all,

I was diligent and built a new wordlist and some new box files. Can you take a look at my boxes before I use them to create a new traineddata? Because there are different fonts, and because some letters are too close together to separate (e.g. 's' next to 't'), I couldn't make a box for each letter, as you can see here:
http://s1.directupload.net/file/d/2525/kheno9zf_jpg.htm

Is it bad that I "ignore" some characters of the original page, or is it OK? Would it be better to use a bitonal scan? And which is better: slim boxes, or boxes with some space around the letters?

Many thanks in advance (as usual ;-)

Holger

@Peter: Can you tell me how many box files the official deu-frak language consists of? Only the 8 deu-frak ones?

On 14 Apr., 06:32, Dmitri Silaev <[email protected]> wrote:
> There's no way to augment .traineddata files directly.
> You'll need to go through the entire training procedure from scratch using
> both the old and your box/tiff files.
>
> Warm regards,
> Dmitri Silaev
>
> On Thu, Apr 14, 2011 at 1:54 AM, stinguin <[email protected]> wrote:
> > Hi,
> >
> > I didn't expect such feedback - many thanks for all your answers! I
> > will try to put your tips (adding some new tif/box and editing the
> > wordlist) into practice :-) Is it possible to add my new box (or
> > better .tr files) to the existing deu-frak.traineddata directly?
> > Or do I have to create a new .traineddata by using all box files from
> > Peter and the new ones from me?
> >
> > Best wishes, Holger
> >
> > On 13 Apr., 19:03, Peter Alberti <[email protected]> wrote:
> >> Hi
> >>
> >> deu-frak.traineddata is a file I created, so I'm happy to hear that
> >> someone might want to improve it.
> >> Actually, I've continued to work a little bit on it myself, and you
> >> can get the files I'm using from
> >>
> >> https://github.com/paalberti/tesseract-dan-fraktur
> >>
> >> The files you find there ought to be a little bit better than the
> >> deu-frak.traineddata available under downloads, but I haven't done any
> >> proper testing yet, so your mileage may vary. Also, the tif/box in the
> >> dan-frak/ subdirectory might work slightly better than those under
> >> deu-frak/ (Danish is the language I'm most interested in), so if you want
> >> to retrain yourself, you might want to work with those.
> >>
> >> The two most obvious improvements I can think of are to add some
> >> tif/box files that look more like the texts you're OCR-ing, if possible, and
> >> maybe to build a better wordlist (if I remember correctly, the German
> >> one was a little bit of a quick hack).
> >>
> >> Best regards, Peter.
> >>
> >> On 12 Apr., 22:09, stinguin <[email protected]> wrote:
> >> > Hi list,
> >> >
> >> > I'm new to tesseract and hope that one of you can help me. I want
> >> > to OCR some German texts which are typeset in Fraktur. The results
> >> > using the existing language "deu-frak" are good, but not good
> >> > enough. Is it possible to improve this language by training? If so,
> >> > can someone explain that step by step?
> >> > I only know how to create a new language. Do you think I can improve
> >> > the results by creating my own one? I think the deu-frak language is
> >> > more than just a few box files, isn't it?
> >> >
> >> > Thanks in advance
> >
> > --
> > You received this message because you are subscribed to the Google Groups
> > "tesseract-ocr" group.
> > To post to this group, send email to [email protected].
> > To unsubscribe from this group, send email to
> > [email protected].
> > For more options, visit this group
> > at http://groups.google.com/group/tesseract-ocr?hl=en.
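
A small aid for reviewing box files like the ones discussed in this thread: the sketch below reads box-file text, reports each entry's width and height, and picks out entries whose label has more than one character (for example a touching 's' and 't' boxed together). It assumes the Tesseract 3.x box line layout of `glyph left bottom right top page`, with coordinates measured from the bottom-left corner of the image. The names `Box`, `parse_box_text` and `merged_boxes`, and the sample coordinates, are made up for illustration and are not taken from Holger's files.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """One box-file entry (assumed layout: glyph left bottom right top page)."""
    glyph: str
    left: int
    bottom: int
    right: int
    top: int
    page: int

    @property
    def width(self) -> int:
        return self.right - self.left

    @property
    def height(self) -> int:
        return self.top - self.bottom


def parse_box_text(text: str) -> List[Box]:
    """Parse box-file text into Box entries, skipping blank lines."""
    boxes = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split from the right so the label is whatever precedes the 5 numbers.
        glyph, *numbers = line.rsplit(None, 5)
        if len(numbers) != 5:
            raise ValueError(f"unexpected box line: {line!r}")
        boxes.append(Box(glyph, *map(int, numbers)))
    return boxes


def merged_boxes(boxes: List[Box]) -> List[Box]:
    """Return entries whose label covers more than one character."""
    return [b for b in boxes if len(b.glyph) > 1]


# Invented sample data, only to show the shape of the input.
SAMPLE = """\
F 100 200 130 250 0
e 134 200 152 232 0
st 156 200 190 248 0
"""

if __name__ == "__main__":
    entries = parse_box_text(SAMPLE)
    for b in entries:
        print(f"{b.glyph!r}: {b.width}x{b.height} px on page {b.page}")
    print("multi-character boxes:", [b.glyph for b in merged_boxes(entries)])
```

Listing the multi-character entries this way makes it easy to see how many letters could not be boxed individually. Whether such boxes help or hurt the resulting traineddata is the open question of this thread and is not something the sketch can answer.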

