Hi all,

I was diligent and built a new wordlist and some new box files. Could
you take a look at my boxes before I use them to create a new
traineddata? Because there are different fonts, and because some
letters are too close together to separate (e.g. 's' next to 't'), I
couldn't make a box for each letter, as you can see here:

http://s1.directupload.net/file/d/2525/kheno9zf_jpg.htm

Is it bad that I "ignore" some characters of the original page or is
it OK? Would it be better to use a bitonal scan? And what's better,
slim boxes or boxes with some space around the letters?
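
To make the last question concrete, here is a minimal sketch of how padded
boxes could be derived from tight ones for a comparison. It assumes the
Tesseract 3.x box-file line format "glyph left bottom right top page"; the
helper names (Box, parse_box_line, pad_box, format_box), the margin and the
page size are hypothetical and only for illustration, not part of Tesseract.

```python
from typing import NamedTuple


class Box(NamedTuple):
    glyph: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_line(line: str) -> Box:
    # Assumed line format: "glyph left bottom right top page".
    # Splitting from the right keeps the glyph field intact even when a box
    # is labelled with more than one character (e.g. a joined "st").
    glyph, left, bottom, right, top, page = line.rstrip("\n").rsplit(" ", 5)
    return Box(glyph, int(left), int(bottom), int(right), int(top), int(page))


def pad_box(box: Box, margin: int, width: int, height: int) -> Box:
    # Grow the box by `margin` pixels on every side, clamped to the page.
    return box._replace(
        left=max(0, box.left - margin),
        bottom=max(0, box.bottom - margin),
        right=min(width, box.right + margin),
        top=min(height, box.top + margin),
    )


def format_box(box: Box) -> str:
    # Write the box back in the same assumed line format.
    return f"{box.glyph} {box.left} {box.bottom} {box.right} {box.top} {box.page}"
```

Running every line of a box file through parse_box_line, pad_box and
format_box would give a second, padded box file, so both variants could be
trained and compared on the same pages.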

Many thanks in advance (as usual;-)

Holger

@Peter: Can you tell me how many box files the official deu-frak
language consists of? Only the 8 deu-frak ones?


On 14 Apr., 06:32, Dmitri Silaev <[email protected]> wrote:
> There's no way to augment .traineddata files directly.
> You'll need to go through the entire training procedure from scratch using
> both the old and your new box/tiff files.
>
> Warm regards,
> Dmitri Silaev
>
> On Thu, Apr 14, 2011 at 1:54 AM, stinguin <[email protected]> wrote:
> > Hi,
>
> > I didn't expect so much feedback - many thanks for all your answers! I
> > will try to put your tips (adding some new tif/box files and editing the
> > wordlist) into practice :-) Is it possible to add my new box files (or,
> > better, .tr files) to the existing deu-frak.traineddata directly?
> > Or do I have to create a new .traineddata using all the box files from
> > Peter and the new ones from me?
>
> > Best wishes, Holger
>
> > On 13 Apr., 19:03, Peter Alberti <[email protected]> wrote:
> >> Hi
>
> >> deu-frak.traineddata is a file I created, so I'm happy to hear that
> >> someone might want to improve it.
> >> Actually, I've continued to work a little bit on it myself, and you
> >> can get the files I'm using from
>
> >> https://github.com/paalberti/tesseract-dan-fraktur
>
> >> The files you find there ought to be a little bit better than the
> >> deu-frak.traineddata available under downloads, but I haven't done any
> >> proper testing yet, so your mileage may vary. Also, the tif/box files in
> >> the dan-frak/ subdirectory might work slightly better than those under
> >> deu-frak/ (Danish is the language I'm most interested in), so if you
> >> want to retrain yourself, you might want to work with those.
>
> >> The two most obvious improvements I can think of are to add some
> >> tif/box files that look more like the texts you're OCR-ing, if possible,
> >> and maybe to build a better wordlist (if I remember correctly, the
> >> German one was a bit of a quick hack).
>
> >> Best regards, Peter.
>
> >> On 12 Apr., 22:09, stinguin <[email protected]> wrote:
>
> >> > Hi list,
>
> >> > I'm new to Tesseract and hope that one of you can help me. I want
> >> > to OCR some German texts which are typeset in Fraktur. The results
> >> > from using the existing language "deu-frak" are good, but not good
> >> > enough. Is it possible to improve this language by training? If so,
> >> > can someone explain that step by step?
> >> > I only know how to create a new language. Do you think I can improve
> >> > the results by creating my own one? I think the deu-frak language is
> >> > more than just a few box files, isn't it?
>
> >> > Thanks in advance
>
> > --
> > You received this message because you are subscribed to the Google Groups 
> > "tesseract-ocr" group.
> > To post to this group, send email to [email protected].
> > To unsubscribe from this group, send email to 
> > [email protected].
> > For more options, visit this group
> > at http://groups.google.com/group/tesseract-ocr?hl=en.
