Re: How to improve an existing language?

Peter Alberti Wed, 13 Apr 2011 10:43:33 -0700

Hi

deu-frak.traineddata is a file I created, so I'm happy to hear that
someone might want to improve it.
Actually, I've continued to work a little bit on it myself, and you
can get the files I'm using from


https://github.com/paalberti/tesseract-dan-fraktur

The files you find there ought to be a little bit better than the
deu-frak.traineddata available under downloads, but I haven't done any
proper testing yet, so your mileage may vary. Also, the tif/box files in
the dan-frak/ subdirectory might work slightly better than those under
deu-frak/ (Danish is the language I'm most interested in), so if you want
to retrain yourself, you might want to work with those.

The two most obvious improvements I can think of are to add some
tif/box files that look more like the texts you're OCR-ing, if possible,
and maybe to build a better wordlist (if I remember correctly, the German
one was a bit of a quick hack).
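
For the wordlist part, something along these lines might be a starting
point. It is only a rough sketch, assuming you have a plain UTF-8 text
corpus in the right language and period; the names build_wordlist and
write_wordlist and the min_count/min_length thresholds are made up for
illustration and are not part of Tesseract. It keeps case as-is (German
nouns are capitalized) and writes one word per line, which as far as I
know is the form the training tools take a wordlist in.

```python
import re
from collections import Counter

# Runs of Unicode letters only; digits, underscores and punctuation split words.
WORD_RE = re.compile(r"[^\W\d_]+")


def build_wordlist(text, min_count=2, min_length=2):
    """Return words seen at least min_count times, most frequent first.

    min_count and min_length are illustrative thresholds for dropping
    one-off typos and stray single letters; tune them for your corpus.
    """
    counts = Counter(WORD_RE.findall(text))
    kept = [
        (word, n)
        for word, n in counts.items()
        if n >= min_count and len(word) >= min_length
    ]
    # Sort by descending frequency, then alphabetically for stable output.
    kept.sort(key=lambda item: (-item[1], item[0]))
    return [word for word, _ in kept]


def write_wordlist(words, path):
    """Write one word per line as UTF-8."""
    with open(path, "w", encoding="utf-8") as handle:
        for word in words:
            handle.write(word + "\n")
```

You would read your corpus into a string, pass it to build_wordlist, and
write the result out with write_wordlist before running the dictionary
step of the training process on that file.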

Best regards, Peter.

On 12 Apr., 22:09, stinguin <[email protected]> wrote:
> Hi list,
>
> I'm new to tesseract and hope that one of you can help me. I want
> to OCR some German texts which are typeset in Fraktur. The results
> from using the existing language "deu-frak" are good, but not good
> enough. Is it possible to improve this language by training? If so,
> can someone explain that step by step?
> I only know how to create a new language. Do you think I can improve
> the results by creating my own one? I think the deu-frak language is
> more than just a few box files, isn't it?
>
> Thanks in advance
