On 24 May 2010 14:46, Lars Aronsson <[email protected]> wrote:
> Peter Alberti wrote:
>
>> I've trained tesseract r319 (3.0) to support Danish texts written in
>> fraktur. It is not perfect but good enough that I hope it may be
>> useful to others.
>
> This is great! The file dan-frak.traineddata is a binary file.
> Tesseract is an open source software. Is there some
> documentation for this file format, so I can read and
> understand what's in there? I want to keep the part
> that is about fraktur/blackletter and substitute the
> part that is about Danish pre 1870 spelling for
> something based on my Swedish dictionaries.
>
With the current SVN version, you can use

  combine_tessdata -e [traineddata file] [files to extract]

to extract the components you want, and

  combine_tessdata [path to files]

to make a new traineddata file. Here [path to files] is the common prefix of the component files (e.g. tessdata/dan-frak.), and all files carrying that prefix are packed into one .traineddata file.
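To see what a traineddata file contains before extracting anything, here is a minimal Python sketch. It assumes the Tesseract 3.00 layout: a 32-bit entry count, then a table of 64-bit absolute file offsets (-1 marks an absent component), then the components stored back to back in index order. It further assumes little-endian byte order and the component suffixes combine_tessdata uses for extracted files; newer releases add more entries, which the sketch just labels `entryN`. Under those assumptions, the fraktur glyph data would presumably live in unicharset, inttemp, pffmtable and normproto, and the Danish word lists in word-dawg and freq-dawg.

```python
import struct
import sys

# Component suffixes in index order. ASSUMPTION: the Tesseract 3.00
# ordering, using the names combine_tessdata gives extracted files.
COMPONENTS = [
    "config", "unicharset", "unicharambigs", "inttemp", "pffmtable",
    "normproto", "punc-dawg", "word-dawg", "number-dawg", "freq-dawg",
]


def list_components(data):
    """Return (suffix, offset, size) for each component present in a
    .traineddata blob.

    ASSUMPTION: little-endian int32 entry count, followed by an int64
    offset table in which -1 marks a missing component.
    """
    (num_entries,) = struct.unpack_from("<i", data, 0)
    offsets = struct.unpack_from("<%dq" % num_entries, data, 4)
    result = []
    for i, off in enumerate(offsets):
        if off == -1:
            continue  # component not present in this file
        # A component ends where the next present component begins,
        # or at the end of the file if it is the last one.
        end = next((o for o in offsets[i + 1:] if o != -1), len(data))
        name = COMPONENTS[i] if i < len(COMPONENTS) else "entry%d" % i
        result.append((name, off, end - off))
    return result


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        for name, off, size in list_components(f.read()):
            print("%-14s offset=%-8d size=%d" % (name, off, size))
```

Saved under any file name, it would be run with the traineddata file as its only argument, e.g. `python list_tessdata.py dan-frak.traineddata` (the script name is hypothetical). The listed suffixes match the names to pass to `combine_tessdata -e`.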

