On 24 May 2010 14:46, Lars Aronsson <[email protected]> wrote:
> Peter Alberti wrote:
>
>> I've trained tesseract r319 (3.0) to support Danish texts written in
>> fraktur. It is not perfect, but good enough that I hope it may be
>> useful to others.
>
> This is great! The file dan-frak.traineddata is a binary file.
> Tesseract is open source software. Is there some
> documentation for this file format, so I can read and
> understand what's in there? I want to keep the part
> that is about fraktur/blackletter and replace the
> part that is about Danish pre-1870 spelling with
> something based on my Swedish dictionaries.
>

With the current SVN version, you can use
combine_tessdata -e [traineddata file] [files to extract]
to extract the components you want, and
combine_tessdata [path prefix of the component files]
to make a new traineddata file.
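
As a rough sketch of that workflow: the script below only prints the two
commands instead of running them, so it is safe to try without Tesseract
installed. The work directory, the swe-frak name and the choice of
components are my assumptions for illustration, not something from the
original training; the extensions are the ones I believe the 3.0
traineddata format uses for the character shape data.

```shell
#!/bin/sh
# Sketch only: "work" and "swe-frak" are hypothetical names.
src=dan-frak.traineddata   # the file Peter posted
work=work                  # scratch directory for the unpacked parts
new=swe-frak               # assumed name for the new language

# Print the command that unpacks the shape-related components.
# Each output file name must end in the component's extension.
extract_cmd() {
  printf 'combine_tessdata -e %s' "$src"
  for ext in unicharset inttemp pffmtable normproto; do
    printf ' %s/%s.%s' "$work" "$new" "$ext"
  done
  printf '\n'
}

# Print the command that packs every component found under the prefix.
# Note the trailing dot: the argument is a prefix, not a directory.
combine_cmd() {
  printf 'combine_tessdata %s/%s.\n' "$work" "$new"
}

extract_cmd
combine_cmd
```

Before running the second command you would need to put your own Swedish
dictionary components (for example swe-frak.word-dawg and
swe-frak.freq-dawg, built with wordlist2dawg) under the same prefix. It is
also worth checking that the extracted unicharset covers every character
in your Swedish word lists.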


-- 
<Leftmost> jimregan, that's because deep inside you, you are evil.
<Leftmost> Also not-so-deep inside you.
