Hello Tom,

Thanks for your answer.

> OCRopus 0.7 doesn't need to be trained with individual characters, so you
> don't really need the Tesseract training files. But you should be able to
> use the scans that those files were derived from easily.

Hmm, not really, because my Tesseract training pages are not split up into
images of single lines. Or could I train OCRopus with a whole page and its
corresponding text? The thing is, I would like to use one set of training
pages, without tool-specific modifications, for both Tesseract and OCRopus.


>> Second, the fraktur example does not support 'long-s', therefore words
>> like 'Wachstube' vs. 'Wachſtube' could be problematic in historical texts.
>
> It should support long-s, but it doesn't encode it separately in the
> output.

That is a problem. I need the correct encoding of long-s. I want to preserve
the character 'ſ' in the output; it should not be substituted with 's'. The
same goes for »«, „“ and so on. But that should not be a problem if I train
my own models, right?
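
A minimal sketch of one way to check that ground-truth text keeps these
characters distinct, assuming the transcriptions are Unicode strings. Whether
OCRopus or Tesseract applies any Unicode normalization is not stated in this
thread; the helper name `folded_by` is hypothetical and only illustrates that
compatibility normalization (NFKC) turns 'ſ' into 's', while canonical
normalization (NFC) leaves it alone.

```python
import unicodedata

# Characters that should survive unchanged (taken from the message above).
PRESERVE = "ſ»«„“"

def folded_by(form, chars=PRESERVE):
    """Return the characters that Unicode normalization `form` would change.

    Hypothetical helper for illustration; not part of OCRopus or Tesseract.
    """
    return [c for c in chars if unicodedata.normalize(form, c) != c]

for form in ("NFC", "NFKC"):
    print(form, folded_by(form))
```

If a preprocessing step applies a form that folds 'ſ', the distinction is
lost before training ever sees it, so it may be worth running such a check
on the transcriptions first.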
