Hello Tom,

Thanks for your answer.

> OCRopus 0.7 doesn't need to be trained with individual characters, so you
> don't really need the Tesseract training files. But you should be able to
> use the scans that those files were derived from easily.

Hmm, not really, because my Tesseract training pages are not split up into single lines. Or could I train OCRopus with a whole page and the corresponding text? The thing is, I would like to use one set of training pages, without tool-specific modifications, for both Tesseract and OCRopus.

>> Second, the fraktur example does not support 'long-s', therefore words like
>> 'Wachstube' vs. 'Wachſtube' could be problematic in historical texts.
>
> It should support long-s, but it doesn't encode it separately in the output.

That is a problem. I need the correct encoding of long-s: I want to preserve the character 'ſ' in the output. It should not be substituted with 's'. The same goes for »«, „“ and so on. But that should not be a problem if I train my own models, right?
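
To make the long-s concern concrete, here is a minimal Python sketch of the kind of check I have in mind for my transcriptions. It is not OCRopus code: the function name `lost_characters`, the `REQUIRED` list and the sample string are made up for illustration, and whether any tool in the pipeline applies Unicode normalization at all is an assumption. It only shows that compatibility normalization (NFKC) folds 'ſ' into 's', while NFC leaves it alone.

```python
import unicodedata

# Characters that must survive unchanged in the transcription (illustrative list).
REQUIRED = "ſ»«„“"


def lost_characters(text, form):
    """Return the REQUIRED characters that occur in `text`
    but are gone after Unicode normalization with `form`."""
    normalized = unicodedata.normalize(form, text)
    return {ch for ch in REQUIRED if ch in text and ch not in normalized}


# Hypothetical sample line containing long-s and both quotation styles.
sample = "„Wachſtube“ »Wachstube«"

print(lost_characters(sample, "NFC"))   # canonical normalization
print(lost_characters(sample, "NFKC"))  # compatibility normalization
```

If a check like this reports 'ſ' as lost for a given normalization form, that form would have to be avoided when preparing the training text; otherwise 'Wachſtube' and 'Wachstube' become indistinguishable.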
