I've been scanning books since the 1990s and always assumed that OCR of blackletter (Fraktur) was a problem that someone else would solve, so I wouldn't have to. For books in normal typography, I use ABBYY Finereader with great success. Being comfortable with Finereader, I have not really followed the development of the free software Tesseract. Every time I tried it, it performed worse than Finereader.
I know Finereader can be trained to read Fraktur, but this is a lot of work and only works for one Fraktur font at a time. I also know there is (or has been) a special version of Finereader that reads Fraktur, which some library projects use.

Recently I tried Tesseract again, now in version 4.0, and found to my surprise that it worked quite well for Fraktur in Danish and Swedish, using the separate language data files dan_frak and swe-frak. (The Danish version also reads Norwegian, which in the 19th century was very similar to Danish.)

However, it doesn't work at all for Finnish text, and reading Swedish seems to be a lot slower than reading Danish.

Does anybody who knows these things have an answer to how the Swedish reading of Fraktur can be improved to match the Danish, and how a Finnish version can be created? I can provide quite a lot of training data in the form of scanned books and proofread text.

Is there an active mailing list or web forum for Fraktur issues with Tesseract?

-- 
Lars Aronsson (l...@aronsson.se)
Project Runeberg - free Nordic literature - http://runeberg.org/

_______________________________________________
Wikisource-l mailing list
Wikisourceemail@example.com
https://lists.wikimedia.org/mailman/listinfo/wikisource-l