Yes, Tesseract is used for many Wikisource books, mainly (I think) via
phe's hOCR tool:
https://github.com/phil-el/phetools/tree/master/hocr
https://tools.wmflabs.org/phetools/
You can search the list archives to see what has been tried in the
past, including http://terese.sourceforge.net/ . There are many
repositories with Indic training sets, but I never understood the
process for bringing them together and getting them more widely used.
Nemo
_______________________________________________
Wikisource-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikisource-l