Rather than using random web resources, I'd suggest using the official documentation. The most relevant section is probably this: https://tesseract-ocr.github.io/tessdoc/tess4/TrainingTesseract-4.00.html#fine-tuning-for--a-few-characters
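Whichever way you produce training data, it helps to know exactly which characters your transcriptions contain before fine-tuning for a few characters. The sketch below is not part of Tesseract or tesstrain. It assumes the tesstrain-style layout, where each line image sits next to a transcription with the same base name and a `.gt.txt` extension. It also assumes the images are `.tif` or `.png` files. The function names are hypothetical. It reports images that lack a transcription and lists every code point found, with its Unicode name and count.

```python
from collections import Counter
from pathlib import Path
import unicodedata

# Assumption: line images are .tif or .png files, each paired with a
# transcription named <stem>.gt.txt in the same directory.
IMAGE_SUFFIXES = {".tif", ".png"}


def pair_ground_truth(gt_dir):
    """Split the line images in gt_dir into (paired, missing) lists.

    paired:  (image, transcription) path tuples
    missing: images with no <stem>.gt.txt next to them
    """
    paired, missing = [], []
    for image in sorted(Path(gt_dir).iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skips the .gt.txt files and anything else
        gt = image.with_name(image.stem + ".gt.txt")
        if gt.is_file():
            paired.append((image, gt))
        else:
            missing.append(image)
    return paired, missing


def ground_truth_charset(paired):
    """Count non-whitespace code points across all transcriptions.

    Text is NFC-normalised first, so "s" + U+030C becomes U+0161 (š).
    Marks with no precomposed form stay as separate combining code
    points and are counted on their own.
    """
    counts = Counter()
    for _, gt in paired:
        text = unicodedata.normalize("NFC", gt.read_text(encoding="utf-8"))
        counts.update(ch for ch in text if not ch.isspace())
    return counts


def charset_report(gt_dir):
    """Return readable lines: missing transcriptions first, then the charset."""
    paired, missing = pair_ground_truth(gt_dir)
    lines = [f"no transcription for {image.name}" for image in missing]
    for ch, n in sorted(ground_truth_charset(paired).items()):
        name = unicodedata.name(ch, "UNNAMED")
        lines.append(f"U+{ord(ch):04X} {ch} {name} x{n}")
    return lines
```

For example, `print("\n".join(charset_report("data/mymodel-ground-truth")))` prints the report. The directory name is hypothetical. Comparing the listed characters with what your base model already recognises shows which ones, such as the dotted consonants or an open E, still need training.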
I would suggest starting with script/Latin as your base model, which will at least give you š and ž to begin with. In addition to the consonants with dots above and below, there also seems to be an Epsilon-like character that you may want to train (perhaps similar to https://unicodeplus.com/U+0190).

You may also want to consider whether it would be better to train with synthetic, rendered lines of text or with line images cut out of your page scans together with their ground truth text. If you go with the latter approach, it may be useful to look at what the Fraktur OCR project did for training: https://github.com/tesseract-ocr/tesstrain/wiki/GT4HistOCR

Good luck!
Tom

