> There is now a 4.1.0 release available for tessdata_fast, tessdata and tessdata_best. See https://github.com/tesseract-ocr/tessdata_fast/issues/26#issuecomment-780127901
@Merlijn Wajer archive.org has many books which use English with diacritics for Sanskrit (IAST). You could try the models in https://github.com/Shreeshrii/tesstrain-Sanskrit-IAST for those.

On Wednesday, January 27, 2021 at 3:58:27 PM UTC+5:30 Merlijn Wajer wrote:

> Hi,
>
> With Tesseract now switching to regular (alpha) releases of 5.0.0; does
> it make sense to consider some versioning for language files as well?
>
> The Internet Archive has switched to using Tesseract for all our OCR,
> and I'm hoping that we can record exactly what version of language files
> was used for a specific OCR job. Currently, the answer is simple, since
> we're using the default packages from Ubuntu focal, but I am working on
> switching to Tesseract release/tag 5.0.0-20201231.
>
> But the tessdata_fast (or tessdata_best, for that matter) do not seem to
> have any recent 5.x releases:
> https://github.com/tesseract-ocr/tessdata_fast/releases
>
> Are there plans to create a release/tag for the tessdata_* repositories?
>
> Cheers,
> Merlijn