There are new LSTM train data files available, but we need to reorganize things on GitHub to make them manageable. Right now there are 5 repositories on GitHub for tesseract-ocr. They are tesseract, langdata, tessdata, tesseract-ocr.github.io, and docs.
Please make two new ones. I suggest we call them lstm-best and lstm-fast, but other choices are possible. Please give me write permissions to both repositories. I will add the files for lstm-fast once the repository is created. For lstm-best, please migrate these files: https://github.com/tesseract-ocr/tessdata/tree/master/best

At the end of the day, we will have three sets of .traineddata files on GitHub in three separate repositories. Most users will want LSTM Fast, and that is what will be shipped as part of Linux distributions. LSTM Best is for people willing to trade a lot of speed for slightly better accuracy. It is also better for certain retraining scenarios for advanced users. The third set is for the legacy recognizer.

I do not have sufficient permission to do this myself, as you can see from the attached screenshot.

Thank you.

