New LSTM .traineddata files are available, but we need to
reorganize the repositories on GitHub to keep them manageable.
Right now there are five repositories under tesseract-ocr on GitHub:
tesseract, langdata, tessdata, tesseract-ocr.github.io, and docs.

Please create two new repositories. I suggest calling them
lstm-best and lstm-fast, but other names are possible.
Please give me write permissions to both repositories.
I will add the files for lstm-fast once the repository
is created. For lstm-best please migrate these files:
https://github.com/tesseract-ocr/tessdata/tree/master/best

Once this is done, we will have three sets of .traineddata
files on GitHub in three separate repositories. Most users
will want LSTM Fast and that is what will be shipped as
part of Linux distributions. LSTM Best is for people willing
to trade a lot of speed for slightly better accuracy. It is also
better for certain retraining scenarios for advanced users.
The third set is for the legacy recognizer.

I do not have sufficient permission to do this myself, as you
can see from the attached screenshot. Thank you.
