Hi. Thank you for reading my questions.

1. What is the difference between 'unicharset' and 'lstm-unicharset'?

I know how to make a 'unicharset' with this command line:

"$ tesseract (lang).(filename).exp(num).tif (lang).(filename).exp(num).box"

But I don't know how to make an 'lstm-unicharset'.

2. For comparison, to go from .tr to .lstmf I changed this command line:

"$ tesseract (lang).(filename).exp(num).tif (lang).(filename).exp(num) nobatch *box.train*"

to:

"$ tesseract (lang).(filename).exp(num).tif (lang).(filename).exp(num) nobatch *lstm.train*"

Is this usage right? Is it possible to use a 'unicharset' as the 'lstm-unicharset' in the same way?

3. The GitHub wiki has this passage:

Overview of Training Process

The overall training process is similar to training 3.04 <https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract>.

Conceptually the same:

1. Prepare training text. <https://github.com/tesseract-ocr/tesseract/issues/654#issuecomment-274574951>
2. Render text to image + box file. (Or create hand-made box files for existing image data.)
3. Make unicharset file. (Can be partially specified, ie created manually).
4. Make a starter traineddata from the unicharset and optional dictionary data. <https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#creating-starter-traineddata>
5. Run tesseract to process image + box file to make training data set.
6. Run training on training data set.
7. Combine data files.

The key differences are:

- The boxes only need to be at the *textline level.* It is thus *far easier* to make training data from existing image data.
- The .tr files are replaced by .lstmf data files.
- Fonts *can and should be mixed freely* instead of being separate.
- The clustering steps (mftraining, cntraining, shapeclustering) are replaced with a single slow lstmtraining step.

I think a sentence such as "'unicharset' is replaced by 'lstm-unicharset'" should be added to the "key differences" section. Am I wrong?

I look forward to everybody's answers.

Thank you. Have a nice day!
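P.S. To make question 2 concrete, here is a small dry-run sketch of the two invocations I am comparing. It only prints the command lines and does not run tesseract. The function name train_cmd and the base name eng.arial.exp0 are made up for illustration; the command shape itself is copied from my question above, so whether it is the right usage is exactly what I am asking.

```shell
#!/bin/sh
# Dry run only: prints the command lines, never executes tesseract.
# "eng.arial.exp0" is a hypothetical (lang).(filename).exp(num) base name.

# train_cmd MODE BASE
#   MODE: "box"  -> box.train config (the step I used to get .tr files)
#         "lstm" -> lstm.train config (the step I expect to give .lstmf files)
#   BASE: image/box base name without extension
train_cmd() {
    mode=$1
    base=$2
    printf 'tesseract %s.tif %s nobatch %s.train\n' "$base" "$base" "$mode"
}

train_cmd box  eng.arial.exp0
train_cmd lstm eng.arial.exp0
```

The only difference between the two printed lines is the final config name, which is the substitution I would like confirmed.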