Thank you, I will try. By the way, is my method feasible? I read the wiki, but I do not quite understand "Fine Tuning for ± a few characters". It seems that "Fine Tuning for ± a few characters" would meet my need.

On Tue, Jan 29, 2019 at 12:30 PM, Shree Devi Kumar <[email protected]> wrote:

> combine_tessdata -o ./tessdata/eng_new.traineddata \
>   ~/tesstutorial/engtuned_from_eng/eng.lstm \
>
> You need to extract eng.lstm from tessdata_best
>
> On Tue, 29 Jan 2019, 09:37 易鑫 <[email protected]> wrote:
>
>> Hello, everyone:
>>
>> I want to recognize the characters in a table; a sample table is in
>> the attached file. It contains only 15 different characters:
>> "0123456789-.LQX".
>>
>> So I think fine tuning is a good way to do the recognition.
>>
>> Here are my steps:
>>
>> 1. src/training/tesstrain.sh --fonts_dir /usr/share/fonts \
>>      --training_text ../training_data/part.txt \
>>      --langdata_dir ../langdata --tessdata_dir ./tessdata --lang eng \
>>      --linedata_only --noextract_font_properties \
>>      --output_dir ~/tesstutorial/engtest
>>
>>    part.txt is also in the attached file.
>>
>> 2. mkdir -p ~/tesstutorial/engtuned_from_eng
>>
>> 3. lstmtraining --model_output ~/tesstutorial/engtuned_from_eng/engtuned \
>>      --continue_from ~/tesstutorial/engtuned_from_eng/eng.lstm \
>>      --traineddata ../tessdata/eng.traineddata \
>>      --train_listfile ~/tesstutorial/engtest/eng.training_files.txt \
>>      --max_iterations 400
>>
>> 4. combine_tessdata -o ./tessdata/eng_new.traineddata \
>>      ~/tesstutorial/engtuned_from_eng/eng.lstm \
>>      ~/tesstutorial/engtest/eng.lstm-number-dawg \
>>      ~/tesstutorial/engtest/eng.lstm-punc-dawg \
>>      ~/tesstutorial/engtest/eng.lstm-word-dawg
>>
>> But when I execute the 3rd step, I get an error:
>>
>> Continuing from /home/yixin/tesstutorial/engtuned_from_eng/eng.lstm
>> Loaded 298/298 pages (1-298) of document
>> /home/yixin/tesstutorial/engtest/eng.Arial_Bold.exp0.lstmf
>> Loaded 297/297 pages (1-297) of document
>> /home/yixin/tesstutorial/engtest/eng.Century_Schoolbook_L_Medium.exp0.lstmf
>> Loaded 294/294 pages (1-294) of document
>> /home/yixin/tesstutorial/engtest/eng.Arial.exp0.lstmf
>> Loaded 293/293 pages (1-293) of document
>> /home/yixin/tesstutorial/engtest/eng.Courier_New_Bold.exp0.lstmf
>> Loaded 302/302 pages (1-302) of document
>> /home/yixin/tesstutorial/engtest/eng.Century_Schoolbook_L_Bold_Italic.exp0.lstmf
>> Loaded 301/301 pages (1-301) of document
>> /home/yixin/tesstutorial/engtest/eng.Arial_Italic.exp0.lstmf
>> Loaded 301/301 pages (1-301) of document
>> /home/yixin/tesstutorial/engtest/eng.Century_Schoolbook_L_Bold.exp0.lstmf
>> Loaded 302/302 pages (1-302) of document
>> /home/yixin/tesstutorial/engtest/eng.Century_Schoolbook_L_Italic.exp0.lstmf
>> Loaded 302/302 pages (1-302) of document
>> /home/yixin/tesstutorial/engtest/eng.Arial_Bold_Italic.exp0.lstmf
>> Loaded 296/296 pages (1-296) of document
>> /home/yixin/tesstutorial/engtest/eng.Courier_New_Bold_Italic.exp0.lstmf
>> !int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 249
>> Segmentation fault (core dumped)
>>
>> This is the related code:
>>
>> 248 void WeightMatrix::MatrixDotVector(const int8_t* u, double* v) const {
>> 249   assert(int_mode_);
>> 250   if (IntSimdMatrix::intSimdMatrix) {
>> 251     IntSimdMatrix::intSimdMatrix->matrixDotVectorFunction(
>> 252       wi_.dim1(), wi_.dim2(), &shaped_w_[0], &scales_[0], u, v);
>> 253   } else {
>> 254     IntSimdMatrix::MatrixDotVector(wi_, scales_, u, v);
>> 255   }
>> 256 }
>>
>> I am new to LSTM training. Is my method suitable for recognizing only
>> 15 different characters, or is there a better way to solve this problem?
>> And how can I fix the assert error?
>>
>> Thank you in advance.
>>
>> Sorry for my poor English.

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/tesseract-ocr. To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAPiKE205MHV1uv0KLBuHPf1wg3eCzUMn7tYYP-%3DCZprCSJu26g%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout.

