You need to extract eng.lstm from tessdata_best.

On Tue, 29 Jan 2019, 09:37 易鑫 <[email protected]> wrote:

> Hello, everyone:
>
> I want to recognize the characters in a table; you can find a table
> sample in the attached file. It contains only 15 different characters:
> "0123456789-.LQX".
>
> So I think fine tuning is a good way to do the recognition.
>
> Here are my steps:
>
> 1. src/training/tesstrain.sh --fonts_dir /usr/share/fonts \
>      --training_text ../training_data/part.txt \
>      --langdata_dir ../langdata --tessdata_dir ./tessdata --lang eng \
>      --linedata_only --noextract_font_properties \
>      --output_dir ~/tesstutorial/engtest
>
>    part.txt is also in the attached file.
>
> 2. mkdir -p ~/tesstutorial/engtuned_from_eng
>
> 3. lstmtraining --model_output ~/tesstutorial/engtuned_from_eng/engtuned \
>      --continue_from ~/tesstutorial/engtuned_from_eng/eng.lstm \
>      --traineddata ../tessdata/eng.traineddata \
>      --train_listfile ~/tesstutorial/engtest/eng.training_files.txt \
>      --max_iterations 400
>
> 4. combine_tessdata -o ./tessdata/eng_new.traineddata \
>      ~/tesstutorial/engtuned_from_eng/eng.lstm \
>      ~/tesstutorial/engtest/eng.lstm-number-dawg \
>      ~/tesstutorial/engtest/eng.lstm-punc-dawg \
>      ~/tesstutorial/engtest/eng.lstm-word-dawg
>
> But when I execute the 3rd step, there is an error:
>
> Continuing from /home/yixin/tesstutorial/engtuned_from_eng/eng.lstm
> Loaded 298/298 pages (1-298) of document /home/yixin/tesstutorial/engtest/eng.Arial_Bold.exp0.lstmf
> Loaded 297/297 pages (1-297) of document /home/yixin/tesstutorial/engtest/eng.Century_Schoolbook_L_Medium.exp0.lstmf
> Loaded 294/294 pages (1-294) of document /home/yixin/tesstutorial/engtest/eng.Arial.exp0.lstmf
> Loaded 293/293 pages (1-293) of document /home/yixin/tesstutorial/engtest/eng.Courier_New_Bold.exp0.lstmf
> Loaded 302/302 pages (1-302) of document /home/yixin/tesstutorial/engtest/eng.Century_Schoolbook_L_Bold_Italic.exp0.lstmf
> Loaded 301/301 pages (1-301) of document /home/yixin/tesstutorial/engtest/eng.Arial_Italic.exp0.lstmf
> Loaded 301/301 pages (1-301) of document /home/yixin/tesstutorial/engtest/eng.Century_Schoolbook_L_Bold.exp0.lstmf
> Loaded 302/302 pages (1-302) of document /home/yixin/tesstutorial/engtest/eng.Century_Schoolbook_L_Italic.exp0.lstmf
> Loaded 302/302 pages (1-302) of document /home/yixin/tesstutorial/engtest/eng.Arial_Bold_Italic.exp0.lstmf
> Loaded 296/296 pages (1-296) of document /home/yixin/tesstutorial/engtest/eng.Courier_New_Bold_Italic.exp0.lstmf
> !int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 249
> Segmentation fault (core dumped)
>
> This is the related code:
>
> 248 void WeightMatrix::MatrixDotVector(const int8_t* u, double* v) const {
> 249   assert(int_mode_);
> 250   if (IntSimdMatrix::intSimdMatrix) {
> 251     IntSimdMatrix::intSimdMatrix->matrixDotVectorFunction(
> 252         wi_.dim1(), wi_.dim2(), &shaped_w_[0], &scales_[0], u, v);
> 253   } else {
> 254     IntSimdMatrix::MatrixDotVector(wi_, scales_, u, v);
> 255   }
> 256 }
>
> I am new to LSTM training. Is my method okay for recognizing only 15
> different characters, or are there better ideas for solving this
> problem? And how can I fix the assert error?
>
> Thank you in advance.
>
> Sorry for my poor English.
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/d74d5f9a-31ae-4e64-b18b-59d687f02799%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/d74d5f9a-31ae-4e64-b18b-59d687f02799%40googlegroups.com?utm_medium=email&utm_source=footer>.
> For more options, visit https://groups.google.com/d/optout.
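
As a rough illustration of the reply's suggestion (extract eng.lstm from tessdata_best), here is a minimal sketch. It assumes a tessdata_best copy of eng.traineddata has already been downloaded; the BEST_DIR location is hypothetical and not taken from the thread. The likely reason this matters is that the tessdata_best models hold floating-point weights, while the default and fast models are integer-quantized and cannot serve as a starting point for training, which would be consistent with the `!int_mode_` assertion above. The script only runs the extraction if `combine_tessdata` and the model file are present; otherwise it prints the command it would run.

```shell
#!/bin/sh
# Sketch: extract the LSTM component from a tessdata_best model for fine tuning.
# ASSUMPTION: BEST_DIR is a hypothetical path to a downloaded tessdata_best
# checkout; it does not appear in the thread. Adjust it to your setup.
BEST_DIR="${BEST_DIR:-$HOME/tessdata_best}"
# Output directory taken from step 2 of the original message.
TUNE_DIR="${TUNE_DIR:-$HOME/tesstutorial/engtuned_from_eng}"

BEST_MODEL="$BEST_DIR/eng.traineddata"
LSTM_OUT="$TUNE_DIR/eng.lstm"

# combine_tessdata -e extracts the component named by the output file's
# suffix (here .lstm) from the given traineddata file.
EXTRACT_CMD="combine_tessdata -e $BEST_MODEL $LSTM_OUT"

if command -v combine_tessdata >/dev/null 2>&1 && [ -f "$BEST_MODEL" ]; then
  mkdir -p "$TUNE_DIR"
  combine_tessdata -e "$BEST_MODEL" "$LSTM_OUT"
else
  # Dry run: tool or model not found, so only show what would be executed.
  echo "Would run: $EXTRACT_CMD"
fi
```

With eng.lstm extracted this way, step 3 of the original message would continue from it via `--continue_from`; pointing `--traineddata` at the same tessdata_best eng.traineddata is probably the safer choice, though that is an assumption rather than something stated in the reply.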

