Re: [tesseract-ocr] How to use fine tuning for training?

Shree Devi Kumar Mon, 28 Jan 2019 20:30:38 -0800

> combine_tessdata -o ./tessdata/eng_new.traineddata \
> ~/tesstutorial/engtuned_from_eng/eng.lstm \

You need to extract eng.lstm from tessdata_best.
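A minimal sketch of what the reply suggests. The eng.lstm used in step 3 was presumably extracted from an integer (tessdata_fast style) model. tessdata_best provides the float models that lstmtraining can continue from. Several things here are assumptions: BEST_DIR (default $HOME/tessdata_best) is a hypothetical location for a local tessdata_best copy, and run, extract_lstm and finetune are helper names made up for this sketch. The remaining paths and flags are copied from the question. By default the script only prints the commands, so they can be reviewed first.

```shell
#!/usr/bin/env bash
# Sketch: fine-tune eng starting from a tessdata_best (float) model.
# BEST_DIR is an assumed location of a local tessdata_best copy.
set -euo pipefail

BEST_DIR="${BEST_DIR:-$HOME/tessdata_best}"
WORK="$HOME/tesstutorial/engtuned_from_eng"
LISTFILE="$HOME/tesstutorial/engtest/eng.training_files.txt"

# Print each command; execute it only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = 0 ]; then
    "$@"
  else
    echo "$*"
  fi
}

# Extract only the LSTM component from the float "best" traineddata.
extract_lstm() {
  run combine_tessdata -e "$1" "$2"
}

# Continue training from the extracted float model (flags from the question).
finetune() {
  run lstmtraining --model_output "$WORK/engtuned" \
    --continue_from "$WORK/eng.lstm" \
    --traineddata "$BEST_DIR/eng.traineddata" \
    --train_listfile "$LISTFILE" \
    --max_iterations 400
}

run mkdir -p "$WORK"
extract_lstm "$BEST_DIR/eng.traineddata" "$WORK/eng.lstm"
finetune
```

To actually execute the commands, save the script and run it with DRY_RUN=0 and BEST_DIR set to the real tessdata_best directory.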

On Tue, 29 Jan 2019, 09:37 易鑫 <[email protected]> wrote:

> Hello everyone,
>
> I want to recognize the characters in a table; a sample of the table is
> in the attached file. It contains only 15 different characters:
> "0123456789-.LQX".
>
> So I think fine-tuning is a good approach for recognizing them.
>
> Here are my steps:
>
> 1. src/training/tesstrain.sh --fonts_dir /usr/share/fonts \
>    --training_text ../training_data/part.txt \
>    --langdata_dir ../langdata --tessdata_dir ./tessdata --lang eng \
>    --linedata_only --noextract_font_properties \
>    --output_dir ~/tesstutorial/engtest
>
> part.txt is also in the attached file.
>
> 2. mkdir -p ~/tesstutorial/engtuned_from_eng
> 3. lstmtraining --model_output ~/tesstutorial/engtuned_from_eng/engtuned \
>    --continue_from ~/tesstutorial/engtuned_from_eng/eng.lstm \
>    --traineddata ../tessdata/eng.traineddata \
>    --train_listfile ~/tesstutorial/engtest/eng.training_files.txt \
>    --max_iterations 400
>
> 4. combine_tessdata -o ./tessdata/eng_new.traineddata \
>    ~/tesstutorial/engtuned_from_eng/eng.lstm \
>    ~/tesstutorial/engtest/eng.lstm-number-dawg \
>    ~/tesstutorial/engtest/eng.lstm-punc-dawg \
>    ~/tesstutorial/engtest/eng.lstm-word-dawg
>
>
> But when I run the third step, I get an error:
> Continuing from /home/yixin/tesstutorial/engtuned_from_eng/eng.lstm
> Loaded 298/298 pages (1-298) of document /home/yixin/tesstutorial/engtest/eng.Arial_Bold.exp0.lstmf
> Loaded 297/297 pages (1-297) of document /home/yixin/tesstutorial/engtest/eng.Century_Schoolbook_L_Medium.exp0.lstmf
> Loaded 294/294 pages (1-294) of document /home/yixin/tesstutorial/engtest/eng.Arial.exp0.lstmf
> Loaded 293/293 pages (1-293) of document /home/yixin/tesstutorial/engtest/eng.Courier_New_Bold.exp0.lstmf
> Loaded 302/302 pages (1-302) of document /home/yixin/tesstutorial/engtest/eng.Century_Schoolbook_L_Bold_Italic.exp0.lstmf
> Loaded 301/301 pages (1-301) of document /home/yixin/tesstutorial/engtest/eng.Arial_Italic.exp0.lstmf
> Loaded 301/301 pages (1-301) of document /home/yixin/tesstutorial/engtest/eng.Century_Schoolbook_L_Bold.exp0.lstmf
> Loaded 302/302 pages (1-302) of document /home/yixin/tesstutorial/engtest/eng.Century_Schoolbook_L_Italic.exp0.lstmf
> Loaded 302/302 pages (1-302) of document /home/yixin/tesstutorial/engtest/eng.Arial_Bold_Italic.exp0.lstmf
> Loaded 296/296 pages (1-296) of document /home/yixin/tesstutorial/engtest/eng.Courier_New_Bold_Italic.exp0.lstmf
> !int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 249
> Segmentation fault (core dumped)
>
> This is the related code (weightmatrix.cpp):
>
> void WeightMatrix::MatrixDotVector(const int8_t* u, double* v) const {
>   assert(int_mode_);  // line 249, the assertion that fails
>   if (IntSimdMatrix::intSimdMatrix) {
>     IntSimdMatrix::intSimdMatrix->matrixDotVectorFunction(
>         wi_.dim1(), wi_.dim2(), &shaped_w_[0], &scales_[0], u, v);
>   } else {
>     IntSimdMatrix::MatrixDotVector(wi_, scales_, u, v);
>   }
> }
> I am new to LSTM training. Is my method suitable for recognizing only
> 15 different characters, or is there a better approach? And how can I
> fix the assertion error?
>
> Thank you in advance.
>
> Sorry for my poor English.
>
>
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/d74d5f9a-31ae-4e64-b18b-59d687f02799%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/d74d5f9a-31ae-4e64-b18b-59d687f02799%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>
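About step 4 in the question: ~/tesstutorial/engtuned_from_eng/eng.lstm is the model that training started from. lstmtraining does not overwrite it. Instead, it writes checkpoint files named after --model_output (here engtuned_checkpoint), so combining eng.lstm back into a traineddata would leave out the fine-tuning. The Tesseract 4 training documentation instead converts the checkpoint into a complete traineddata with lstmtraining --stop_training. The following sketch rests on a few assumptions. BEST_DIR (default $HOME/tessdata_best) is a hypothetical location for the tessdata_best eng.traineddata used during fine-tuning. The helper names run and make_traineddata are made up. By default, commands are only printed unless DRY_RUN=0.

```shell
#!/usr/bin/env bash
# Sketch: turn the fine-tuning checkpoint into a usable traineddata file.
# BEST_DIR is an assumed location of a local tessdata_best copy.
set -euo pipefail

BEST_DIR="${BEST_DIR:-$HOME/tessdata_best}"
WORK="$HOME/tesstutorial/engtuned_from_eng"

# Print each command; execute it only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = 0 ]; then
    "$@"
  else
    echo "$*"
  fi
}

# Write a complete traineddata from the latest checkpoint ($1 = output file).
make_traineddata() {
  run lstmtraining --stop_training \
    --continue_from "$WORK/engtuned_checkpoint" \
    --traineddata "$BEST_DIR/eng.traineddata" \
    --model_output "$1"
}

make_traineddata "$WORK/eng_new.traineddata"
```

The resulting eng_new.traineddata can then be copied into a tessdata directory and selected with `-l eng_new`.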

