Re: [tesseract-ocr] How to use fine tuning for training?

易鑫 Mon, 28 Jan 2019 20:42:46 -0800

Thank you. I will try.
By the way, is my method feasible? I read the wiki, but I do not quite
understand "Fine Tuning for ± a few characters". It seems that using "Fine
Tuning for ± a few characters" can satisfy my need.
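
For reference, that wiki section is about adding or removing characters from the model's character set. As far as I can tell, it differs from plain fine tuning mainly in passing `--old_traineddata` alongside a `--traineddata` built for the new character set. A minimal sketch of such a call follows. The `run` helper is hypothetical and only prints the command by default, and the `eng/eng.traineddata` and `tessdata_best` locations are assumed placeholders, not paths confirmed in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print the command by default; set DRY_RUN=0 to execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Fine tuning with a changed character set (sketch).
# --traineddata:     assumed to be the starter traineddata written by tesstrain.sh
# --old_traineddata: assumed location of the original tessdata_best model
fine_tune_plus_minus() {
  run lstmtraining \
    --model_output "$HOME/tesstutorial/engtuned_from_eng/engtuned" \
    --continue_from "$HOME/tesstutorial/engtuned_from_eng/eng.lstm" \
    --traineddata "$HOME/tesstutorial/engtest/eng/eng.traineddata" \
    --old_traineddata "$HOME/tessdata_best/eng.traineddata" \
    --train_listfile "$HOME/tesstutorial/engtest/eng.training_files.txt" \
    --max_iterations 400
}

fine_tune_plus_minus
```

If every character to be recognized is already in the English character set, the plain fine-tuning call without `--old_traineddata` may be all that is needed.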



Shree Devi Kumar <[email protected]> wrote on Tue, 29 Jan 2019 at 12:30 PM:

> combine_tessdata -o ./tessdata/eng_new.traineddata \
> ~/tesstutorial/engtuned_from_eng/eng.lstm \
>
> You need to extract eng.lstm from tessdata_best
>
> On Tue, 29 Jan 2019, 09:37 易鑫 <[email protected]> wrote:
>
>> Hello, everyone:
>>
>> I want to recognize the characters in a table; you can find a sample
>> table in the attached file. It contains only 15 different characters:
>> "0123456789-.LQX".
>>
>> So I think fine tuning is a good way to recognize them.
>>
>> Here are my steps:
>>
>> 1. src/training/tesstrain.sh --fonts_dir /usr/share/fonts \
>>    --training_text ../training_data/part.txt \
>>    --langdata_dir ../langdata --tessdata_dir ./tessdata --lang eng \
>>    --linedata_only --noextract_font_properties \
>>    --output_dir ~/tesstutorial/engtest
>>
>> part.txt is also in the attached file.
>>
>> 2. mkdir -p ~/tesstutorial/engtuned_from_eng
>> 3. lstmtraining --model_output ~/tesstutorial/engtuned_from_eng/engtuned \
>>    --continue_from ~/tesstutorial/engtuned_from_eng/eng.lstm \
>>    --traineddata ../tessdata/eng.traineddata \
>>    --train_listfile ~/tesstutorial/engtest/eng.training_files.txt \
>>    --max_iterations 400
>>
>> 4. combine_tessdata -o ./tessdata/eng_new.traineddata \
>> ~/tesstutorial/engtuned_from_eng/eng.lstm \
>> ~/tesstutorial/engtest/eng.lstm-number-dawg \
>> ~/tesstutorial/engtest/eng.lstm-punc-dawg \
>> ~/tesstutorial/engtest/eng.lstm-word-dawg
>>
>>
>> But when I execute the 3rd step, there is an error:
>> Continuing from /home/yixin/tesstutorial/engtuned_from_eng/eng.lstm
>> Loaded 298/298 pages (1-298) of document
>> /home/yixin/tesstutorial/engtest/eng.Arial_Bold.exp0.lstmf
>> Loaded 297/297 pages (1-297) of document
>> /home/yixin/tesstutorial/engtest/eng.Century_Schoolbook_L_Medium.exp0.lstmf
>> Loaded 294/294 pages (1-294) of document
>> /home/yixin/tesstutorial/engtest/eng.Arial.exp0.lstmf
>> Loaded 293/293 pages (1-293) of document
>> /home/yixin/tesstutorial/engtest/eng.Courier_New_Bold.exp0.lstmf
>> Loaded 302/302 pages (1-302) of document
>> /home/yixin/tesstutorial/engtest/eng.Century_Schoolbook_L_Bold_Italic.exp0.lstmf
>> Loaded 301/301 pages (1-301) of document
>> /home/yixin/tesstutorial/engtest/eng.Arial_Italic.exp0.lstmf
>> Loaded 301/301 pages (1-301) of document
>> /home/yixin/tesstutorial/engtest/eng.Century_Schoolbook_L_Bold.exp0.lstmf
>> Loaded 302/302 pages (1-302) of document
>> /home/yixin/tesstutorial/engtest/eng.Century_Schoolbook_L_Italic.exp0.lstmf
>> Loaded 302/302 pages (1-302) of document
>> /home/yixin/tesstutorial/engtest/eng.Arial_Bold_Italic.exp0.lstmf
>> Loaded 296/296 pages (1-296) of document
>> /home/yixin/tesstutorial/engtest/eng.Courier_New_Bold_Italic.exp0.lstmf
>> !int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 249
>> Segmentation fault (core dumped)
>>
>> This is the related code:
>>
>> 248 void WeightMatrix::MatrixDotVector(const int8_t* u, double* v) const {
>> 249   assert(int_mode_);
>> 250   if (IntSimdMatrix::intSimdMatrix) {
>> 251     IntSimdMatrix::intSimdMatrix->matrixDotVectorFunction(
>> 252       wi_.dim1(), wi_.dim2(), &shaped_w_[0], &scales_[0], u, v);
>> 253   } else {
>> 254     IntSimdMatrix::MatrixDotVector(wi_, scales_, u, v);
>> 255   }
>> 256 }
>>
>> I am a new user of LSTM training. Is my method okay for recognizing only
>> 15 different characters, or is there a better way to solve this problem?
>> And how can I fix the assert error?
>>
>> Thank you in advance.
>>
>> Sorry for my poor English.
>>
>>
>>
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/d74d5f9a-31ae-4e64-b18b-59d687f02799%40googlegroups.com
>> <https://groups.google.com/d/msgid/tesseract-ocr/d74d5f9a-31ae-4e64-b18b-59d687f02799%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
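
For anyone following along: the extraction step Shree mentions presumably means unpacking the LSTM component from the tessdata_best English model, since (as I understand the wiki) only those float models are suitable as a starting point for further training. A minimal sketch, assuming the tessdata_best eng.traineddata has been downloaded to ~/tessdata_best (a placeholder path); the `run` helper is hypothetical and only prints the command by default.

```shell
#!/bin/sh
# Hypothetical helper: print the command by default; set DRY_RUN=0 to execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Assumed download location of the float model from tessdata_best.
BEST_MODEL="$HOME/tessdata_best/eng.traineddata"
# Target path that step 3 in the thread passes to --continue_from.
LSTM_OUT="$HOME/tesstutorial/engtuned_from_eng/eng.lstm"

# Extract only the LSTM component; combine_tessdata -e picks the
# component from the extension of the output file name.
extract_lstm() {
  run combine_tessdata -e "$BEST_MODEL" "$LSTM_OUT"
}

extract_lstm
```

Running with DRY_RUN=0 performs the extraction, after which step 3 can be retried against the extracted eng.lstm.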

