Hello everyone,
I am a new user of Tesseract 4.0. I am following the instructions at
https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00
to train an LSTM model.
My environment is Ubuntu 16.04, and I compiled Tesseract 4.0 myself.
I ran into some problems. These are the steps I followed.
1. I ran this command:
src/training/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng --linedata_only \
--noextract_font_properties --langdata_dir ../langdata \
--tessdata_dir ./tessdata \
--fontlist "Impact Condensed" --output_dir ~/tesstutorial/engeval
This worked fine.
2. I tried to run this command from the instructions:
mkdir -p ~/tesstutorial/engoutput
training/lstmtraining --debug_interval 100 \
--traineddata ~/tesstutorial/engtrain/eng/eng.traineddata \
--net_spec '[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c111]' \
--model_output ~/tesstutorial/engoutput/base --learning_rate 20e-4 \
--train_listfile ~/tesstutorial/engtrain/eng.training_files.txt \
--eval_listfile ~/tesstutorial/engeval/eng.training_files.txt \
--max_iterations 5000 &>~/tesstutorial/engoutput/basetrain.log
Here I am confused: I am currently in the tesseract directory, but I cannot
find a training folder under it.
I assumed that once Tesseract is installed successfully, the system would
recognize the lstmtraining command, so I ran this command instead:
lstmtraining --debug_interval 100 \
--traineddata ~/tesstutorial/engtrain/eng/eng.traineddata \
--net_spec '[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c111]' \
--model_output ~/tesstutorial/engoutput/base --learning_rate 20e-4 \
--train_listfile ~/tesstutorial/engtrain/eng.training_files.txt \
--eval_listfile ~/tesstutorial/engeval/eng.training_files.txt \
--max_iterations 5000
It fails with this error:
mgr_.Init(traineddata_path.c_str()):Error:Assert failed:in file ../../src/lstm/lstmtrainer.h, line 110
Segmentation fault (core dumped)
I looked at the source code in lstmtrainer.h:
102 // assumed that the character set is to be re-mapped from old_traineddata to
103 // the new, with consequent change in weight matrices etc.
104 bool TryLoadingCheckpoint(const char* filename, const char* old_traineddata);
105
106 // Initializes the character set encode/decode mechanism directly from a
107 // previously setup traineddata containing dawgs, UNICHARSET and
108 // UnicharCompress. Note: Call before InitNetwork!
109 void InitCharSet(const std::string& traineddata_path) {
110   ASSERT_HOST(mgr_.Init(traineddata_path.c_str()));
111   InitCharSet();
112 }
113 void InitCharSet(const TessdataManager& mgr) {
114   mgr_ = mgr;
115   InitCharSet();
116 }
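Since the failing assertion wraps mgr_.Init(traineddata_path.c_str()), one thing worth checking is whether the files passed to lstmtraining exist and are readable. Step 1 wrote its output to ~/tesstutorial/engeval, while the lstmtraining command reads its traineddata and training list from ~/tesstutorial/engtrain. Below is a minimal shell sketch of such a check, assuming the paths from the command in step 2; check_inputs is a hypothetical helper written for illustration, not part of Tesseract, and a missing file is only one possible cause of the assertion.

```shell
# Hypothetical helper (not part of Tesseract): print every path that is
# missing or unreadable, and return non-zero if any such path was found.
check_inputs() {
  missing=0
  for f in "$@"; do
    if [ ! -r "$f" ]; then
      echo "missing or unreadable: $f"
      missing=1
    fi
  done
  return "$missing"
}

# Paths copied from the lstmtraining command in step 2.
check_inputs \
  ~/tesstutorial/engtrain/eng/eng.traineddata \
  ~/tesstutorial/engtrain/eng.training_files.txt \
  ~/tesstutorial/engeval/eng.training_files.txt \
  || echo "at least one input file is missing"
```

If the first path is reported, the file given to --traineddata cannot be opened from that location.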
I don't know how to solve this problem. Can anyone help me? Thanks in
advance, and sorry for my poor English.