[tesseract-ocr] some questions about lstm training

易鑫 Thu, 24 Jan 2019 19:34:58 -0800

Hello everyone,
I am a new user of Tesseract 4.0. I am following the instructions at
https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00
to train an LSTM model.


My environment is Ubuntu 16.04, and I compiled Tesseract 4.0 myself.
I have run into some problems.

I followed these steps.
1. I ran this command:

src/training/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng --linedata_only \
  --noextract_font_properties --langdata_dir ../langdata \
  --tessdata_dir ./tessdata \
  --fontlist "Impact Condensed" --output_dir ~/tesstutorial/engeval


This step completed without errors.

2. I ran this command:

mkdir -p ~/tesstutorial/engoutput
training/lstmtraining --debug_interval 100 \
  --traineddata ~/tesstutorial/engtrain/eng/eng.traineddata \
  --net_spec '[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c111]' \
  --model_output ~/tesstutorial/engoutput/base --learning_rate 20e-4 \
  --train_listfile ~/tesstutorial/engtrain/eng.training_files.txt \
  --eval_listfile ~/tesstutorial/engeval/eng.training_files.txt \
  --max_iterations 5000 &>~/tesstutorial/engoutput/basetrain.log

Here I am confused: I am currently in the tesseract directory, but I cannot
find a training folder under this directory.

I assumed that once Tesseract is installed successfully, the system would
recognize the lstmtraining command, so I used this command instead:

lstmtraining --debug_interval 100 \
  --traineddata ~/tesstutorial/engtrain/eng/eng.traineddata \
  --net_spec '[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c111]' \
  --model_output ~/tesstutorial/engoutput/base --learning_rate 20e-4 \
  --train_listfile ~/tesstutorial/engtrain/eng.training_files.txt \
  --eval_listfile ~/tesstutorial/engeval/eng.training_files.txt \
  --max_iterations 5000

This produces an error:

mgr_.Init(traineddata_path.c_str()):Error:Assert failed:in file ../../src/lstm/lstmtrainer.h, line 110
Segmentation fault (core dumped)

I looked at the source code in lstmtrainer.h:

102   // assumed that the character set is to be re-mapped from old_traineddata to
103   // the new, with consequent change in weight matrices etc.
104   bool TryLoadingCheckpoint(const char* filename, const char* old_traineddata);
105
106   // Initializes the character set encode/decode mechanism directly from a
107   // previously setup traineddata containing dawgs, UNICHARSET and
108   // UnicharCompress. Note: Call before InitNetwork!
109   void InitCharSet(const std::string& traineddata_path) {
110     ASSERT_HOST(mgr_.Init(traineddata_path.c_str()));
111     InitCharSet();
112   }
113   void InitCharSet(const TessdataManager& mgr) {
114     mgr_ = mgr;
115     InitCharSet();
116   }
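
The assertion on line 110 fails when mgr_.Init(traineddata_path.c_str()) returns false, which may mean that the file passed to --traineddata could not be loaded. Step 1 above wrote its output to ~/tesstutorial/engeval, while the lstmtraining command also reads from ~/tesstutorial/engtrain, so those paths may not exist yet. A minimal sketch for checking this, assuming the paths from the command above (check_inputs is a hypothetical helper name, not part of Tesseract):

```shell
#!/bin/sh
# Hypothetical helper: report which of the given paths are readable regular files.
# Returns 0 if every path is present and readable, 1 otherwise.
check_inputs() {
  rc=0
  for path in "$@"; do
    if [ -f "$path" ] && [ -r "$path" ]; then
      echo "ok: $path"
    else
      echo "missing: $path"
      rc=1
    fi
  done
  return "$rc"
}

# Paths copied from the lstmtraining command above.
check_inputs \
  "$HOME/tesstutorial/engtrain/eng/eng.traineddata" \
  "$HOME/tesstutorial/engtrain/eng.training_files.txt" \
  "$HOME/tesstutorial/engeval/eng.training_files.txt" || true
```

Any path reported as missing would need to be generated (or the command-line argument corrected) before running lstmtraining again.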

I don't know how to solve this problem. Can anyone help me? Thanks in
advance, and sorry for my poor English.


