Hello everyone,

I'm trying to train Tesseract to improve the detection of some prices such 
as: CN¥2,400.48. I got to a point where I keep getting this error:

*total=`cat data/all-lstmf | wc -l` \*
*   no=`echo "$total * 0.90 / 1" | bc`; \*
*   head -n "$no" data/all-lstmf > "data/list.train"*
*total=`cat data/all-lstmf | wc -l` \*
*   no=`echo "($total - $total * 0.90) / 1" | bc`; \*
*   tail -n "+$no" data/all-lstmf > "data/list.eval"*
*combine_lang_model \*
*  --input_unicharset data/unicharset \*
*  --script_dir /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/langdata-master \*
*  --words /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/langdata-master/eng/eng.wordlist \*
*  --numbers /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/langdata-master/eng/eng.numbers \*
*  --puncs /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/langdata-master/eng/eng.punc \*
*  --output_dir data/ \*
*  --lang eng*
*Loaded unicharset of size 113 from file data/unicharset*
*Setting unichar properties*
*Other case É of é is not in unicharset*
*Setting script properties*
*Config file is optional, continuing...*
*Failed to read data from: /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/langdata-master/eng/eng.config*
*Null char=2*
*Reducing Trie to SquishedDawg*
*Reducing Trie to SquishedDawg*
*Reducing Trie to SquishedDawg*
*mkdir -p data/checkpoints*
*lstmtraining \*
*  --continue_from /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/tessdata/eng.lstm \*
*  --old_traineddata /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/tessdata/eng.traineddata \*
*  --traineddata data/eng/eng.traineddata \*
*  --model_output data/checkpoints/eng \*
*  --debug_interval -1 \*
*  --train_listfile data/list.train \*
*  --eval_listfile data/list.eval \*
*  --sequential_training \*
*  --max_iterations 3000*
*Loaded file /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/tessdata/eng.lstm, unpacking...*
*Warning: LSTMTrainer deserialized an LSTMRecognizer!*
*Code range changed from 111 to 112!*
*Num (Extended) outputs,weights in Series:*
*  1,36,0,1:1, 0*
*Num (Extended) outputs,weights in Series:*
*  C3,3:9, 0*
*  Ft16:16, 160*
*Total weights = 160*
*  [C3,3Ft16]:16, 160*
*  Mp3,3:16, 0*
*  Lfys64:64, 20736*
*  Lfx96:96, 61824*
*  Lrx96:96, 74112*
*  Lfx512:512, 1247232*
*  Fc112:112, 0*
*Total weights = 1404064*
*Previous null char=110 mapped to 111*
*Continuing from /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/tessdata/eng.lstm*
*Loaded 1/1 pages (1-1) of document /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/data/train/72b.lstmf*
*Loaded 1/1 pages (1-1) of document /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/data/train/67e.lstmf*
*Loaded 1/1 pages (1-1) of document /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/data/train/75c.lstmf*
*Loaded 1/1 pages (1-1) of document /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/data/train/48b.lstmf*
*Iteration 0: ALIGNED TRUTH : CN¥2,400.48*
*Iteration 0: BEST OCR TEXT : ₩₩₩N₩₩4₩0₩0₩4₩8*
*File /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/data/train/72b.lstmf page 0 :*
*!int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244*
*!int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244*
*!int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244*
*Makefile:111: recipe for target 'data/checkpoints/eng_checkpoint' failed*
*make: *** [data/checkpoints/eng_checkpoint] Segmentation fault (core dumped)*

I already tried downloading the "best" tessdata eng.traineddata and using it 
in place of the file passed to continue_from, but I still haven't been able 
to get past this error. Any thoughts?
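
In case it helps anyone reading the log, the train/eval split at the top can be reproduced in isolation. This is only a minimal sketch: the 10-entry list, the temporary directory and the sample file names are made up, and it uses plain shell arithmetic in place of bc, so rounding may differ from the logged commands for some totals. It also shows that `tail -n "+$no"` starts at line `$no`, so the eval list built the way the log shows overlaps the train list; I don't know whether that is related to the crash.

```shell
# Hypothetical stand-in for data/all-lstmf: 10 made-up entries in a temp dir.
workdir=$(mktemp -d)
for i in 1 2 3 4 5 6 7 8 9 10; do
  echo "sample$i.lstmf"
done > "$workdir/all-lstmf"

total=$(wc -l < "$workdir/all-lstmf")

# First 90% of the lines go to the training list (integer arithmetic, not bc).
no_train=$(( total * 90 / 100 ))
head -n "$no_train" "$workdir/all-lstmf" > "$workdir/list.train"

# Same shape as the logged command: tail starts AT line $no_eval,
# so this list still contains lines that are already in list.train.
no_eval=$(( total - total * 90 / 100 ))
tail -n "+$no_eval" "$workdir/all-lstmf" > "$workdir/list.eval.logged"

# Non-overlapping alternative: start right after the last training line.
tail -n "+$(( no_train + 1 ))" "$workdir/all-lstmf" > "$workdir/list.eval.disjoint"

wc -l "$workdir"/list.*
```

Comparing the line counts of `list.eval.logged` and `list.eval.disjoint` makes the overlap easy to see.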
