Hello everyone,
I'm trying to train Tesseract to improve the recognition of prices such as CN¥2,400.48. I've got to a point where I keep getting this error:

    total=`cat data/all-lstmf | wc -l` \
       no=`echo "$total * 0.90 / 1" | bc`; \
       head -n "$no" data/all-lstmf > "data/list.train"
    total=`cat data/all-lstmf | wc -l` \
       no=`echo "($total - $total * 0.90) / 1" | bc`; \
       tail -n "+$no" data/all-lstmf > "data/list.eval"
    combine_lang_model \
      --input_unicharset data/unicharset \
      --script_dir /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/langdata-master \
      --words /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/langdata-master/eng/eng.wordlist \
      --numbers /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/langdata-master/eng/eng.numbers \
      --puncs /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/langdata-master/eng/eng.punc \
      --output_dir data/ \
      --lang eng
    Loaded unicharset of size 113 from file data/unicharset
    Setting unichar properties
    Other case É of é is not in unicharset
    Setting script properties
    Config file is optional, continuing...
    Failed to read data from: /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/langdata-master/eng/eng.config
    Null char=2
    Reducing Trie to SquishedDawg
    Reducing Trie to SquishedDawg
    Reducing Trie to SquishedDawg
    mkdir -p data/checkpoints
    lstmtraining \
      --continue_from /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/tessdata/eng.lstm \
      --old_traineddata /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/tessdata/eng.traineddata \
      --traineddata data/eng/eng.traineddata \
      --model_output data/checkpoints/eng \
      --debug_interval -1 \
      --train_listfile data/list.train \
      --eval_listfile data/list.eval \
      --sequential_training \
      --max_iterations 3000
    Loaded file /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/tessdata/eng.lstm, unpacking...
    Warning: LSTMTrainer deserialized an LSTMRecognizer!
    Code range changed from 111 to 112!
    Num (Extended) outputs,weights in Series:
      1,36,0,1:1, 0
    Num (Extended) outputs,weights in Series:
      C3,3:9, 0
      Ft16:16, 160
    Total weights = 160
      [C3,3Ft16]:16, 160
      Mp3,3:16, 0
      Lfys64:64, 20736
      Lfx96:96, 61824
      Lrx96:96, 74112
      Lfx512:512, 1247232
      Fc112:112, 0
    Total weights = 1404064
    Previous null char=110 mapped to 111
    Continuing from /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/tessdata/eng.lstm
    Loaded 1/1 pages (1-1) of document /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/data/train/72b.lstmf
    Loaded 1/1 pages (1-1) of document /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/data/train/67e.lstmf
    Loaded 1/1 pages (1-1) of document /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/data/train/75c.lstmf
    Loaded 1/1 pages (1-1) of document /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/data/train/48b.lstmf
    Iteration 0: ALIGNED TRUTH : CN¥2,400.48
    Iteration 0: BEST OCR TEXT : ₩₩₩N₩₩4₩0₩0₩4₩8
    File /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/data/train/72b.lstmf page 0 :
    !int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244
    !int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244
    !int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244
    Makefile:111: recipe for target 'data/checkpoints/eng_checkpoint' failed
    make: *** [data/checkpoints/eng_checkpoint] Segmentation fault (core dumped)

I already tried downloading eng.traineddata from best/tessdata and swapping it in for the --continue_from model, but I still haven't been able to get past this error. Any thoughts?
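For reference, the train/eval split step at the top of the log boils down to the following. This is a minimal sketch, not the Makefile itself: it runs on a hypothetical 20-line list in a temporary directory, and it swaps `bc` for POSIX shell integer arithmetic, which truncates the same way for these non-negative counts. One thing it makes visible: `tail -n +K` prints from line K to the end, so as written the eval list starts at the 10% mark and overlaps almost entirely with the training list. The `list.eval.disjoint` file is a hypothetical alternative, assuming disjoint sets were the intent.

```shell
#!/bin/sh
# Sketch of the 90/10 split from the log, run on a hypothetical
# 20-entry list so the resulting line counts are easy to check.
set -eu

workdir=$(mktemp -d)
cd "$workdir"
mkdir -p data

# Hypothetical stand-in for data/all-lstmf: one .lstmf path per line.
for i in $(seq 1 20); do
  echo "data/train/sample$i.lstmf"
done > data/all-lstmf

total=$(wc -l < data/all-lstmf)

# 90% for training; shell integer division truncates like `bc` at scale 0.
no=$(( total * 90 / 100 ))
head -n "$no" data/all-lstmf > data/list.train

# Eval split as written in the log: K = 10% of the total ...
no_eval=$(( (100 * total - 90 * total) / 100 ))
# ... but `tail -n +K` prints from line K to the end, so most of these
# entries also appear in list.train.
tail -n "+$no_eval" data/all-lstmf > data/list.eval

# Hypothetical disjoint alternative: start right after the training lines.
tail -n "+$(( no + 1 ))" data/all-lstmf > data/list.eval.disjoint

echo "train=$(wc -l < data/list.train) eval=$(wc -l < data/list.eval) disjoint_eval=$(wc -l < data/list.eval.disjoint)"
```

With 20 entries, the training list gets lines 1 to 18, the eval list as written gets lines 2 to 20, and the disjoint variant gets only lines 19 and 20.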

