I'm using OCR-D, which uses Tesseract 4.0.0-beta.1.

On Tuesday, July 24, 2018 at 12:05:22 AM UTC-5, shree wrote:
>
> Which version of tesseract are you using?
>
> Please post output of
>
> tesseract -v
>
> On Tue 24 Jul, 2018, 2:26 AM Emiliano Isaza Villamizar, <[email protected]> wrote:
>
>> Hello everyone,
>>
>> I'm trying to train tesseract to improve the detection of some prices such
>> as: CN¥2,400.48. I got to a point where I keep getting this error:
>>
>> total=`cat data/all-lstmf | wc -l` \
>>    no=`echo "$total * 0.90 / 1" | bc`; \
>>    head -n "$no" data/all-lstmf > "data/list.train"
>> total=`cat data/all-lstmf | wc -l` \
>>    no=`echo "($total - $total * 0.90) / 1" | bc`; \
>>    tail -n "+$no" data/all-lstmf > "data/list.eval"
>> combine_lang_model \
>>   --input_unicharset data/unicharset \
>>   --script_dir /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/langdata-master \
>>   --words /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/langdata-master/eng/eng.wordlist \
>>   --numbers /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/langdata-master/eng/eng.numbers \
>>   --puncs /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/langdata-master/eng/eng.punc \
>>   --output_dir data/ \
>>   --lang eng
>> Loaded unicharset of size 113 from file data/unicharset
>> Setting unichar properties
>> Other case É of é is not in unicharset
>> Setting script properties
>> Config file is optional, continuing...
>> Failed to read data from: /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/langdata-master/eng/eng.config
>> Null char=2
>> Reducing Trie to SquishedDawg
>> Reducing Trie to SquishedDawg
>> Reducing Trie to SquishedDawg
>> mkdir -p data/checkpoints
>> lstmtraining \
>>   --continue_from /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/tessdata/eng.lstm \
>>   --old_traineddata /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/tessdata/eng.traineddata \
>>   --traineddata data/eng/eng.traineddata \
>>   --model_output data/checkpoints/eng \
>>   --debug_interval -1 \
>>   --train_listfile data/list.train \
>>   --eval_listfile data/list.eval \
>>   --sequential_training \
>>   --max_iterations 3000
>> Loaded file /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/tessdata/eng.lstm, unpacking...
>> Warning: LSTMTrainer deserialized an LSTMRecognizer!
>> Code range changed from 111 to 112!
>> Num (Extended) outputs,weights in Series:
>>   1,36,0,1:1, 0
>> Num (Extended) outputs,weights in Series:
>>   C3,3:9, 0
>>   Ft16:16, 160
>> Total weights = 160
>>   [C3,3Ft16]:16, 160
>>   Mp3,3:16, 0
>>   Lfys64:64, 20736
>>   Lfx96:96, 61824
>>   Lrx96:96, 74112
>>   Lfx512:512, 1247232
>>   Fc112:112, 0
>> Total weights = 1404064
>> Previous null char=110 mapped to 111
>> Continuing from /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/tessdata/eng.lstm
>> Loaded 1/1 pages (1-1) of document /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/data/train/72b.lstmf
>> Loaded 1/1 pages (1-1) of document /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/data/train/67e.lstmf
>> Loaded 1/1 pages (1-1) of document /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/data/train/75c.lstmf
>> Loaded 1/1 pages (1-1) of document /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/data/train/48b.lstmf
>> Iteration 0: ALIGNED TRUTH : CN¥2,400.48
>> Iteration 0: BEST OCR TEXT : ₩₩₩N₩₩4₩0₩0₩4₩8
>> File /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/data/train/72b.lstmf page 0 :
>> !int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244
>> !int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244
>> !int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244
>> Makefile:111: recipe for target 'data/checkpoints/eng_checkpoint' failed
>> make: *** [data/checkpoints/eng_checkpoint] Segmentation fault (core dumped)
>>
>> I already tried downloading the best/tessdata eng.traineddata and
>> substituting it in the continue_from, but I still haven't been able to get
>> past this error. Any thoughts?
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/6152d324-0713-4de6-b646-162923273b63%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
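A side note on the train/eval split in the quoted commands, separate from the assertion failure: as quoted, `tail -n "+$no"` receives 10% of the line count, and `tail -n +K` starts output at line K, so list.eval would appear to begin near the top of data/all-lstmf and overlap list.train. Below is a minimal sketch of a non-overlapping 90/10 split, assuming that reading of the commands is right. The temporary directory, the ten fake sample paths, and the variable names `n_train`, `train_count`, `eval_count`, and `overlap` are made up for illustration and are not part of ocrd-train.

```shell
#!/bin/sh
# Sketch only: 90/10 split of a list of .lstmf paths with no overlap.
# The file names mirror those in the log; the contents are fake.
set -eu

workdir=$(mktemp -d)
# Build a stand-in for data/all-lstmf with ten hypothetical entries.
seq 1 10 | sed 's|^|data/train/sample|; s|$|.lstmf|' > "$workdir/all-lstmf"

total=$(wc -l < "$workdir/all-lstmf")
# Integer arithmetic truncates, like the "/ 1" passed to bc in the log.
n_train=$(( total * 90 / 100 ))

# The first n_train lines go to the training list.
head -n "$n_train" "$workdir/all-lstmf" > "$workdir/list.train"
# tail -n +K starts at line K, so use n_train + 1 to skip the training lines.
tail -n "+$(( n_train + 1 ))" "$workdir/all-lstmf" > "$workdir/list.eval"

train_count=$(wc -l < "$workdir/list.train" | tr -d ' ')
eval_count=$(wc -l < "$workdir/list.eval" | tr -d ' ')
# Count lines of list.eval that also appear verbatim in list.train.
overlap=$(grep -cxFf "$workdir/list.train" "$workdir/list.eval" || true)

echo "train=$train_count eval=$eval_count overlap=$overlap"
```

With very small datasets the 10% share can truncate to zero lines, so it is worth checking that list.eval is not empty before starting lstmtraining.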

