I'm using OCR-D (ocrd-train), which uses Tesseract 4.0.0-beta.1.
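
One detail that may be relevant: the !int_mode_ assertion in the log quoted below means a weight matrix was in integer mode when training code reached it. Integer models (such as those in tessdata_fast) cannot be fine-tuned, while the float models in tessdata_best can. If eng.lstm was extracted before eng.traineddata was swapped, it may still be the integer one. A minimal sketch of re-extracting it, assuming the best model sits at the placeholder path in the usage comment (extract_lstm is an illustrative helper name, not part of ocrd-train):

```shell
# Sketch: pull the LSTM component out of a traineddata file so it can be
# passed to lstmtraining --continue_from. All paths are placeholders.
extract_lstm() {
    best_traineddata=$1   # assumed: a float model from tessdata_best
    out_lstm=$2           # where the Makefile expects the .lstm file
    # combine_tessdata -e extracts the component named by the output suffix.
    combine_tessdata -e "$best_traineddata" "$out_lstm"
}

# Usage (hypothetical paths):
#   extract_lstm tessdata_best/eng.traineddata tessdata/eng.lstm
```

Because make only rebuilds targets it considers out of date, deleting the old eng.lstm before re-running may also be needed; that is an assumption about the setup, not something the log confirms.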

On Tuesday, July 24, 2018 at 12:05:22 AM UTC-5, shree wrote:
>
> Which version of tesseract are you using?
>
> Please post output of
>
> tesseract -v
>
> On Tue 24 Jul, 2018, 2:26 AM Emiliano Isaza Villamizar, <[email protected]> wrote:
>
>> Hello everyone,
>>
>>
>> I'm trying to train Tesseract to improve the detection of some prices such 
>> as CN¥2,400.48. I got to a point where I keep getting this error:
>>
>> *total=`cat data/all-lstmf | wc -l` \*
>> *   no=`echo "$total * 0.90 / 1" | bc`; \*
>> *   head -n "$no" data/all-lstmf > "data/list.train"*
>> *total=`cat data/all-lstmf | wc -l` \*
>> *   no=`echo "($total - $total * 0.90) / 1" | bc`; \*
>> *   tail -n "+$no" data/all-lstmf > "data/list.eval"*
>> *combine_lang_model \*
>> *  --input_unicharset data/unicharset \*
>> *  --script_dir /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/langdata-master \*
>> *  --words /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/langdata-master/eng/eng.wordlist \*
>> *  --numbers /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/langdata-master/eng/eng.numbers \*
>> *  --puncs /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/langdata-master/eng/eng.punc \*
>> *  --output_dir data/ \*
>> *  --lang eng*
>> *Loaded unicharset of size 113 from file data/unicharset*
>> *Setting unichar properties*
>> *Other case É of é is not in unicharset*
>> *Setting script properties*
>> *Config file is optional, continuing...*
>> *Failed to read data from: /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/langdata-master/eng/eng.config*
>> *Null char=2*
>> *Reducing Trie to SquishedDawg*
>> *Reducing Trie to SquishedDawg*
>> *Reducing Trie to SquishedDawg*
>> *mkdir -p data/checkpoints*
>> *lstmtraining \*
>> *  --continue_from /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/tessdata/eng.lstm \*
>> *  --old_traineddata /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/tessdata/eng.traineddata \*
>> *  --traineddata data/eng/eng.traineddata \*
>> *  --model_output data/checkpoints/eng \*
>> *  --debug_interval -1 \*
>> *  --train_listfile data/list.train \*
>> *  --eval_listfile data/list.eval \*
>> *  --sequential_training \*
>> *  --max_iterations 3000*
>> *Loaded file /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/tessdata/eng.lstm, unpacking...*
>> *Warning: LSTMTrainer deserialized an LSTMRecognizer!*
>> *Code range changed from 111 to 112!*
>> *Num (Extended) outputs,weights in Series:*
>> *  1,36,0,1:1, 0*
>> *Num (Extended) outputs,weights in Series:*
>> *  C3,3:9, 0*
>> *  Ft16:16, 160*
>> *Total weights = 160*
>> *  [C3,3Ft16]:16, 160*
>> *  Mp3,3:16, 0*
>> *  Lfys64:64, 20736*
>> *  Lfx96:96, 61824*
>> *  Lrx96:96, 74112*
>> *  Lfx512:512, 1247232*
>> *  Fc112:112, 0*
>> *Total weights = 1404064*
>> *Previous null char=110 mapped to 111*
>> *Continuing from /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/tessdata/eng.lstm*
>> *Loaded 1/1 pages (1-1) of document /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/data/train/72b.lstmf*
>> *Loaded 1/1 pages (1-1) of document /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/data/train/67e.lstmf*
>> *Loaded 1/1 pages (1-1) of document /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/data/train/75c.lstmf*
>> *Loaded 1/1 pages (1-1) of document /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/data/train/48b.lstmf*
>> *Iteration 0: ALIGNED TRUTH : CN¥2,400.48*
>> *Iteration 0: BEST OCR TEXT : ₩₩₩N₩₩4₩0₩0₩4₩8*
>> *File /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/data/train/72b.lstmf page 0 :*
>> *!int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244*
>> *!int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244*
>> *!int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244*
>> *Makefile:111: recipe for target 'data/checkpoints/eng_checkpoint' failed*
>> *make: *** [data/checkpoints/eng_checkpoint] Segmentation fault (core dumped)*
>>
>> I already tried downloading the best/tessdata eng.traineddata and 
>> substituting it in the continue_from step, but I haven't been able to get 
>> past this error. Any thoughts?
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/tesseract-ocr/6152d324-0713-4de6-b646-162923273b63%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
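
Separately from the crash, the list split echoed at the top of the quoted log can be sketched on its own. Note that tail -n "+$no" prints from line $no onward rather than the last $no lines, so the eval list as echoed likely overlaps the training list. Below is a sketch of a disjoint 90/10 split, where split_lists is a hypothetical helper name and plain shell integer arithmetic stands in for bc:

```shell
# Sketch: split a list of .lstmf paths into disjoint train/eval lists (90/10).
split_lists() {
    list=$1      # e.g. data/all-lstmf
    outdir=$2    # e.g. data
    total=$(wc -l < "$list")
    # Integer arithmetic: the first 90% of the lines go to training.
    train=$(( total * 90 / 100 ))
    eval_n=$(( total - train ))
    head -n "$train" "$list" > "$outdir/list.train"
    # tail -n N (no plus sign) takes the last N lines, so the lists do not overlap.
    tail -n "$eval_n" "$list" > "$outdir/list.eval"
}

# Usage, mirroring the paths in the log:
#   split_lists data/all-lstmf data
```

This only affects how honest the evaluation numbers are; it is not expected to change the assertion failure.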

