I have done this, and the same error appears. My problem is with the box file generation for a whole page.
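To make the box file question easier to check, here is a minimal sketch that counts glyph entries and end-of-line markers in a box file. It assumes the usual `glyph left bottom right top page` layout and that a tab glyph marks the end of a text line, as line-level box generators commonly write it. Both points are assumptions about my files, and `summarize_box_text` is a hypothetical helper name, not part of Tesseract.

```python
from collections import Counter


def summarize_box_text(box_text):
    """Count glyph entries, tab end-of-line markers and malformed rows.

    Assumes each row is "<glyph> <left> <bottom> <right> <top> <page>" and
    that a tab glyph marks the end of a text line (an assumption).
    """
    counts = Counter()
    for raw in box_text.splitlines():
        if not raw.strip():
            continue  # skip blank rows
        # Split from the right so a glyph that is itself a space or tab survives.
        parts = raw.rsplit(" ", 5)
        if len(parts) != 6:
            counts["malformed"] += 1
            continue
        glyph, coords = parts[0], parts[1:]
        try:
            [int(c) for c in coords]  # coordinates and page must be integers
        except ValueError:
            counts["malformed"] += 1
            continue
        if glyph == "\t":
            counts["eol"] += 1
        else:
            counts["glyphs"] += 1
    return counts


if __name__ == "__main__":
    # Tiny made-up sample: two glyphs followed by one end-of-line marker.
    sample = "A 0 0 100 36 0\nb 0 0 100 36 0\n\t 100 36 101 37 0\n"
    print(dict(summarize_box_text(sample)))
```

If a whole page of text reports very few `eol` markers, the page may be reaching the trainer as one very long line. Whether that is what triggers the CTC failure here is only a guess on my side.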
On Friday, August 17, 2018 at 2:09:53 PM UTC+2, shree wrote:
>
> Please build the latest code beta.4 and run the same test.
>
> On Fri, Aug 17, 2018 at 4:44 PM, <[email protected]> wrote:
>
>> ### Environment
>>
>> * **Tesseract Version**: 4.0.0-beta.1-306-g45b11cd
>> * **Commit Number**: 4.0.0-beta.1-306-g45b11cd
>> * **Platform**: Ubuntu x86_64 GNU/Linux
>>
>> ### Current Behavior:
>>
>> Infinite loop of Compute CTC targets failed
>>
>> I have a box file and tif images and i run the below script for training.
>> ```
>>
>> ALL_BOXES = data/all-boxes
>> ALL_LSTMF = data/all-lstmf
>>
>> # Create unicharset
>> unicharset: data/unicharset
>>
>> # Create lists of lstmf filenames for training and eval
>> lists: $(ALL_LSTMF) data/list.train data/list.eval
>>
>> data/list.train: $(ALL_LSTMF)
>> 	total=`cat $(ALL_LSTMF) | wc -l` \
>> 	no=`echo "$$total * $(RATIO_TRAIN) / 1" | bc`; \
>> 	head -n "$$no" $(ALL_LSTMF) > "$@"
>>
>> data/list.eval: $(ALL_LSTMF)
>> 	total=`cat $(ALL_LSTMF) | wc -l` \
>> 	no=`echo "($$total - $$total * $(RATIO_TRAIN)) / 1" | bc`; \
>> 	tail -n "+$$no" $(ALL_LSTMF) > "$@"
>>
>> # Start training
>> training: data/$(MODEL_NAME).traineddata
>>
>> data/unicharset: $(ALL_BOXES)
>>
>> 	combine_tessdata -u $(TESSDATA)/eng.traineddata $(TESSDATA)/$(MODEL_NAME).
>> 	unicharset_extractor --output_unicharset "$(TRAIN)/my.unicharset" --norm_mode $(NORM_MODE) "$(ALL_BOXES)"
>> 	merge_unicharsets $(TESSDATA)/$(MODEL_NAME).lstm-unicharset $(TRAIN)/my.unicharset "$@"
>>
>> $(ALL_BOXES): $(sort $(patsubst %.tif,%.box,$(wildcard $(TRAIN)/*.tif)))
>> 	find $(TRAIN) -name '*.box' -exec cat {} \; > "$@"
>>
>> #$(TRAIN)/%.box: $(TRAIN)/%.tif $(TRAIN)/%.gt.txt
>> 	#python3 generate_line_box.py -i "$(TRAIN)/$*.tif" -t "$(TRAIN)/$*.gt.txt" > "$@"
>>
>> $(ALL_LSTMF): $(sort $(patsubst %.tif,%.lstmf,$(wildcard $(TRAIN)/*.tif)))
>> 	find $(TRAIN) -name '*.lstmf' -exec echo {} \; | sort -R -o "$@"
>>
>> $(TRAIN)/%.lstmf: $(TRAIN)/%.box
>> 	tesseract $(TRAIN)/$*.tif $(TRAIN)/$* --psm $(PSM) lstm.train
>>
>> # Build the proto model
>> proto-model: data/$(MODEL_NAME)/$(MODEL_NAME).traineddata
>>
>> data/$(MODEL_NAME)/$(MODEL_NAME).traineddata: $(LANGDATA) data/unicharset
>> 	combine_lang_model \
>> 	--input_unicharset data/unicharset \
>> 	--script_dir $(LANGDATA) \
>> 	--output_dir data/ \
>> 	--lang $(MODEL_NAME)
>>
>> data/checkpoints/$(MODEL_NAME)_checkpoint: unicharset lists proto-model
>> 	mkdir -p data/checkpoints
>> 	lstmtraining \
>> 	--traineddata data/$(MODEL_NAME)/$(MODEL_NAME).traineddata \
>> 	--net_spec "[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c`head -n1 data/unicharset`]" \
>> 	--model_output data/checkpoints/$(MODEL_NAME) \
>> 	--learning_rate 20e-4 \
>> 	--train_listfile data/list.train \
>> 	--eval_listfile data/list.eval \
>> 	--max_iterations 10000
>> ```
>>
>> Here is the Logs including the error .
>>
>> ```
>> find data/train -name '*.box' -exec cat {} \; > "data/all-boxes"
>> #python3 generate_line_box.py -i "data/train/.tif" -t "data/train/.gt.txt" > "data/all-boxes"
>> combine_tessdata -u /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/eng.traineddata /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.
>> Version string:4.00.00alpha:eng:synth20170629:[1,36,0,1Ct3,3,16Mp3,3Lfys64Lfx96Lrx96Lfx512O1c1]
>> 1:unicharset:size=7477, offset=192
>> 2:unicharambigs:size=1047, offset=7669
>> 3:inttemp:size=976552, offset=8716
>> 4:pffmtable:size=844, offset=985268
>> 5:normproto:size=13408, offset=986112
>> 6:punc-dawg:size=4322, offset=999520
>> 7:word-dawg:size=1082890, offset=1003842
>> 8:number-dawg:size=6426, offset=2086732
>> 9:freq-dawg:size=1410, offset=2093158
>> 13:shapetable:size=63346, offset=2094568
>> 14:bigram-dawg:size=16109842, offset=2157914
>> 17:lstm:size=1487588, offset=18267756
>> 18:lstm-punc-dawg:size=4322, offset=19755344
>> 19:lstm-word-dawg:size=3694794, offset=19759666
>> 20:lstm-number-dawg:size=4738, offset=23454460
>> 21:lstm-unicharset:size=6360, offset=23459198
>> 22:lstm-recoder:size=1012, offset=23465558
>> 23:version:size=80, offset=23466570
>> Extracting tessdata components from /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/eng.traineddata
>> Wrote /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.unicharset
>> Wrote /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.unicharambigs
>> Wrote /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.inttemp
>> Wrote /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.pffmtable
>> Wrote /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.normproto
>> Wrote /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.punc-dawg
>> Wrote /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.word-dawg
>> Wrote /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.number-dawg
>> Wrote /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.freq-dawg
>> Wrote /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.shapetable
>> Wrote /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.bigram-dawg
>> Wrote /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.lstm
>> Wrote /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.lstm-punc-dawg
>> Wrote /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.lstm-word-dawg
>> Wrote /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.lstm-number-dawg
>> Wrote /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.lstm-unicharset
>> Wrote /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.lstm-recoder
>> Wrote /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.version
>> unicharset_extractor --output_unicharset "data/train/my.unicharset" --norm_mode 2 "data/all-boxes"
>> Extracting unicharset from box file data/all-boxes
>> Other case f of F is not in unicharset
>> Other case d of D is not in unicharset
>> Other case h of H is not in unicharset
>> Other case z of Z is not in unicharset
>> Other case k of K is not in unicharset
>> Other case w of W is not in unicharset
>> Other case v of V is not in unicharset
>> Other case j of J is not in unicharset
>> Other case b of B is not in unicharset
>> Other case q of Q is not in unicharset
>> Wrote unicharset file data/train/my.unicharset
>> merge_unicharsets /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.lstm-unicharset data/train/my.unicharset "data/unicharset"
>> Loaded unicharset of size 112 from file /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.lstm-unicharset
>> Loaded unicharset of size 62 from file data/train/my.unicharset
>> Wrote unicharset file data/unicharset.
>> tesseract data/train/10_0.tif data/train/10_0 --psm 6 lstm.train
>> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
>> Page 1
>> Warning. Invalid resolution 0 dpi. Using 70 instead.
>> tesseract data/train/11_0.tif data/train/11_0 --psm 6 lstm.train
>> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
>> Page 1
>> Warning. Invalid resolution 0 dpi. Using 70 instead.
>> tesseract data/train/12_0.tif data/train/12_0 --psm 6 lstm.train
>> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
>> Page 1
>> Warning. Invalid resolution 0 dpi. Using 70 instead.
>> tesseract data/train/13_0.tif data/train/13_0 --psm 6 lstm.train
>> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
>> Page 1
>> Warning. Invalid resolution 0 dpi. Using 70 instead.
>> tesseract data/train/14_0.tif data/train/14_0 --psm 6 lstm.train
>> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
>> Page 1
>> Warning. Invalid resolution 0 dpi. Using 70 instead.
>> tesseract data/train/15_0.tif data/train/15_0 --psm 6 lstm.train
>> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
>> Page 1
>> Warning. Invalid resolution 0 dpi. Using 70 instead.
>> tesseract data/train/16_0.tif data/train/16_0 --psm 6 lstm.train
>> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
>> Page 1
>> Warning. Invalid resolution 0 dpi. Using 70 instead.
>> tesseract data/train/17_0.tif data/train/17_0 --psm 6 lstm.train
>> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
>> Page 1
>> Warning. Invalid resolution 0 dpi. Using 70 instead.
>> tesseract data/train/18_0.tif data/train/18_0 --psm 6 lstm.train
>> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
>> Page 1
>> Warning. Invalid resolution 0 dpi. Using 70 instead.
>> tesseract data/train/19_0.tif data/train/19_0 --psm 6 lstm.train
>> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
>> Page 1
>> Warning. Invalid resolution 0 dpi. Using 70 instead.
>> tesseract data/train/1_0.tif data/train/1_0 --psm 6 lstm.train
>> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
>> Page 1
>> Warning. Invalid resolution 0 dpi. Using 70 instead.
>> tesseract data/train/20_0.tif data/train/20_0 --psm 6 lstm.train
>> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
>> Page 1
>> Warning. Invalid resolution 0 dpi. Using 70 instead.
>> tesseract data/train/21_0.tif data/train/21_0 --psm 6 lstm.train
>> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
>> Page 1
>> Warning. Invalid resolution 0 dpi. Using 70 instead.
>> tesseract data/train/22_0.tif data/train/22_0 --psm 6 lstm.train
>> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
>> Page 1
>> Warning. Invalid resolution 0 dpi. Using 70 instead.
>> tesseract data/train/23_0.tif data/train/23_0 --psm 6 lstm.train
>> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
>> Page 1
>> Warning. Invalid resolution 0 dpi. Using 70 instead.
>> tesseract data/train/24_0.tif data/train/24_0 --psm 6 lstm.train
>> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
>> Page 1
>> Warning. Invalid resolution 0 dpi. Using 70 instead.
>> tesseract data/train/25_0.tif data/train/25_0 --psm 6 lstm.train
>> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
>> Page 1
>> Warning. Invalid resolution 0 dpi. Using 70 instead.
>> tesseract data/train/26_0.tif data/train/26_0 --psm 6 lstm.train
>> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
>> Page 1
>> Warning. Invalid resolution 0 dpi. Using 70 instead.
>> tesseract data/train/27_0.tif data/train/27_0 --psm 6 lstm.train
>> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
>> Page 1
>> Warning. Invalid resolution 0 dpi. Using 70 instead.
>> tesseract data/train/28_0.tif data/train/28_0 --psm 6 lstm.train
>> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
>> Page 1
>> Warning. Invalid resolution 0 dpi. Using 70 instead.
>> tesseract data/train/29_0.tif data/train/29_0 --psm 6 lstm.train
>> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
>> Page 1
>> Warning. Invalid resolution 0 dpi. Using 70 instead.
>> tesseract data/train/2_0.tif data/train/2_0 --psm 6 lstm.train
>> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
>> Page 1
>> Warning. Invalid resolution 0 dpi. Using 70 instead.
>> tesseract data/train/30_0.tif data/train/30_0 --psm 6 lstm.train
>> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
>> Page 1
>> Warning. Invalid resolution 0 dpi. Using 70 instead.
>> tesseract data/train/31_0.tif data/train/31_0 --psm 6 lstm.train
>> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
>> Page 1
>> Warning. Invalid resolution 0 dpi. Using 70 instead.
>> tesseract data/train/32_0.tif data/train/32_0 --psm 6 lstm.train
>> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
>> Page 1
>> Warning. Invalid resolution 0 dpi. Using 70 instead.
>> tesseract data/train/33_0.tif data/train/33_0 --psm 6 lstm.train
>> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
>> Page 1
>> Warning. Invalid resolution 0 dpi. Using 70 instead.
>> tesseract data/train/3_0.tif data/train/3_0 --psm 6 lstm.train
>> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
>> Page 1
>> Warning. Invalid resolution 0 dpi. Using 70 instead.
>> tesseract data/train/4_0.tif data/train/4_0 --psm 6 lstm.train
>> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
>> Page 1
>> Warning. Invalid resolution 0 dpi. Using 70 instead.
>> tesseract data/train/5_0.tif data/train/5_0 --psm 6 lstm.train
>> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
>> Page 1
>> Warning. Invalid resolution 0 dpi. Using 70 instead.
>> tesseract data/train/6_0.tif data/train/6_0 --psm 6 lstm.train
>> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
>> Page 1
>> Warning. Invalid resolution 0 dpi. Using 70 instead.
>> tesseract data/train/7_0.tif data/train/7_0 --psm 6 lstm.train
>> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
>> Page 1
>> Warning. Invalid resolution 0 dpi. Using 70 instead.
>> tesseract data/train/8_0.tif data/train/8_0 --psm 6 lstm.train
>> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
>> Page 1
>> Warning. Invalid resolution 0 dpi. Using 70 instead.
>> tesseract data/train/9_0.tif data/train/9_0 --psm 6 lstm.train
>> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
>> Page 1
>> Warning. Invalid resolution 0 dpi. Using 70 instead.
>> find data/train -name '*.lstmf' -exec echo {} \; | sort -R -o "data/all-lstmf"
>> total=`cat data/all-lstmf | wc -l` \
>> no=`echo "$total * 0.90 / 1" | bc`; \
>> head -n "$no" data/all-lstmf > "data/list.train"
>> total=`cat data/all-lstmf | wc -l` \
>> no=`echo "($total - $total * 0.90) / 1" | bc`; \
>> tail -n "+$no" data/all-lstmf > "data/list.eval"
>> combine_lang_model \
>> --input_unicharset data/unicharset \
>> --script_dir /mnt/Training_Tesseract/ocrd-train/langdata-master \
>> --output_dir data/ \
>> --lang Invoice
>> Loaded unicharset of size 112 from file data/unicharset
>> Setting unichar properties
>> Other case É of é is not in unicharset
>> Setting script properties
>> Config file is optional, continuing...
>> Failed to read data from: /mnt/Training_Tesseract/ocrd-train/langdata-master/Invoice/Invoice.config
>> Null char=2
>> mkdir -p data/checkpoints
>> lstmtraining \
>> --traineddata data/Invoice/Invoice.traineddata \
>> --net_spec "[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c`head -n1 data/unicharset`]" \
>> --model_output data/checkpoints/Invoice \
>> --learning_rate 20e-4 \
>> --train_listfile data/list.train \
>> --eval_listfile data/list.eval \
>> --max_iterations 10000
>> Warning: given outputs 112 not equal to unicharset of 111.
>> Num outputs,weights in Series:
>> 1,36,0,1:1, 0
>> Num outputs,weights in Series:
>> C3,3:9, 0
>> Ft16:16, 160
>> Total weights = 160
>> [C3,3Ft16]:16, 160
>> Mp3,3:16, 0
>> Lfys48:48, 12480
>> Lfx96:96, 55680
>> Lrx96:96, 74112
>> Lfx256:256, 361472
>> Fc111:111, 28527
>> Total weights = 532431
>> Built network:[1,36,0,1[C3,3Ft16]Mp3,3Lfys48Lfx96Lrx96Lfx256Fc111] from request [1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c112]
>> Training parameters:
>> Debug interval = 0, weights = 0.1, learning rate = 0.002, momentum=0.5
>> null char=110
>> Loaded 1/1 pages (1-1) of document data/train/14_0.lstmf
>> Loaded 1/1 pages (1-1) of document data/train/22_0.lstmf
>> Loaded 1/1 pages (1-1) of document data/train/8_0.lstmf
>> Loaded 1/1 pages (1-1) of document data/train/24_0.lstmf
>> Loaded 1/1 pages (1-1) of document data/train/33_0.lstmf
>> Loaded 1/1 pages (1-1) of document data/train/2_0.lstmf
>> Loaded 1/1 pages (1-1) of document data/train/17_0.lstmf
>> Loaded 1/1 pages (1-1) of document data/train/20_0.lstmf
>> Loaded 1/1 pages (1-1) of document data/train/12_0.lstmf
>> Loaded 1/1 pages (1-1) of document data/train/33_0.lstmf
>> Loaded 1/1 pages (1-1) of document data/train/24_0.lstmf
>> Compute CTC targets failed!
>> Loaded 1/1 pages (1-1) of document data/train/13_0.lstmf
>> Compute CTC targets failed!
>> Loaded 1/1 pages (1-1) of document data/train/21_0.lstmf
>> Compute CTC targets failed!
>> Loaded 1/1 pages (1-1) of document data/train/11_0.lstmf
>> Compute CTC targets failed!
>> Loaded 1/1 pages (1-1) of document data/train/31_0.lstmf
>> Compute CTC targets failed!
>> Loaded 1/1 pages (1-1) of document data/train/16_0.lstmf
>> Compute CTC targets failed!
>> Loaded 1/1 pages (1-1) of document data/train/6_0.lstmf
>> Compute CTC targets failed!
>> Loaded 1/1 pages (1-1) of document data/train/25_0.lstmf
>> Compute CTC targets failed!
>> Loaded 1/1 pages (1-1) of document data/train/5_0.lstmf
>> Compute CTC targets failed!
>> Loaded 1/1 pages (1-1) of document data/train/26_0.lstmf
>> Compute CTC targets failed!
>> Loaded 1/1 pages (1-1) of document data/train/18_0.lstmf
>> Compute CTC targets failed!
>> Loaded 1/1 pages (1-1) of document data/train/7_0.lstmf
>> Compute CTC targets failed!
>> Loaded 1/1 pages (1-1) of document data/train/9_0.lstmf
>> Compute CTC targets failed!
>> Loaded 1/1 pages (1-1) of document data/train/30_0.lstmf
>> Compute CTC targets failed!
>> Loaded 1/1 pages (1-1) of document data/train/32_0.lstmf
>> Compute CTC targets failed!
>> Loaded 1/1 pages (1-1) of document data/train/29_0.lstmf
>> Compute CTC targets failed!
>> Loaded 1/1 pages (1-1) of document data/train/28_0.lstmf
>> Compute CTC targets failed!
>> Loaded 1/1 pages (1-1) of document data/train/1_0.lstmf
>> Compute CTC targets failed!
>> Loaded 1/1 pages (1-1) of document data/train/15_0.lstmf
>> Compute CTC targets failed!
>> Loaded 1/1 pages (1-1) of document data/train/27_0.lstmf
>> Compute CTC targets failed!
>> Loaded 1/1 pages (1-1) of document data/train/19_0.lstmf
>> Compute CTC targets failed!
>> Compute CTC targets failed!
>> Compute CTC targets failed!
>> Compute CTC targets failed!
>> Compute CTC targets failed!
>> Compute CTC targets failed!
>> Compute CTC targets failed!
>> Compute CTC targets failed!
>>
>> ```
>> I have added a sample for the training data
>> [train.zip](https://github.com/tesseract-ocr/tesseract/files/2296961/train.zip)
>>
>> --
>> You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at https://groups.google.com/group/tesseract-ocr.

