Please build the latest code (4.0.0-beta.4) and run the same test again.
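
In case it helps to confirm which build is actually being picked up, here is a rough Python sketch that compares a version string against beta.4. It assumes the `git describe`-style format shown in the report below (`4.0.0-beta.1-306-g45b11cd`). The names `version_key` and `is_older_than` are only illustrative, and the commit count after the tag is ignored, so treat it as an approximation rather than an official check.

```python
import re

# Assumed format: "<major>.<minor>.<patch>" optionally followed by a
# pre-release stage such as "-beta.1". Anything after that (commit count,
# hash) is ignored by this sketch.
_VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)(?:-?(alpha|beta|rc)\.?(\d+)?)?")

# Pre-release stages sort before a final release (stage None).
_STAGE_ORDER = {"alpha": 0, "beta": 1, "rc": 2, None: 3}


def version_key(text):
    """Return a sortable tuple for the first version found in `text`."""
    match = _VERSION_RE.search(text)
    if match is None:
        raise ValueError("no version number found in %r" % text)
    major, minor, patch, stage, stage_no = match.groups()
    return (int(major), int(minor), int(patch),
            _STAGE_ORDER[stage], int(stage_no or 0))


def is_older_than(text, minimum="4.0.0-beta.4"):
    """True if the version in `text` sorts before `minimum`."""
    return version_key(text) < version_key(minimum)


if __name__ == "__main__":
    # Version string copied from the report.
    print(is_older_than("4.0.0-beta.1-306-g45b11cd"))
```

With the version from the report this prints `True`, i.e. that build predates beta.4.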

On Fri, Aug 17, 2018 at 4:44 PM, <[email protected]> wrote:

> ### Environment
>
> * **Tesseract Version**: 4.0.0-beta.1-306-g45b11cd
> * **Commit Number**: 4.0.0-beta.1-306-g45b11cd
> * **Platform**: Ubuntu x86_64 GNU/Linux
> ### Current Behavior:
>
> Training enters an infinite loop of "Compute CTC targets failed!" messages.
>
> I have a box file and TIFF images, and I run the script below for training.
> ```
>
> ALL_BOXES = data/all-boxes
> ALL_LSTMF = data/all-lstmf
>
> # Create unicharset
> unicharset: data/unicharset
>
> # Create lists of lstmf filenames for training and eval
> lists: $(ALL_LSTMF) data/list.train data/list.eval
>
> data/list.train: $(ALL_LSTMF)
> total=`cat $(ALL_LSTMF) | wc -l` \
>    no=`echo "$$total * $(RATIO_TRAIN) / 1" | bc`; \
>    head -n "$$no" $(ALL_LSTMF) > "$@"
>
> data/list.eval: $(ALL_LSTMF)
> total=`cat $(ALL_LSTMF) | wc -l` \
>    no=`echo "($$total - $$total * $(RATIO_TRAIN)) / 1" | bc`; \
>    tail -n "+$$no" $(ALL_LSTMF) > "$@"
>
> # Start training
> training: data/$(MODEL_NAME).traineddata
>
> data/unicharset: $(ALL_BOXES)
>
> combine_tessdata -u $(TESSDATA)/eng.traineddata  $(TESSDATA)/$(MODEL_NAME).
> unicharset_extractor --output_unicharset "$(TRAIN)/my.unicharset" --norm_mode $(NORM_MODE) "$(ALL_BOXES)"
> merge_unicharsets $(TESSDATA)/$(MODEL_NAME).lstm-unicharset $(TRAIN)/my.unicharset  "$@"
>
> $(ALL_BOXES): $(sort $(patsubst %.tif,%.box,$(wildcard $(TRAIN)/*.tif)))
> find $(TRAIN) -name '*.box' -exec cat {} \; > "$@"
>
> #$(TRAIN)/%.box: $(TRAIN)/%.tif $(TRAIN)/%.gt.txt
> #python3 generate_line_box.py -i "$(TRAIN)/$*.tif" -t "$(TRAIN)/$*.gt.txt" > "$@"
>
> $(ALL_LSTMF): $(sort $(patsubst %.tif,%.lstmf,$(wildcard $(TRAIN)/*.tif)))
> find $(TRAIN) -name '*.lstmf' -exec echo {} \; | sort -R -o "$@"
>
> $(TRAIN)/%.lstmf: $(TRAIN)/%.box
> tesseract $(TRAIN)/$*.tif $(TRAIN)/$* --psm $(PSM) lstm.train
>
> # Build the proto model
> proto-model: data/$(MODEL_NAME)/$(MODEL_NAME).traineddata
>
> data/$(MODEL_NAME)/$(MODEL_NAME).traineddata: $(LANGDATA) data/unicharset
> combine_lang_model \
>   --input_unicharset data/unicharset \
>   --script_dir $(LANGDATA) \
>   --output_dir data/ \
>   --lang $(MODEL_NAME)
>
> data/checkpoints/$(MODEL_NAME)_checkpoint: unicharset lists proto-model
> mkdir -p data/checkpoints
> lstmtraining \
>   --traineddata data/$(MODEL_NAME)/$(MODEL_NAME).traineddata \
>   --net_spec "[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c`head -n1 data/unicharset`]" \
>   --model_output data/checkpoints/$(MODEL_NAME) \
>   --learning_rate 20e-4 \
>   --train_listfile data/list.train \
>   --eval_listfile data/list.eval \
>   --max_iterations 10000
> ```
>
> Here are the logs, including the error.
>
> ```
> find data/train -name '*.box' -exec cat {} \; > "data/all-boxes"
> #python3 generate_line_box.py -i "data/train/.tif" -t "data/train/.gt.txt" > "data/all-boxes"
> combine_tessdata -u /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/eng.traineddata  /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.
> Version string:4.00.00alpha:eng:synth20170629:[1,36,0,1Ct3,3,16Mp3,3Lfys64Lfx96Lrx96Lfx512O1c1]
> 1:unicharset:size=7477, offset=192
> 2:unicharambigs:size=1047, offset=7669
> 3:inttemp:size=976552, offset=8716
> 4:pffmtable:size=844, offset=985268
> 5:normproto:size=13408, offset=986112
> 6:punc-dawg:size=4322, offset=999520
> 7:word-dawg:size=1082890, offset=1003842
> 8:number-dawg:size=6426, offset=2086732
> 9:freq-dawg:size=1410, offset=2093158
> 13:shapetable:size=63346, offset=2094568
> 14:bigram-dawg:size=16109842, offset=2157914
> 17:lstm:size=1487588, offset=18267756
> 18:lstm-punc-dawg:size=4322, offset=19755344
> 19:lstm-word-dawg:size=3694794, offset=19759666
> 20:lstm-number-dawg:size=4738, offset=23454460
> 21:lstm-unicharset:size=6360, offset=23459198
> 22:lstm-recoder:size=1012, offset=23465558
> 23:version:size=80, offset=23466570
> Extracting tessdata components from /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/eng.traineddata
> Wrote /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.unicharset
> Wrote /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.unicharambigs
> Wrote /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.inttemp
> Wrote /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.pffmtable
> Wrote /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.normproto
> Wrote /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.punc-dawg
> Wrote /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.word-dawg
> Wrote /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.number-dawg
> Wrote /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.freq-dawg
> Wrote /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.shapetable
> Wrote /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.bigram-dawg
> Wrote /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.lstm
> Wrote /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.lstm-punc-dawg
> Wrote /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.lstm-word-dawg
> Wrote /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.lstm-number-dawg
> Wrote /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.lstm-unicharset
> Wrote /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.lstm-recoder
> Wrote /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.version
> unicharset_extractor --output_unicharset "data/train/my.unicharset" --norm_mode 2 "data/all-boxes"
> Extracting unicharset from box file data/all-boxes
> Other case f of F is not in unicharset
> Other case d of D is not in unicharset
> Other case h of H is not in unicharset
> Other case z of Z is not in unicharset
> Other case k of K is not in unicharset
> Other case w of W is not in unicharset
> Other case v of V is not in unicharset
> Other case j of J is not in unicharset
> Other case b of B is not in unicharset
> Other case q of Q is not in unicharset
> Wrote unicharset file data/train/my.unicharset
> merge_unicharsets /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.lstm-unicharset data/train/my.unicharset  "data/unicharset"
> Loaded unicharset of size 112 from file /mnt/Training_Tesseract/ocrd-train/usr/share/tessdata/Invoice.lstm-unicharset
> Loaded unicharset of size 62 from file data/train/my.unicharset
> Wrote unicharset file data/unicharset.
> tesseract data/train/10_0.tif data/train/10_0 --psm 6 lstm.train
> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
> Page 1
> Warning. Invalid resolution 0 dpi. Using 70 instead.
> tesseract data/train/11_0.tif data/train/11_0 --psm 6 lstm.train
> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
> Page 1
> Warning. Invalid resolution 0 dpi. Using 70 instead.
> tesseract data/train/12_0.tif data/train/12_0 --psm 6 lstm.train
> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
> Page 1
> Warning. Invalid resolution 0 dpi. Using 70 instead.
> tesseract data/train/13_0.tif data/train/13_0 --psm 6 lstm.train
> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
> Page 1
> Warning. Invalid resolution 0 dpi. Using 70 instead.
> tesseract data/train/14_0.tif data/train/14_0 --psm 6 lstm.train
> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
> Page 1
> Warning. Invalid resolution 0 dpi. Using 70 instead.
> tesseract data/train/15_0.tif data/train/15_0 --psm 6 lstm.train
> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
> Page 1
> Warning. Invalid resolution 0 dpi. Using 70 instead.
> tesseract data/train/16_0.tif data/train/16_0 --psm 6 lstm.train
> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
> Page 1
> Warning. Invalid resolution 0 dpi. Using 70 instead.
> tesseract data/train/17_0.tif data/train/17_0 --psm 6 lstm.train
> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
> Page 1
> Warning. Invalid resolution 0 dpi. Using 70 instead.
> tesseract data/train/18_0.tif data/train/18_0 --psm 6 lstm.train
> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
> Page 1
> Warning. Invalid resolution 0 dpi. Using 70 instead.
> tesseract data/train/19_0.tif data/train/19_0 --psm 6 lstm.train
> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
> Page 1
> Warning. Invalid resolution 0 dpi. Using 70 instead.
> tesseract data/train/1_0.tif data/train/1_0 --psm 6 lstm.train
> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
> Page 1
> Warning. Invalid resolution 0 dpi. Using 70 instead.
> tesseract data/train/20_0.tif data/train/20_0 --psm 6 lstm.train
> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
> Page 1
> Warning. Invalid resolution 0 dpi. Using 70 instead.
> tesseract data/train/21_0.tif data/train/21_0 --psm 6 lstm.train
> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
> Page 1
> Warning. Invalid resolution 0 dpi. Using 70 instead.
> tesseract data/train/22_0.tif data/train/22_0 --psm 6 lstm.train
> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
> Page 1
> Warning. Invalid resolution 0 dpi. Using 70 instead.
> tesseract data/train/23_0.tif data/train/23_0 --psm 6 lstm.train
> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
> Page 1
> Warning. Invalid resolution 0 dpi. Using 70 instead.
> tesseract data/train/24_0.tif data/train/24_0 --psm 6 lstm.train
> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
> Page 1
> Warning. Invalid resolution 0 dpi. Using 70 instead.
> tesseract data/train/25_0.tif data/train/25_0 --psm 6 lstm.train
> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
> Page 1
> Warning. Invalid resolution 0 dpi. Using 70 instead.
> tesseract data/train/26_0.tif data/train/26_0 --psm 6 lstm.train
> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
> Page 1
> Warning. Invalid resolution 0 dpi. Using 70 instead.
> tesseract data/train/27_0.tif data/train/27_0 --psm 6 lstm.train
> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
> Page 1
> Warning. Invalid resolution 0 dpi. Using 70 instead.
> tesseract data/train/28_0.tif data/train/28_0 --psm 6 lstm.train
> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
> Page 1
> Warning. Invalid resolution 0 dpi. Using 70 instead.
> tesseract data/train/29_0.tif data/train/29_0 --psm 6 lstm.train
> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
> Page 1
> Warning. Invalid resolution 0 dpi. Using 70 instead.
> tesseract data/train/2_0.tif data/train/2_0 --psm 6 lstm.train
> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
> Page 1
> Warning. Invalid resolution 0 dpi. Using 70 instead.
> tesseract data/train/30_0.tif data/train/30_0 --psm 6 lstm.train
> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
> Page 1
> Warning. Invalid resolution 0 dpi. Using 70 instead.
> tesseract data/train/31_0.tif data/train/31_0 --psm 6 lstm.train
> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
> Page 1
> Warning. Invalid resolution 0 dpi. Using 70 instead.
> tesseract data/train/32_0.tif data/train/32_0 --psm 6 lstm.train
> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
> Page 1
> Warning. Invalid resolution 0 dpi. Using 70 instead.
> tesseract data/train/33_0.tif data/train/33_0 --psm 6 lstm.train
> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
> Page 1
> Warning. Invalid resolution 0 dpi. Using 70 instead.
> tesseract data/train/3_0.tif data/train/3_0 --psm 6 lstm.train
> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
> Page 1
> Warning. Invalid resolution 0 dpi. Using 70 instead.
> tesseract data/train/4_0.tif data/train/4_0 --psm 6 lstm.train
> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
> Page 1
> Warning. Invalid resolution 0 dpi. Using 70 instead.
> tesseract data/train/5_0.tif data/train/5_0 --psm 6 lstm.train
> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
> Page 1
> Warning. Invalid resolution 0 dpi. Using 70 instead.
> tesseract data/train/6_0.tif data/train/6_0 --psm 6 lstm.train
> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
> Page 1
> Warning. Invalid resolution 0 dpi. Using 70 instead.
> tesseract data/train/7_0.tif data/train/7_0 --psm 6 lstm.train
> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
> Page 1
> Warning. Invalid resolution 0 dpi. Using 70 instead.
> tesseract data/train/8_0.tif data/train/8_0 --psm 6 lstm.train
> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
> Page 1
> Warning. Invalid resolution 0 dpi. Using 70 instead.
> tesseract data/train/9_0.tif data/train/9_0 --psm 6 lstm.train
> Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
> Page 1
> Warning. Invalid resolution 0 dpi. Using 70 instead.
> find data/train -name '*.lstmf' -exec echo {} \; | sort -R -o "data/all-lstmf"
> total=`cat data/all-lstmf | wc -l` \
>    no=`echo "$total * 0.90 / 1" | bc`; \
>    head -n "$no" data/all-lstmf > "data/list.train"
> total=`cat data/all-lstmf | wc -l` \
>    no=`echo "($total - $total * 0.90) / 1" | bc`; \
>    tail -n "+$no" data/all-lstmf > "data/list.eval"
> combine_lang_model \
>   --input_unicharset data/unicharset \
>   --script_dir /mnt/Training_Tesseract/ocrd-train/langdata-master \
>   --output_dir data/ \
>   --lang Invoice
> Loaded unicharset of size 112 from file data/unicharset
> Setting unichar properties
> Other case É of é is not in unicharset
> Setting script properties
> Config file is optional, continuing...
> Failed to read data from: /mnt/Training_Tesseract/ocrd-train/langdata-master/Invoice/Invoice.config
> Null char=2
> mkdir -p data/checkpoints
> lstmtraining \
>   --traineddata data/Invoice/Invoice.traineddata \
>   --net_spec "[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c`head -n1 data/unicharset`]" \
>   --model_output data/checkpoints/Invoice \
>   --learning_rate 20e-4 \
>   --train_listfile data/list.train \
>   --eval_listfile data/list.eval \
>   --max_iterations 10000
> Warning: given outputs 112 not equal to unicharset of 111.
> Num outputs,weights in Series:
>   1,36,0,1:1, 0
> Num outputs,weights in Series:
>   C3,3:9, 0
>   Ft16:16, 160
> Total weights = 160
>   [C3,3Ft16]:16, 160
>   Mp3,3:16, 0
>   Lfys48:48, 12480
>   Lfx96:96, 55680
>   Lrx96:96, 74112
>   Lfx256:256, 361472
>   Fc111:111, 28527
> Total weights = 532431
> Built network:[1,36,0,1[C3,3Ft16]Mp3,3Lfys48Lfx96Lrx96Lfx256Fc111] from request [1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c112]
> Training parameters:
>   Debug interval = 0, weights = 0.1, learning rate = 0.002, momentum=0.5
> null char=110
> Loaded 1/1 pages (1-1) of document data/train/14_0.lstmf
> Loaded 1/1 pages (1-1) of document data/train/22_0.lstmf
> Loaded 1/1 pages (1-1) of document data/train/8_0.lstmf
> Loaded 1/1 pages (1-1) of document data/train/24_0.lstmf
> Loaded 1/1 pages (1-1) of document data/train/33_0.lstmf
> Loaded 1/1 pages (1-1) of document data/train/2_0.lstmf
> Loaded 1/1 pages (1-1) of document data/train/17_0.lstmf
> Loaded 1/1 pages (1-1) of document data/train/20_0.lstmf
> Loaded 1/1 pages (1-1) of document data/train/12_0.lstmf
> Loaded 1/1 pages (1-1) of document data/train/33_0.lstmf
> Loaded 1/1 pages (1-1) of document data/train/24_0.lstmf
> Compute CTC targets failed!
> Loaded 1/1 pages (1-1) of document data/train/13_0.lstmf
> Compute CTC targets failed!
> Loaded 1/1 pages (1-1) of document data/train/21_0.lstmf
> Compute CTC targets failed!
> Loaded 1/1 pages (1-1) of document data/train/11_0.lstmf
> Compute CTC targets failed!
> Loaded 1/1 pages (1-1) of document data/train/31_0.lstmf
> Compute CTC targets failed!
> Loaded 1/1 pages (1-1) of document data/train/16_0.lstmf
> Compute CTC targets failed!
> Loaded 1/1 pages (1-1) of document data/train/6_0.lstmf
> Compute CTC targets failed!
> Loaded 1/1 pages (1-1) of document data/train/25_0.lstmf
> Compute CTC targets failed!
> Loaded 1/1 pages (1-1) of document data/train/5_0.lstmf
> Compute CTC targets failed!
> Loaded 1/1 pages (1-1) of document data/train/26_0.lstmf
> Compute CTC targets failed!
> Loaded 1/1 pages (1-1) of document data/train/18_0.lstmf
> Compute CTC targets failed!
> Loaded 1/1 pages (1-1) of document data/train/7_0.lstmf
> Compute CTC targets failed!
> Loaded 1/1 pages (1-1) of document data/train/9_0.lstmf
> Compute CTC targets failed!
> Loaded 1/1 pages (1-1) of document data/train/30_0.lstmf
> Compute CTC targets failed!
> Loaded 1/1 pages (1-1) of document data/train/32_0.lstmf
> Compute CTC targets failed!
> Loaded 1/1 pages (1-1) of document data/train/29_0.lstmf
> Compute CTC targets failed!
> Loaded 1/1 pages (1-1) of document data/train/28_0.lstmf
> Compute CTC targets failed!
> Loaded 1/1 pages (1-1) of document data/train/1_0.lstmf
> Compute CTC targets failed!
> Loaded 1/1 pages (1-1) of document data/train/15_0.lstmf
> Compute CTC targets failed!
> Loaded 1/1 pages (1-1) of document data/train/27_0.lstmf
> Compute CTC targets failed!
> Loaded 1/1 pages (1-1) of document data/train/19_0.lstmf
> Compute CTC targets failed!
> Compute CTC targets failed!
> Compute CTC targets failed!
> Compute CTC targets failed!
> Compute CTC targets failed!
> Compute CTC targets failed!
> Compute CTC targets failed!
> Compute CTC targets failed!
>
> ```
> I have attached a sample of the training data:
> [train.zip](https://github.com/tesseract-ocr/tesseract/files/2296961/train.zip)
>
>
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit https://groups.google.com/d/
> msgid/tesseract-ocr/4ed1ddd8-35b8-48ba-bb1d-154354518296%
> 40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/4ed1ddd8-35b8-48ba-bb1d-154354518296%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>



-- 

____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
