I think the name of the variable CONTINUE_FROM is a mistake. Here it is just a variable.
I renamed it but I still get the same error. Basically, I think the problem is with combine_tessdata.

combine_tessdata -u /mnt/e/projects/Training_Tesseract/ocrd-train/usr/share/tessdata/foo.traineddata /mnt/e/projects/Training_Tesseract/ocrd-train/usr/share/tessdata/foo.
Failed to read /mnt/e/projects/Training_Tesseract/ocrd-train/usr/share/tessdata/foo.traineddata
Makefile:98: recipe for target 'data/unicharset' failed

On Sunday, July 29, 2018 at 6:19:49 PM UTC+2, shree wrote:
>
> Continue_from should be used when you want to train a new language based
> on an existing language or to add some characters to an existing language.
>
> There is no existing language called 'foo' - you should replace it with
> the lang code for the language you are training.
>
> On Sun, Jul 29, 2018 at 9:44 PM <[email protected]> wrote:
>
>> I duplicated the tessdata and I am still getting this error
>>
>> combine_tessdata -u /mnt/e/projects/Training_Tesseract/ocrd-train/usr/share/tessdata/foo.traineddata /mnt/e/projects/Training_Tesseract/ocrd-train/usr/share/tessdata/foo.
>> Failed to read /mnt/e/projects/Training_Tesseract/ocrd-train/usr/share/tessdata/foo.traineddata
>> Makefile:97: recipe for target 'data/unicharset' failed
>>
>> I can't find foo.traineddata in this folder.
>>
>> On Sunday, July 29, 2018 at 5:19:05 PM UTC+2, chandra churh chatterjee wrote:
>>>
>>> keep the foo.traineddata inside the tessdata folder and then run the
>>> command.
>>>
>>> On Sun, Jul 29, 2018 at 5:00 AM <[email protected]> wrote:
>>>
>>>> I am using a bash script to train an LSTM model. I have the images and box
>>>> file.
>>>>
>>>> My problem is that an error is returned when the command combine_tessdata
>>>> is executed. Also, I have checked and no file called foo.traineddata was created.
>>>>
>>>> Here is the bash code.
>>>> export
>>>>
>>>> SHELL := /bin/bash
>>>> LOCAL := $(PWD)/usr
>>>> PATH := $(LOCAL)/bin:$(PATH)
>>>> TESSDATA = /usr/share/tesseract-ocr/tessdata
>>>> LANGDATA = $(PWD)/langdata
>>>>
>>>> # Name of the model to be built. Default: $(MODEL_NAME)
>>>> MODEL_NAME = foo
>>>>
>>>> # Name of the model to continue from. Default: $(CONTINUE_FROM)
>>>> CONTINUE_FROM = $(MODEL_NAME)
>>>>
>>>> # No of cores to use for compiling leptonica/tesseract. Default: $(CORES)
>>>> CORES = 4
>>>>
>>>> # Leptonica version. Default: $(LEPTONICA_VERSION)
>>>> LEPTONICA_VERSION := 1.75.3
>>>>
>>>> # Tesseract commit. Default: $(TESSERACT_VERSION)
>>>> TESSERACT_VERSION := 9ae97508aed1e5508458f1181b08501f984bf4e2
>>>>
>>>> # Tesseract langdata version. Default: $(LANGDATA_VERSION)
>>>> LANGDATA_VERSION := master
>>>>
>>>> # Tesseract model repo to use. Default: $(TESSDATA_REPO)
>>>> TESSDATA_REPO = _fast
>>>>
>>>> # Train directory. Default: $(TRAIN)
>>>> TRAIN := data/train
>>>>
>>>> # Normalization Mode - see src/training/language_specific.sh for details. Default: $(NORM_MODE)
>>>> NORM_MODE = 2
>>>>
>>>> # Page segmentation mode. Default: $(PSM)
>>>> PSM = 6
>>>>
>>>> # Ratio of train / eval training data. Default: $(RATIO_TRAIN)
>>>> RATIO_TRAIN := 0.90
>>>>
>>>> # BEGIN-EVAL makefile-parser --make-help Makefile
>>>>
>>>> help:
>>>> 	@echo ""
>>>> 	@echo "  Targets"
>>>> 	@echo ""
>>>> 	@echo "    unicharset       Create unicharset"
>>>> 	@echo "    lists            Create lists of lstmf filenames for training and eval"
>>>> 	@echo "    training         Start training"
>>>> 	@echo "    proto-model      Build the proto model"
>>>> 	@echo "    leptonica        Build leptonica"
>>>> 	@echo "    tesseract        Build tesseract"
>>>> 	@echo "    tesseract-langs  Download tesseract-langs"
>>>> 	@echo "    langdata         Download langdata"
>>>> 	@echo "    clean            Clean all generated files"
>>>> 	@echo ""
>>>> 	@echo "  Variables"
>>>> 	@echo ""
>>>> 	@echo "    MODEL_NAME         Name of the model to be built. Default: $(MODEL_NAME)"
>>>> 	@echo "    CONTINUE_FROM      Name of the model to continue from. Default: $(CONTINUE_FROM)"
>>>> 	@echo "    CORES              No of cores to use for compiling leptonica/tesseract. Default: $(CORES)"
>>>> 	@echo "    LEPTONICA_VERSION  Leptonica version. Default: $(LEPTONICA_VERSION)"
>>>> 	@echo "    TESSERACT_VERSION  Tesseract commit. Default: $(TESSERACT_VERSION)"
>>>> 	@echo "    LANGDATA_VERSION   Tesseract langdata version. Default: $(LANGDATA_VERSION)"
>>>> 	@echo "    TESSDATA_REPO      Tesseract model repo to use. Default: $(TESSDATA_REPO)"
>>>> 	@echo "    TRAIN              Train directory. Default: $(TRAIN)"
>>>> 	@echo "    NORM_MODE          Normalization Mode - see src/training/language_specific.sh for details. Default: $(NORM_MODE)"
>>>> 	@echo "    PSM                Page segmentation mode. Default: $(PSM)"
>>>> 	@echo "    RATIO_TRAIN        Ratio of train / eval training data. Default: $(RATIO_TRAIN)"
>>>>
>>>> # END-EVAL
>>>>
>>>> ALL_BOXES = data/all-boxes
>>>> ALL_LSTMF = data/all-lstmf
>>>>
>>>> # Create unicharset
>>>> unicharset: data/unicharset
>>>>
>>>> # Create lists of lstmf filenames for training and eval
>>>> lists: $(ALL_LSTMF) data/list.train data/list.eval
>>>>
>>>> data/list.train: $(ALL_LSTMF)
>>>> 	total=`cat $(ALL_LSTMF) | wc -l` \
>>>> 	   no=`echo "$$total * $(RATIO_TRAIN) / 1" | bc`; \
>>>> 	   head -n "$$no" $(ALL_LSTMF) > "$@"
>>>>
>>>> data/list.eval: $(ALL_LSTMF)
>>>> 	total=`cat $(ALL_LSTMF) | wc -l` \
>>>> 	   no=`echo "($$total - $$total * $(RATIO_TRAIN)) / 1" | bc`; \
>>>> 	   tail -n "+$$no" $(ALL_LSTMF) > "$@"
>>>>
>>>> # Start training
>>>> training: data/$(MODEL_NAME).traineddata
>>>>
>>>> data/unicharset: $(ALL_BOXES)
>>>> 	combine_tessdata -u $(TESSDATA)/$(CONTINUE_FROM).traineddata $(TESSDATA)/$(CONTINUE_FROM).
>>>> 	unicharset_extractor --output_unicharset "$(TRAIN)/my.unicharset" --norm_mode $(NORM_MODE) "$(ALL_BOXES)"
>>>> 	merge_unicharsets $(TESSDATA)/$(CONTINUE_FROM).lstm-unicharset $(TRAIN)/my.unicharset "$@"
>>>>
>>>> $(ALL_BOXES): $(sort $(patsubst %.tif,%.box,$(wildcard $(TRAIN)/*.tif)))
>>>> 	find $(TRAIN) -name '*.box' -exec cat {} \; > "$@"
>>>>
>>>> $(TRAIN)/%.box: $(TRAIN)/%.tif $(TRAIN)/%.gt.txt
>>>> 	python3 generate_line_box.py -i "$(TRAIN)/$*.tif" -t "$(TRAIN)/$*.gt.txt" > "$@"
>>>>
>>>> $(ALL_LSTMF): $(sort $(patsubst %.tif,%.lstmf,$(wildcard $(TRAIN)/*.tif)))
>>>> 	find $(TRAIN) -name '*.lstmf' -exec echo {} \; | sort -R -o "$@"
>>>>
>>>> $(TRAIN)/%.lstmf: $(TRAIN)/%.box
>>>> 	tesseract $(TRAIN)/$*.tif $(TRAIN)/$* --psm $(PSM) lstm.train
>>>>
>>>> # Build the proto model
>>>> proto-model: data/$(MODEL_NAME)/$(MODEL_NAME).traineddata
>>>>
>>>> data/$(MODEL_NAME)/$(MODEL_NAME).traineddata: $(LANGDATA) data/unicharset
>>>> 	combine_lang_model \
>>>> 	  --input_unicharset data/unicharset \
>>>> 	  --script_dir $(LANGDATA) \
>>>> 	  --output_dir data/ \
>>>> 	  --lang $(MODEL_NAME)
>>>>
>>>> data/checkpoints/$(MODEL_NAME)_checkpoint: unicharset lists proto-model
>>>> 	mkdir -p data/checkpoints
>>>> 	lstmtraining \
>>>> 	  --traineddata data/$(MODEL_NAME)/$(MODEL_NAME).traineddata \
>>>> 	  --net_spec "[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c`head -n1 data/unicharset`]" \
>>>> 	  --model_output data/checkpoints/$(MODEL_NAME) \
>>>> 	  --learning_rate 20e-4 \
>>>> 	  --train_listfile data/list.train \
>>>> 	  --eval_listfile data/list.eval \
>>>> 	  --max_iterations 10000
>>>>
>>>> data/$(MODEL_NAME).traineddata: data/checkpoints/$(MODEL_NAME)_checkpoint
>>>> 	lstmtraining \
>>>> 	  --stop_training \
>>>> 	  --continue_from $^ \
>>>> 	  --traineddata data/$(MODEL_NAME)/$(MODEL_NAME).traineddata \
>>>> 	  --model_output $@
>>>>
>>>> # Build leptonica
>>>> leptonica: leptonica.built
>>>>
>>>> leptonica.built: leptonica-$(LEPTONICA_VERSION)
>>>> 	cd $< ; \
>>>> 	  ./configure --prefix=$(LOCAL) && \
>>>> 	  make -j$(CORES) && \
>>>> 	  make install && \
>>>> 	  date > "$@"
>>>>
>>>> leptonica-$(LEPTONICA_VERSION): leptonica-$(LEPTONICA_VERSION).tar.gz
>>>> 	tar xf "$<"
>>>>
>>>> leptonica-$(LEPTONICA_VERSION).tar.gz:
>>>> 	wget 'http://www.leptonica.org/source/$@'
>>>>
>>>> # Build tesseract
>>>> tesseract: tesseract.built tesseract-langs
>>>>
>>>> tesseract.built: tesseract-$(TESSERACT_VERSION)
>>>> 	cd $< && \
>>>> 	  sh autogen.sh && \
>>>> 	  PKG_CONFIG_PATH="$(LOCAL)/lib/pkgconfig" \
>>>> 	  LEPTONICA_CFLAGS="-I$(LOCAL)/include/leptonica" \
>>>> 	    ./configure --prefix=$(LOCAL) && \
>>>> 	  LDFLAGS="-L$(LOCAL)/lib" \
>>>> 	    make -j$(CORES) && \
>>>> 	  make install && \
>>>> 	  make -j$(CORES) training-install && \
>>>> 	  date > "$@"
>>>>
>>>> tesseract-$(TESSERACT_VERSION):
>>>> 	wget https://github.com/tesseract-ocr/tesseract/archive/$(TESSERACT_VERSION).zip
>>>> 	unzip $(TESSERACT_VERSION).zip
>>>>
>>>> # Download tesseract-langs
>>>> tesseract-langs: $(TESSDATA)/eng.traineddata
>>>>
>>>> # Download langdata
>>>> langdata: $(LANGDATA)
>>>>
>>>> $(LANGDATA):
>>>> 	#wget 'https://github.com/tesseract-ocr/langdata/archive/$(LANGDATA_VERSION).zip'
>>>> 	unzip $(LANGDATA_VERSION).zip
>>>>
>>>> $(TESSDATA)/eng.traineddata:
>>>> 	cd $(TESSDATA) && wget https://github.com/tesseract-ocr/tessdata$(TESSDATA_REPO)/raw/master/$(notdir $@)
>>>>
>>>> # Clean all generated files
>>>> clean:
>>>> 	find data/train -name '*.box' -delete
>>>> 	find data/train -name '*.lstmf' -delete
>>>> 	rm -rf data/all-*
>>>> 	rm -rf data/list.*
>>>> 	rm -rf data/$(MODEL_NAME)
>>>> 	rm -rf data/unicharset
>>>> 	rm -rf data/checkpoints
>>>>
>>>> Also here is the error
>>>>
>>>> combine_tessdata -u /usr/share/tesseract-ocr/tessdata/foo.traineddata /usr/share/tesseract-ocr/tessdata/foo.
>>>> Failed to read /usr/share/tesseract-ocr/tessdata/foo.traineddata
>>>> Makefile:97: recipe for target 'data/unicharset' failed
>>>> make: *** [data/unicharset] Error 1
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>>>> To post to this group, send email to [email protected].
>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>> To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/964f8a60-ec0e-44d9-a6a2-1b81eb49ab2b%40googlegroups.com.
>>>> For more options, visit https://groups.google.com/d/optout.
>
> --
>
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
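
As shree notes above, the data/unicharset recipe runs combine_tessdata on $(TESSDATA)/$(CONTINUE_FROM).traineddata, so CONTINUE_FROM has to name a model that already exists in the tessdata folder. A minimal sketch of a check to run before make, assuming a hypothetical helper called check_start_model (it is not part of the Makefile or of Tesseract):

```shell
#!/usr/bin/env bash
# Hypothetical helper (an assumption, not part of ocrd-train or Tesseract):
# reports whether <tessdata_dir>/<model>.traineddata exists and is readable.
check_start_model() {
    local tessdata_dir="$1"
    local model="$2"
    local path="${tessdata_dir}/${model}.traineddata"

    if [ -r "$path" ]; then
        echo "ok: $path"
    else
        # This is the situation that makes combine_tessdata print "Failed to read".
        echo "missing: $path"
        return 1
    fi
}
```

For example, assuming eng.traineddata is present in the tessdata folder, something like `check_start_model /usr/share/tesseract-ocr/tessdata eng && make training MODEL_NAME=foo CONTINUE_FROM=eng` should get past the failing step, since variables given on the make command line override the defaults assigned in the Makefile. Whether eng is the right start model depends on the language being trained.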

