What OS are you running it on? Which version of Tesseract?
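For anyone hitting the same error, here is a minimal sketch of a shell helper that collects the details asked for in this reply: the OS, the Tesseract version and the ICU version. The function name `report_versions` is hypothetical. The ICU check assumes ICU's pkg-config file (`icu-uc`) is installed, which may not be true on a Windows install like the one in the log. When the file is missing, the helper reports the ICU version as unknown.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the OS, Tesseract and ICU versions in one place.
report_versions() {
  # Kernel name and release (a MINGW/MSYS string under Git Bash on Windows).
  echo "OS: $(uname -s) $(uname -r)"

  # `tesseract --version` prints the Tesseract version on its first line;
  # some builds write it to stderr, so both streams are merged here.
  if command -v tesseract >/dev/null 2>&1; then
    echo "Tesseract: $(tesseract --version 2>&1 | head -n 1)"
  else
    echo "Tesseract: not found on PATH"
  fi

  # ICU version via pkg-config, only if ICU's icu-uc.pc is installed.
  if command -v pkg-config >/dev/null 2>&1 && pkg-config --exists icu-uc; then
    echo "ICU: $(pkg-config --modversion icu-uc)"
  else
    echo "ICU: unknown (pkg-config or icu-uc.pc not available)"
  fi
}

report_versions
```

You can paste the three output lines directly into a reply on the list.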
> ICU ERROR: U_FILE_ACCESS_ERRORERROR: /tmp/tmp.6m4B2TUln1/eng/eng.unicharset does not exist or is not readable

Which version of the ICU library?

ShreeDevi
____________________________________________________________
Bhajan - Kirtan - Aarti (devotional songs) @ http://bhajans.ramparivar.com

On Tue, May 15, 2018 at 1:00 PM, reza <reza6...@gmail.com> wrote:
> I used the attached finetune.sh file, but it raised an error. Could you
> help me?
>
> Thanks
>
> ###### MAKING TRAINING DATA ######
>
> === Starting training for language 'eng'
> [Tue, May 15, 2018 11:42:36 AM] /c/Program Files (x86)/Tesseract-OCR/text2image --fonts_dir=C:WindowsFonts --font=Arial --outputbase=/tmp/font_tmp.CpgpM0lbxD/sample_text.txt --text=/tmp/font_tmp.CpgpM0lbxD/sample_text.txt --fontconfig_tmpdir=/tmp/font_tmp.CpgpM0lbxD
> Rendered page 0 to file C:/Users/asus/AppData/Local/Temp/font_tmp.CpgpM0lbxD/sample_text.txt.tif
>
> === Phase I: Generating training images ===
> Rendering using Arial
> Rendering using Corbel
> [Tue, May 15, 2018 11:42:37 AM] /c/Program Files (x86)/Tesseract-OCR/text2image --fontconfig_tmpdir=/tmp/font_tmp.CpgpM0lbxD --fonts_dir=C:WindowsFonts --strip_unrenderable_words --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tmp.6m4B2TUln1/eng/eng.Arial.exp0 --max_pages=3 --font=Arial --text=./langdata/eng/eng.training_text
> [Tue, May 15, 2018 11:42:37 AM] /c/Program Files (x86)/Tesseract-OCR/text2image --fontconfig_tmpdir=/tmp/font_tmp.CpgpM0lbxD --fonts_dir=C:WindowsFonts --strip_unrenderable_words --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tmp.6m4B2TUln1/eng/eng.Corbel.exp0 --max_pages=3 --font=Corbel --text=./langdata/eng/eng.training_text
> Stripped 2 unrenderable words
> Rendered page 0 to file C:/Users/asus/AppData/Local/Temp/tmp.6m4B2TUln1/eng/eng.Arial.exp0.tif
> Stripped 1 unrenderable words
> Rendered page 1 to file C:/Users/asus/AppData/Local/Temp/tmp.6m4B2TUln1/eng/eng.Arial.exp0.tif
> Stripped 2 unrenderable words
> Rendered page 0 to file C:/Users/asus/AppData/Local/Temp/tmp.6m4B2TUln1/eng/eng.Corbel.exp0.tif
> Stripped 1 unrenderable words
> Rendered page 1 to file C:/Users/asus/AppData/Local/Temp/tmp.6m4B2TUln1/eng/eng.Corbel.exp0.tif
>
> === Phase UP: Generating unicharset and unichar properties files ===
> [Tue, May 15, 2018 11:42:39 AM] /c/Program Files (x86)/Tesseract-OCR/unicharset_extractor --output_unicharset /tmp/tmp.6m4B2TUln1/eng/eng.unicharset --norm_mode 1 /tmp/tmp.6m4B2TUln1/eng/eng.Arial.exp0.box /tmp/tmp.6m4B2TUln1/eng/eng.Corbel.exp0.box
> Extracting unicharset from box file C:/Users/asus/AppData/Local/Temp/tmp.6m4B2TUln1/eng/eng.Arial.exp0.box
> Extracting unicharset from box file C:/Users/asus/AppData/Local/Temp/tmp.6m4B2TUln1/eng/eng.Corbel.exp0.box
> ICU ERROR: U_FILE_ACCESS_ERRORERROR: /tmp/tmp.6m4B2TUln1/eng/eng.unicharset does not exist or is not readable
>
> ###### MAKING EVAL DATA ######
>
> === Starting training for language 'eng'
> [Tue, May 15, 2018 11:42:40 AM] /c/Program Files (x86)/Tesseract-OCR/text2image --fonts_dir=C:WindowsFonts --font=Calibri --outputbase=/tmp/font_tmp.n0qq4iJk4q/sample_text.txt --text=/tmp/font_tmp.n0qq4iJk4q/sample_text.txt --fontconfig_tmpdir=/tmp/font_tmp.n0qq4iJk4q
> Rendered page 0 to file C:/Users/asus/AppData/Local/Temp/font_tmp.n0qq4iJk4q/sample_text.txt.tif
>
> === Phase I: Generating training images ===
> Rendering using Calibri
> [Tue, May 15, 2018 11:42:40 AM] /c/Program Files (x86)/Tesseract-OCR/text2image --fontconfig_tmpdir=/tmp/font_tmp.n0qq4iJk4q --fonts_dir=C:WindowsFonts --strip_unrenderable_words --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tmp.h0l64TAxEq/eng/eng.Calibri.exp0 --max_pages=3 --font=Calibri --text=./langdata/eng/eng.training_text
> Stripped 2 unrenderable words
> Rendered page 0 to file C:/Users/asus/AppData/Local/Temp/tmp.h0l64TAxEq/eng/eng.Calibri.exp0.tif
> Stripped 1 unrenderable words
> Rendered page 1 to file C:/Users/asus/AppData/Local/Temp/tmp.h0l64TAxEq/eng/eng.Calibri.exp0.tif
>
> === Phase UP: Generating unicharset and unichar properties files ===
> [Tue, May 15, 2018 11:42:42 AM] /c/Program Files (x86)/Tesseract-OCR/unicharset_extractor --output_unicharset /tmp/tmp.h0l64TAxEq/eng/eng.unicharset --norm_mode 1 /tmp/tmp.h0l64TAxEq/eng/eng.Calibri.exp0.box
> Extracting unicharset from box file C:/Users/asus/AppData/Local/Temp/tmp.h0l64TAxEq/eng/eng.Calibri.exp0.box
> ICU ERROR: U_FILE_ACCESS_ERRORERROR: /tmp/tmp.h0l64TAxEq/eng/eng.unicharset does not exist or is not readable
>
> #### combine_tessdata to extract lstm model from previous trained set ####
> Extracting tessdata components from ./tessdata_best/eng.traineddata
> Wrote ./trained_plus_chars/eng.lstm
> Version string:4.00.00alpha:eng:synth20170629
> 17:lstm:size=401636, offset=192
> 18:lstm-punc-dawg:size=4322, offset=401828
> 19:lstm-word-dawg:size=3694794, offset=406150
> 20:lstm-number-dawg:size=4738, offset=4100944
> 21:lstm-unicharset:size=6360, offset=4105682
> 22:lstm-recoder:size=1012, offset=4112042
> 23:version:size=30, offset=4113054
>
> #### training from previous optimum #####
> finetune.sh: line 119: 11664 Segmentation fault lstmtraining --model_output $train_output_dir/pluschars --continue_from $train_output_dir/$Lang.lstm --old_traineddata $tessdata_dir/$Lang.traineddata --traineddata $train_output_dir/$Lang/$Lang.traineddata --max_iterations $MaxIterations --debug_interval -1 --eval_listfile $eval_output_dir/$Lang.training_files.txt --train_listfile $train_output_dir/$Lang.training_files.txt
>
> #### Building final trained file ./trained_plus_chars/eng_NEW.traineddata d####
> finetune.sh: line 130: 11320 Segmentation fault lstmtraining --stop_training --continue_from $train_output_dir/pluschars_checkpoint --traineddata $train_output_dir/$Lang/$Lang.traineddata --model_output $final_trained_data_file
>
> --
> You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com.
> To post to this group, send email to tesseract-ocr@googlegroups.com.
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/7c46c196-e08d-4541-9f3b-b8a768792c9a%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
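In the log, unicharset_extractor fails for both the training data and the eval data. The script keeps going anyway and only stops at the two `lstmtraining` calls, which segfault. A guard in finetune.sh could refuse to start `lstmtraining` when its input files are missing, so the failure shows up at the step that caused it. Below is a minimal sketch under that assumption. `require_readable` is a hypothetical helper. The variable names come from the `lstmtraining` command lines in the log, not from the actual script.

```shell
#!/usr/bin/env bash
# Hypothetical guard for finetune.sh: refuse to continue when an earlier
# phase did not produce a file that the next command is about to read.
require_readable() {
  for f in "$@"; do
    if [ ! -r "$f" ]; then
      echo "ERROR: $f does not exist or is not readable; stopping before lstmtraining" >&2
      return 1
    fi
  done
  return 0
}

# Example placement just before the fine-tuning call (line 119 in the log).
# The variable names are the ones visible in the logged lstmtraining command;
# their values are set elsewhere in finetune.sh.
#
# require_readable \
#   "$train_output_dir/$Lang.lstm" \
#   "$train_output_dir/$Lang/$Lang.traineddata" \
#   "$train_output_dir/$Lang.training_files.txt" \
#   "$eval_output_dir/$Lang.training_files.txt" || exit 1
```

Running the script with `set -e` is a blunter alternative. It only helps if the failing tool exits with a non-zero status, and this log alone does not show whether unicharset_extractor did.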