I have not tried with English. Please create an eng.config file in your langdata directory and then try again.

You can put the following 2 lines in it:

# Use LSTM
tessedit_ocr_engine_mode 1

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Sat, Aug 5, 2017 at 10:56 PM, Ava Nimaee <[email protected]> wrote:

> Thanks for your attention.
> I removed everything, installed the latest versions of tesseract and
> leptonica again, and used this syntax:
>
> training/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng \
>   --training_text training/langdata/eng/eng.training_text \
>   --linedata_only \
>   --noextract_font_properties --langdata_dir training/langdata \
>   --tessdata_dir ./tessdata \
>   --fontlist "Times New Roman," --output_dir ~/tesstutorial/engtrian
>
> But I got a new error. Everything looks OK, but at the end I got this:
>
> Setting unichar properties
> Other case É of é is not in unicharset
> Setting script properties
> Failed to read data from: training/langdata/eng/eng.config
> Null char=2
> Invalid format in radical table at line 4: 3400 1.4
> Creation of encoded unicharset failed!!
> Error writing recoder!!
> Reducing Trie to SquishedDawg
> Reducing Trie to SquishedDawg
> Reducing Trie to SquishedDawg
> Moving /tmp/tmp.GW5DOJr0rG/eng/eng.Times_New_Roman.exp0.lstmf to
> /home/zohreh/tesstutorial/engtrian
>
> Completed training for language 'eng'
>
> And I don't have eng.config in my langdata.
> I cloned langdata from tesseract's Git repository.
>
> On Saturday, August 5, 2017 at 5:50:59 PM UTC+4:30, shree wrote:
>>
>> tesseract -v
>> tesseract 4.00.00dev-594-g044e06e-2085
>> leptonica-1.74.4
>> libjpeg 8d (libjpeg-turbo 1.3.0) : libpng 1.2.50 : libtiff 4.0.3 : zlib 1.2.8
>>
>> Found AVX
>> Found SSE
>>
>> The above version is working OK on Linux.
>>
>> nice lstmtraining \
>>   --old_traineddata ../tessdata/best/san.traineddata \
>>   --continue_from ../tessdata/best/san.lstm \
>>   --traineddata ../tesstutorial/vedic/san/san.traineddata \
>>   --train_listfile ../tesstutorial/vedic/san.training_files.txt \
>>   --eval_listfile ../tesstutorial/vedic/san.eval_files.txt \
>>   --model_output ../tesstutorial/vedic/santune \
>>   --max_iterations 200 \
>>   --debug_interval 0
>>
>> Loaded file ../tessdata/best/san.lstm, unpacking...
>> Warning: LSTMTrainer deserialized an LSTMRecognizer!
>> Code range changed from 145 to 2308!!
>> Num (Extended) outputs,weights in Series:
>> 1,36,0,1:1, 0
>> Num (Extended) outputs,weights in Series:
>> C3,3:9, 0
>> Ft16:16, 160
>> Total weights = 160
>> [C3,3Ft16]:16, 160
>> Mp3,3:16, 0
>> Lfys48:48, 12480
>> Lfx96:96, 55680
>> Lrx96:96, 74112
>> Lfx192:192, 221952
>> Fc2308:2308, 445444
>> Total weights = 809828
>> Previous null char=2 mapped to 2
>> Continuing from ../tessdata/best/san.lstm
>> Loaded 138/138 pages (1-138) of document ../tesstutorial/vedic/san.AA_NAGARI_SHREE_L3.exp0.lstmf
>> Loaded 138/138 pages (1-138) of document ../tesstutorial/vedic/san.AA_NAGARI_SHREE_L3.exp-1.lstmf
>> Loaded 138/138 pages (1-138) of document ../tesstutorial/vedic/san.Adobe_Devanagari.exp-2.lstmf
>> Loaded 138/138 pages (1-138) of document ../tesstutorial/vedic/san.Adobe_Devanagari.exp1.lstmf
>>
>> ShreeDevi
>>
>> On Sat, Aug 5, 2017 at 6:43 PM, ShreeDevi Kumar <[email protected]> wrote:
>>
>>> Did you build the training tools again?
>>>
>>> ShreeDevi
>>>
>>> On Sat, Aug 5, 2017 at 6:37 PM, Ava Nimaee <[email protected]> wrote:
>>>
>>>> Yes. As you told me, I cloned the latest tesseract master and installed
>>>> it and leptonica again, made the tiff and box files and the unicharset,
>>>> and then used this syntax:
>>>>
>>>> training/tesstrain.sh \
>>>>   --fonts_dir /usr/share/fonts \
>>>>   --lang eng \
>>>>   --training_text langdata/eng/eng.training_text \
>>>>   --linedata_only \
>>>>   --noextract_font_properties --langdata_dir langdata \
>>>>   --tessdata_dir ./tessdata \
>>>>   --fontlist "Times New Roman," \
>>>>   --output_dir tesstutorial/engtrian
>>>> ------------------------------------------------------------
>>>> training/tesstrain.sh \
>>>>   --fonts_dir /usr/share/fonts \
>>>>   --lang eng \
>>>>   --training_text langdata/eng/eng.training_text \
>>>>   --linedata_only \
>>>>   --noextract_font_properties --langdata_dir langdata \
>>>>   --tessdata_dir ./tessdata \
>>>>   --output_dir tesstutorial/engeval
>>>>
>>>> Finally, I ran the last command, the one I said gave the error.
>>>> For that last command I put langdata/eng in the engtrian folder.
>>>>
>>>> On Saturday, August 5, 2017 at 5:28:48 PM UTC+4:30, shree wrote:
>>>>>
>>>>> Are you using the latest source of programs from GitHub for building
>>>>> tesseract?
>>>>>
>>>>> ShreeDevi
>>>>>
>>>>> On Sat, Aug 5, 2017 at 6:21 PM, Ava Nimaee <[email protected]> wrote:
>>>>>
>>>>>> Hi,
>>>>>> I used this syntax:
>>>>>>
>>>>>> training/lstmtraining --debug_interval 100 \
>>>>>>   --traineddata ~/tesstutorial/engtrain/eng/eng.traineddata \
>>>>>>   --net_spec '[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c111]' \
>>>>>>   --model_output ~/tesstutorial/engoutput/base --learning_rate 20e-4 \
>>>>>>   --train_listfile ~/tesstutorial/engtrain/eng.training_files.txt \
>>>>>>   --eval_listfile ~/tesstutorial/engeval/eng.training_files.txt \
>>>>>>   --max_iterations 5000 &>~/tesstutorial/engoutput/basetrain.log
>>>>>>
>>>>>> I put eng.traineddata in the right path, but I get an error:
>>>>>>
>>>>>> ERROR: Non-existent flag --traineddata
>>>>>>
>>>>>> Can you help me?
>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "tesseract-ocr" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>> send an email to [email protected].
>>>>>> To post to this group, send email to [email protected].
>>>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>>>> To view this discussion on the web visit
>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/30f1bf28-ea15-4999-b9ca-bccfed2be66f%40googlegroups.com
>>>>>> For more options, visit https://groups.google.com/d/optout.

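For anyone landing on this thread with the same "Failed to read data from: training/langdata/eng/eng.config" message, the suggestion in the top reply could be scripted roughly as below. This is only a sketch under assumptions: LANGDATA_DIR is a variable name made up for this example, and its default is taken from the path in the quoted log, so point it at whatever you pass to --langdata_dir. The two config lines are the ones from the reply. Whether this also clears the later "Invalid format in radical table" error is not established anywhere in the thread.

```shell
#!/bin/sh
# Sketch: create the eng.config that tesstrain.sh reported as unreadable.
# LANGDATA_DIR is a hypothetical variable for this example; the default
# mirrors the path shown in the quoted error message.
LANGDATA_DIR="${LANGDATA_DIR:-training/langdata}"

# Make sure the per-language directory exists.
mkdir -p "$LANGDATA_DIR/eng"

# Write the two lines suggested in the reply.
# Note: this overwrites any existing eng.config at that path.
cat > "$LANGDATA_DIR/eng/eng.config" <<'EOF'
# Use LSTM
tessedit_ocr_engine_mode 1
EOF

# Show the result so it can be checked by eye.
cat "$LANGDATA_DIR/eng/eng.config"
```

After that, rerun the same tesstrain.sh command as before and check whether the "Failed to read data from" line is gone from the output.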
