Thank you for your attention. I removed everything, reinstalled the latest versions of Tesseract and Leptonica, and then used this command:

training/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng --training_text training/langdata/eng/eng.training_text --linedata_only \
--noextract_font_properties --langdata_dir training/langdata \
--tessdata_dir ./tessdata \
--fontlist "Times New Roman," --output_dir ~/tesstutorial/engtrian
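A quick pre-flight check can show which langdata files are actually present before starting a long tesstrain.sh run. This is only a sketch: `check_langdata` is a hypothetical helper (not part of tesstrain.sh). The `<langdata>/<lang>/<lang>.training_text` and `<lang>.config` paths come from the command and log in this thread, and treating `radical-stroke.txt` in the langdata root as the radical table is an assumption. It only reports whether the files exist; it cannot tell whether their contents match the format this Tesseract build expects.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check, not part of tesstrain.sh.
# Reports whether the langdata files mentioned in this thread exist.
# Assumption: the radical table lives at <langdata>/radical-stroke.txt.

# Usage: check_langdata LANGDATA_DIR LANG
# Prints one line per file and returns the number of missing files.
check_langdata() {
  local langdata_dir="$1" lang="$2"
  local missing=0 f
  for f in "$langdata_dir/$lang/$lang.training_text" \
           "$langdata_dir/$lang/$lang.config" \
           "$langdata_dir/radical-stroke.txt"; do
    if [ -f "$f" ]; then
      echo "found:   $f"
    else
      echo "MISSING: $f" >&2
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

# Example, using the langdata path from the command above:
#   check_langdata training/langdata eng || echo "some langdata files are missing"
```

Because the function returns the count of missing files, it can gate a script with `||` without aborting the whole shell.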
But I got a new error. Everything else runs fine, but at the end I get this:

Setting unichar properties
Other case É of é is not in unicharset
Setting script properties
Failed to read data from: training/langdata/eng/eng.config
Null char=2
Invalid format in radical table at line 4: 3400 1.4
Creation of encoded unicharset failed!!
Error writing recoder!!
Reducing Trie to SquishedDawg
Reducing Trie to SquishedDawg
Reducing Trie to SquishedDawg
Moving /tmp/tmp.GW5DOJr0rG/eng/eng.Times_New_Roman.exp0.lstmf to /home/zohreh/tesstutorial/engtrian
Completed training for language 'eng'

I don't have eng.config in my langdata. I cloned langdata from Tesseract's Git repository.

On Saturday, August 5, 2017 at 5:50:59 PM UTC+4:30, shree wrote:
>
> tesseract -v
> tesseract 4.00.00dev-594-g044e06e-2085
> leptonica-1.74.4
> libjpeg 8d (libjpeg-turbo 1.3.0) : libpng 1.2.50 : libtiff 4.0.3 : zlib 1.2.8
>
> Found AVX
> Found SSE
>
> The above version is working OK on Linux:
>
> nice lstmtraining \
> --old_traineddata ../tessdata/best/san.traineddata \
> --continue_from ../tessdata/best/san.lstm \
> --traineddata ../tesstutorial/vedic/san/san.traineddata \
> --train_listfile ../tesstutorial/vedic/san.training_files.txt \
> --eval_listfile ../tesstutorial/vedic/san.eval_files.txt \
> --model_output ../tesstutorial/vedic/santune \
> --max_iterations 200 \
> --debug_interval 0
>
> Loaded file ../tessdata/best/san.lstm, unpacking...
> Warning: LSTMTrainer deserialized an LSTMRecognizer!
> Code range changed from 145 to 2308!!
> Num (Extended) outputs,weights in Series:
> 1,36,0,1:1, 0
> Num (Extended) outputs,weights in Series:
> C3,3:9, 0
> Ft16:16, 160
> Total weights = 160
> [C3,3Ft16]:16, 160
> Mp3,3:16, 0
> Lfys48:48, 12480
> Lfx96:96, 55680
> Lrx96:96, 74112
> Lfx192:192, 221952
> Fc2308:2308, 445444
> Total weights = 809828
> Previous null char=2 mapped to 2
> Continuing from ../tessdata/best/san.lstm
> Loaded 138/138 pages (1-138) of document ../tesstutorial/vedic/san.AA_NAGARI_SHREE_L3.exp0.lstmf
> Loaded 138/138 pages (1-138) of document ../tesstutorial/vedic/san.AA_NAGARI_SHREE_L3.exp-1.lstmf
> Loaded 138/138 pages (1-138) of document ../tesstutorial/vedic/san.Adobe_Devanagari.exp-2.lstmf
> Loaded 138/138 pages (1-138) of document ../tesstutorial/vedic/san.Adobe_Devanagari.exp1.lstmf
>
> ShreeDevi
> ____________________________________________________________
> Bhajans - Kirtans - Aartis (devotional songs) @ http://bhajans.ramparivar.com
>
> On Sat, Aug 5, 2017 at 6:43 PM, ShreeDevi Kumar <shree...@gmail.com> wrote:
>
>> Did you build the training tools again?
>>
>> On Sat, Aug 5, 2017 at 6:37 PM, Ava Nimaee <beigy....@gmail.com> wrote:
>>
>>> Yes, you told me. I cloned the latest tesseract master, installed it and
>>> Leptonica again, made the tiff and box files and the unicharset, and then
>>> used these commands:
>>> training/tesstrain.sh \
>>> --fonts_dir /usr/share/fonts \
>>> --lang eng \
>>> --training_text langdata/eng/eng.training_text \
>>> --linedata_only \
>>> --noextract_font_properties --langdata_dir langdata \
>>> --tessdata_dir ./tessdata \
>>> --fontlist "Times New Roman," \
>>> --output_dir tesstutorial/engtrian
>>> ------------------------------------------------------------
>>> training/tesstrain.sh \
>>> --fonts_dir /usr/share/fonts \
>>> --lang eng \
>>> --training_text langdata/eng/eng.training_text \
>>> --linedata_only \
>>> --noextract_font_properties --langdata_dir langdata \
>>> --tessdata_dir ./tessdata \
>>> --output_dir tesstutorial/engeval
>>> Finally, I ran the last command I posted, the one that gave the error.
>>> For that last command, I put langdata/eng in the engtrian folder.
>>>
>>>
>>> On Saturday, August 5, 2017 at 5:28:48 PM UTC+4:30, shree wrote:
>>>>
>>>> Are you using the latest source of programs from GitHub for building
>>>> tesseract?
>>>>
>>>> On Sat, Aug 5, 2017 at 6:21 PM, Ava Nimaee <beigy....@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>> I used this command:
>>>>>
>>>>> training/lstmtraining --debug_interval 100 \
>>>>> --traineddata ~/tesstutorial/engtrain/eng/eng.traineddata \
>>>>> --net_spec '[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c111]' \
>>>>> --model_output ~/tesstutorial/engoutput/base --learning_rate 20e-4 \
>>>>> --train_listfile ~/tesstutorial/engtrain/eng.training_files.txt \
>>>>> --eval_listfile ~/tesstutorial/engeval/eng.training_files.txt \
>>>>> --max_iterations 5000 &>~/tesstutorial/engoutput/basetrain.log
>>>>>
>>>>> I put eng.traineddata in the right path, but I get this error:
>>>>>
>>>>> ERROR: Non-existent flag --traineddata
>>>>>
>>>>> Can you help me?
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
>>>>> To post to this group, send email to tesser...@googlegroups.com.
>>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>>> To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/30f1bf28-ea15-4999-b9ca-bccfed2be66f%40googlegroups.com.
>>>>> For more options, visit https://groups.google.com/d/optout.

--
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/633854bd-d3f3-4340-943d-9c9b062e2a62%40googlegroups.com.