It seems the problem was copying *langdata* from Windows to Linux. I re-downloaded it on Linux and it worked; I will try the training again.
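Re-downloading langdata on Linux, as done here, is the simplest fix. For a tree that has already been copied, the sketch below scans it and repairs it. It rests on an assumption that this thread does not confirm: that the copy introduced Windows-style CRLF line endings into text files such as `eng.wordlist`, `eng.numbers` and `eng.punc`. Those are the files `combine_lang_model` was converting when it failed. The helper name `fix_crlf` is hypothetical, and the in-place edit relies on GNU `sed`.

```bash
#!/usr/bin/env bash
# Sketch: find and strip Windows (CRLF) line endings in a langdata tree.
# ASSUMPTION: CRLF endings introduced by the Windows->Linux copy are the
# cause; this thread does not confirm it. fix_crlf is a hypothetical helper.
set -euo pipefail

fix_crlf() {
  local dir="$1"
  local f
  # grep -r: recurse, -l: print file names only, -I: skip binary files.
  # "|| true" keeps set -e from aborting when no file contains a CR.
  { grep -rlI $'\r' "$dir" || true; } | while IFS= read -r f; do
    echo "CRLF found, converting: $f"
    # Remove the carriage return at the end of each line, in place (GNU sed).
    sed -i 's/\r$//' "$f"
  done
}

# Usage: bash fix_crlf.sh /home/sw/repo/langdata
if [[ $# -ge 1 ]]; then
  fix_crlf "$1"
fi
```

Running it on a clean tree prints nothing and changes nothing, so it is safe to repeat.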
Thanks a lot, Shree, for your support.

On Tue, Jun 18, 2019 at 5:38 PM Shree Devi Kumar <[email protected]> wrote:
> Have you modified any word lists, training_text, etc.?
>
> What is your Tesseract version?
>
> Which OS?
>
> On Tue, 18 Jun 2019, 20:51 fady taher, <[email protected]> wrote:
>
>> The output of
>>
>>     src/training/tesstrain.sh --fontlist "Times New Roman" --lang eng \
>>       --linedata_only --noextract_font_properties \
>>       --langdata_dir /home/sw/repo/langdata \
>>       --tessdata_dir /home/sw/repo/tessdata \
>>       --output_dir ~/tesstutorial/trainplusminus
>>
>> is
>>
>>     ....
>>     ....
>>     [Tue Jun 18 17:19:46 EET 2019] /usr/local/bin/combine_lang_model
>>       --input_unicharset /tmp/eng-2019-06-18.baG/eng.unicharset
>>       --script_dir /home/sw/repo/langdata
>>       --words /home/sw/repo/langdata/eng/eng.wordlist
>>       --numbers /home/sw/repo/langdata/eng/eng.numbers
>>       --puncs /home/sw/repo/langdata/eng/eng.punc
>>       --output_dir /home/sw/tesstutorial/trainplusminus --lang eng
>>     Loaded unicharset of size 111 from file /tmp/eng-2019-06-18.baG/eng.unicharset
>>     Setting unichar properties
>>     Other case É of é is not in unicharset
>>     Setting script properties
>>     Warning: properties incomplete for index 95 = ~
>>     Config file is optional, continuing...
>>     Failed to read data from: /home/sw/repo/langdata/eng/eng.config
>>     Null char=2
>>     Reducing Trie to SquishedDawg
>>     Error during conversion of wordlists to DAWGs!!
>>
>> On Tue, Jun 18, 2019 at 5:18 PM Shree Devi Kumar <[email protected]> wrote:
>>
>>> That means
>>>
>>>     src/training/tesstrain.sh --fontlist "Times New Roman" --lang eng \
>>>       --linedata_only --noextract_font_properties \
>>>       --langdata_dir /home/sw/repo/langdata \
>>>       --tessdata_dir /home/sw/repo/tessdata \
>>>       --output_dir ~/tesstutorial/trainplusminus
>>>
>>> did not complete correctly.
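Shree's diagnosis can be turned into a guard that runs before `lstmtraining`. With it, a failed `tesstrain.sh` run is reported plainly instead of surfacing later as the `mgr_.Init(...)` assertion. This is a minimal sketch. It assumes the output layout used in this thread: `<output_dir>/eng/eng.traineddata` and `<output_dir>/eng.training_files.txt`. The helper name `check_tesstrain_output` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: confirm tesstrain.sh produced its outputs before lstmtraining runs.
# check_tesstrain_output is a hypothetical helper; the two paths mirror the
# ones the script in this thread passes to lstmtraining.
set -euo pipefail

check_tesstrain_output() {
  local out_dir="$1" lang="$2"
  local missing=0 f
  for f in "$out_dir/$lang/$lang.traineddata" \
           "$out_dir/$lang.training_files.txt"; do
    # -s: the file exists and is not empty.
    if [[ ! -s "$f" ]]; then
      echo "missing or empty: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: bash check_output.sh ~/tesstutorial/trainplusminus eng
if [[ $# -ge 2 ]]; then
  if ! check_tesstrain_output "$1" "$2"; then
    echo "tesstrain.sh did not complete correctly; not starting lstmtraining" >&2
    exit 1
  fi
fi
```

In the script from this thread, this check would sit between the `tesstrain.sh` call and the first `lstmtraining` call.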
>>>
>>> On Tue, Jun 18, 2019 at 8:46 PM fady taher <[email protected]> wrote:
>>>
>>>> Nope, this file doesn't exist yet. The directory only contains
>>>>
>>>>     eng.charset_size=110.txt
>>>>     eng.unicharset
>>>>
>>>> On Tue, Jun 18, 2019 at 4:46 PM Shree Devi Kumar <[email protected]> wrote:
>>>>
>>>>> Check ~/tesstutorial/trainplusminus.
>>>>> Did your earlier training complete correctly? Does
>>>>> ~/tesstutorial/trainplusminus/eng/eng.traineddata exist?
>>>>>
>>>>> On Tue, Jun 18, 2019 at 8:11 PM fady taher <[email protected]> wrote:
>>>>>
>>>>>> I am trying to fine-tune Tesseract, but I keep getting the error
>>>>>>
>>>>>>     mgr_.Init(traineddata_path.c_str()):Error:Assert failed:in file ../../src/lstm/lstmtrainer.h, line 110
>>>>>>
>>>>>> when running the training command.
>>>>>>
>>>>>> My script looks as follows:
>>>>>>
>>>>>>     cd /home/sw/repo/tesseract-ocr
>>>>>>
>>>>>>     mkdir -p ~/tesstutorial/
>>>>>>     mkdir -p ~/tesstutorial/trainplusminus
>>>>>>     mkdir -p ~/tesstutorial/evalplusminus
>>>>>>
>>>>>>     src/training/tesstrain.sh --fontlist "Times New Roman" --lang eng \
>>>>>>       --linedata_only --noextract_font_properties \
>>>>>>       --langdata_dir /home/sw/repo/langdata \
>>>>>>       --tessdata_dir /home/sw/repo/tessdata \
>>>>>>       --output_dir ~/tesstutorial/trainplusminus
>>>>>>
>>>>>>     # --langdata_dir must be the langdata root here too, not langdata/eng
>>>>>>     src/training/tesstrain.sh --fontlist "Times New Roman" --lang eng \
>>>>>>       --linedata_only --noextract_font_properties \
>>>>>>       --langdata_dir /home/sw/repo/langdata \
>>>>>>       --tessdata_dir /home/sw/repo/tessdata \
>>>>>>       --output_dir ~/tesstutorial/evalplusminus
>>>>>>
>>>>>>     # eng.lstm gets extracted correctly
>>>>>>     src/training/combine_tessdata -e /home/sw/repo/tessdata/eng.traineddata \
>>>>>>       ~/tesstutorial/trainplusminus/eng.lstm
>>>>>>
>>>>>>     # this command fails and throws the error
>>>>>>     src/training/lstmtraining --model_output ~/tesstutorial/trainplusminus/plusminus \
>>>>>>       --continue_from ~/tesstutorial/trainplusminus/eng.lstm \
>>>>>>       --traineddata ~/tesstutorial/trainplusminus/eng/eng.traineddata \
>>>>>>       --old_traineddata /home/sw/repo/tessdata/eng.traineddata \
>>>>>>       --train_listfile ~/tesstutorial/trainplusminus/eng.training_files.txt \
>>>>>>       --max_iterations 400
>>>>>>
>>>>>>     src/training/lstmtraining --stop_training \
>>>>>>       --continue_from ~/tesstutorial/trainplusminus/plusminus_checkpoint \
>>>>>>       --traineddata ~/tesstutorial/trainplusminus/eng/eng.traineddata \
>>>>>>       --model_output ~/tesstutorial/eng_final.traineddata
>>>>>>
>>>>>>     cp ~/tesstutorial/eng_final.traineddata \
>>>>>>       /usr/share/tesseract/4/tessdata/eng.traineddata
>>>>>>
>>>>>> I did download eng.traineddata from the "best" repo, though. Can anyone help?
>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "tesseract-ocr" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>> send an email to [email protected].
>>>>>> To post to this group, send email to [email protected].
>>>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>>>> To view this discussion on the web visit
>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/00310d99-1fc9-402f-b0fa-d048486d77b2%40googlegroups.com
>>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/00310d99-1fc9-402f-b0fa-d048486d77b2%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>> .
>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>> --
>>>>>
>>>>> ____________________________________________________________
>>>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

