The unicharset is based on the training text you use. Please make sure you have all required characters in the text.
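
One rough way to check this is to compare the unicharset of the best model against your training text. The sketch below is only an illustration: `missing_unichars` is a hypothetical helper name, the file paths in the comments are placeholders, and it assumes the plain-text unicharset layout in which the first line is the entry count and every later line starts with the unichar followed by space-separated properties. Special entries such as NULL, Joined and |Broken|0|1 are skipped on the assumption that they are not real characters.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every unicharset entry that never occurs
# in the training text. Assumes line 1 of the unicharset is the entry
# count and each later line begins with the unichar itself.
missing_unichars() {
  local unicharset="$1" training_text="$2" ch
  # Drop the count line and keep only the first field of each entry.
  tail -n +2 "$unicharset" | cut -d' ' -f1 | while IFS= read -r ch; do
    case "$ch" in
      ''|NULL|Joined|'|Broken|0|1') continue ;;  # assumed special entries
    esac
    # -F matches a fixed string, so '.' or '*' are taken literally.
    grep -qF -- "$ch" "$training_text" || printf '%s\n' "$ch"
  done
}

# Possible usage (paths are placeholders; the unpacked file is
# typically named <prefix>lstm-unicharset):
#   combine_tessdata -u tessdata/best/ara.traineddata /tmp/ara.
#   missing_unichars /tmp/ara.lstm-unicharset ara.training_text
```

Any character it prints is in the best model's unicharset but absent from your text, so it would not end up in a unicharset generated from that text.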
Fine-tuning for impact keeps the unicharset of the best traineddata file, so you can't add any characters to it.

On Sun, Mar 29, 2020, 11:08 Essam Zaky <[email protected]> wrote:

> Hi @shreeshrii
> Attached is the bash script as described on the following page:
>
> https://github.com/tesseract-ocr/tesseract/issues/2695#issuecomment-539412948
>
> When I change line #51 from
>
> --traineddata ~/tesstutorial/tesseract/tessdata/best/ara.traineddata \
>
> to
>
> --traineddata ~/tesstutorial/araeval/ara/ara.traineddata
>
> it now works fine without error.
> But I have another question:
> the character set in the best traineddata has 85 entries, and the newly
> generated character set contains only 74.
> How can I keep the unicharset size at 85, as in best?
>
> On Sunday, March 29, 2020 at 5:06:16 AM UTC+2, shree wrote:
>>
>> See
>> https://github.com/Shreeshrii/tess4training/blob/master/6-plusminus.sh
>>
>> lstmtraining --model_output ../tesstutorial/trainplusminus/plusminus \
>>   --continue_from ../tesstutorial/trainplusminus/eng.lstm \
>>   --traineddata ../tesstutorial/trainplusminus/eng/eng.traineddata \
>>   --old_traineddata tessdata/best/eng.traineddata \
>>   --train_listfile ../tesstutorial/trainplusminus/eng.training_files.txt \
>>   --max_iterations 3600
>>
>> ...
>>
>> lstmtraining \
>>   --stop_training \
>>   --continue_from ../tesstutorial/trainplusminus/plusminus_checkpoint \
>>   --traineddata ../tesstutorial/trainplusminus/eng/eng.traineddata \
>>   --model_output ../tesstutorial/trainplusminus/eng_plusminus.traineddata
>>
>> --traineddata needs to be the same in both commands.
>>
>> On Sun, Mar 29, 2020 at 6:45 AM Shree Devi Kumar <[email protected]> wrote:
>>
>>> Please check that you have used the correct path for the traineddata
>>> file.
>>>
>>> Please share the lstmtraining command that you used before this for
>>> training.
>>>
>>> On Sat, Mar 28, 2020, 22:56 Essam Zaky <[email protected]> wrote:
>>>
>>>> Dear @Shreeshrii
>>>> I followed your bash script to add the Andalus font to the Arabic
>>>> language. Here is the script URL:
>>>>
>>>> https://github.com/tesseract-ocr/tesseract/issues/2695#issuecomment-539412948
>>>>
>>>> All steps work except the last one, which generates the
>>>> traineddata. Here is the error:
>>>>
>>>> osboxes@osboxes:~/tesstutorial/tesseract$ time lstmtraining \
>>>> > --stop_training \
>>>> > --continue_from ~/tesstutorial/ara_from_full/PLUS_checkpoint \
>>>> > --traineddata ~/tesstutorial/tesseract/tessdata/best/ara.traineddata \
>>>> > --model_output ~/tesstutorial/ara_from_full/ara.Andalus.PLUS.traineddata
>>>> Loaded file /home/osboxes/tesstutorial/ara_from_full/PLUS_checkpoint, unpacking...
>>>> Code range changed from 74 to 85!
>>>> Must supply the old traineddata for code conversion!
>>>> Failed to read continue from: /home/osboxes/tesstutorial/ara_from_full/PLUS_checkpoint
>>>>
>>>> Best Regards
>>>> Essam

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAG2NduWPzg%3DvB5MsnZq_i-cjUx4S0VmP4kUgyV-Kh25_g%2BFnYg%40mail.gmail.com.

