Hi @Shreeshrii, attached is the bash script described on the following page:
https://github.com/tesseract-ocr/tesseract/issues/2695#issuecomment-539412948
When I change line 51 of the script from

    --traineddata ~/tesstutorial/tesseract/tessdata/best/ara.traineddata \

to

    --traineddata ~/tesstutorial/araeval/ara/ara.traineddata \

it works fine without error. But I have another question: the character set of the best model has 85 entries, while the newly generated character set contains only 74. How can I keep the unicharset size at 85, as in the best model?

On Sunday, March 29, 2020 at 5:06:16 AM UTC+2, shree wrote:
>
> See https://github.com/Shreeshrii/tess4training/blob/master/6-plusminus.sh
>
> lstmtraining --model_output ../tesstutorial/trainplusminus/plusminus \
> --continue_from ../tesstutorial/trainplusminus/eng.lstm \
> --traineddata ../tesstutorial/trainplusminus/eng/eng.traineddata \
> --old_traineddata tessdata/best/eng.traineddata \
> --train_listfile ../tesstutorial/trainplusminus/eng.training_files.txt \
> --max_iterations 3600
>
> ...
>
> lstmtraining \
> --stop_training \
> --continue_from ../tesstutorial/trainplusminus/plusminus_checkpoint \
> --traineddata ../tesstutorial/trainplusminus/eng/eng.traineddata \
> --model_output ../tesstutorial/trainplusminus/eng_plusminus.traineddata
>
> --traineddata needs to be same in both commands.
>
> On Sun, Mar 29, 2020 at 6:45 AM Shree Devi Kumar wrote:
>
>> Please check that you have used the correct path for the traineddata file.
>>
>> Please share the lstmtraining command that you used before this for
>> training.
>>
>> On Sat, Mar 28, 2020, 22:56 Essam Zaky wrote:
>>
>>> Dear @Shreeshrii,
>>> I followed your bash script to add the Andalus font to the Arabic
>>> language. Here is the script URL:
>>>
>>> https://github.com/tesseract-ocr/tesseract/issues/2695#issuecomment-539412948
>>>
>>> All steps work except the last one, which generates the traineddata.
>>> Here is the error:
>>>
>>> osboxes@osboxes:~/tesstutorial/tesseract$ time lstmtraining \
>>> > --stop_training \
>>> > --continue_from ~/tesstutorial/ara_from_full/PLUS_checkpoint \
>>> > --traineddata ~/tesstutorial/tesseract/tessdata/best/ara.traineddata \
>>> > --model_output ~/tesstutorial/ara_from_full/ara.Andalus.PLUS.traineddata
>>> Loaded file /home/osboxes/tesstutorial/ara_from_full/PLUS_checkpoint, unpacking...
>>> Code range changed from 74 to 85!
>>> Must supply the old traineddata for code conversion!
>>> Failed to read continue from: /home/osboxes/tesstutorial/ara_from_full/PLUS_checkpoint
>>>
>>> Best Regards,
>>> Essam
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/0c9123f5-8e80-447c-9bf1-2c6ec9831238%40googlegroups.com
>>> .

--
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/e1e7e7c6-8b11-4713-a303-837604668c22%40googlegroups.com.
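The 74 and 85 in the error are derived from the character sets stored in the two traineddata files. One way to inspect them is sketched below, assuming `combine_tessdata` from the same Tesseract build is on `PATH`: extract each file's `lstm-unicharset` component, read the entry count from its first line, and list the characters that only the best model has. The helper functions are hypothetical names for illustration. Note that the "code range" printed by lstmtraining comes from the recoder, so it may not equal the unicharset count exactly for every script.

```shell
#!/bin/bash
# Sketch: compare the LSTM character sets of two traineddata files.
# Assumes combine_tessdata (same Tesseract build) is on PATH.
# The function names are hypothetical helpers, not Tesseract tools.

# Print the entry count stored on the first line of a unicharset file.
unicharset_size() {
  head -n 1 "$1" | tr -d '[:space:]'
}

# Print the unichars (first field of every line after the count), sorted.
unicharset_chars() {
  tail -n +2 "$1" | cut -d' ' -f1 | LC_ALL=C sort -u
}

# Extract the lstm-unicharset component of traineddata file $1 into $2.
# combine_tessdata picks the component from the suffix after the last dot.
extract_lstm_unicharset() {
  combine_tessdata -e "$1" "$2" >/dev/null && [ -s "$2" ]
}

# Report both sizes and list the unichars that only the old model has.
compare_lstm_charsets() {
  local old=$1 new=$2 tmpdir
  tmpdir=$(mktemp -d) || return 1
  if extract_lstm_unicharset "$old" "$tmpdir/old.lstm-unicharset" &&
     extract_lstm_unicharset "$new" "$tmpdir/new.lstm-unicharset"; then
    echo "old: $(unicharset_size "$tmpdir/old.lstm-unicharset") entries"
    echo "new: $(unicharset_size "$tmpdir/new.lstm-unicharset") entries"
    echo "only in old:"
    LC_ALL=C comm -23 <(unicharset_chars "$tmpdir/old.lstm-unicharset") \
                      <(unicharset_chars "$tmpdir/new.lstm-unicharset")
  fi
  rm -rf "$tmpdir"
}
```

For example: `compare_lstm_charsets ~/tesstutorial/tesseract/tessdata/best/ara.traineddata ~/tesstutorial/araeval/ara/ara.traineddata`. Characters listed under "only in old" are presumably absent from the rendered training text, since tesstrain.sh builds the new unicharset from the generated training data.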
#!/bin/bash
# Fine-tune the best Arabic model (ara) on the Andalus font.

# 1. Render line images and .lstmf files for the Andalus font.
time tesstrain.sh \
  --fonts_dir ~/.fonts \
  --lang ara --linedata_only \
  --noextract_font_properties \
  --langdata_dir ~/tesstutorial/langdata \
  --tessdata_dir ~/tesstutorial/tesseract/tessdata \
  --fontlist "Andalus" \
  --training_text ~/tesstutorial/langdata/ara/ara.training_text \
  --workspace_dir ~/tesstutorial/tmp/ \
  --save_box_tiff \
  --output_dir ~/tesstutorial/araeval

echo -e "\n****** Finetune one of the fully-trained existing models: ***********"
mkdir -p ~/tesstutorial/ara_from_full
# Extract the LSTM component of the best model as the starting point.
combine_tessdata -e ~/tesstutorial/tesseract/tessdata/best/ara.traineddata \
  ~/tesstutorial/ara_from_full/ara.lstm

# 2. Fine-tune. The new character set comes from araeval/ara/ara.traineddata;
#    --old_traineddata lets lstmtraining convert the best model's codes to it.
lstmtraining \
  --model_output ~/tesstutorial/ara_from_full/PLUS \
  --continue_from ~/tesstutorial/ara_from_full/ara.lstm \
  --traineddata ~/tesstutorial/araeval/ara/ara.traineddata \
  --old_traineddata ~/tesstutorial/tesseract/tessdata/best/ara.traineddata \
  --train_listfile ~/tesstutorial/araeval/ara.training_files.txt \
  --debug_interval -1 \
  --max_iterations 3600 &>~/tesstutorial/ara_from_full/plustrain.log
# Show the end of the log (tail -f would block here and never exit).
tail -n 20 ~/tesstutorial/ara_from_full/plustrain.log

echo -e "\n**************************** ******\n"
# 3. Evaluate the original best model on the new data...
lstmeval \
  --model ~/tesstutorial/tesseract/tessdata/best/ara.traineddata \
  --eval_listfile ~/tesstutorial/araeval/ara.training_files.txt

echo -e "\n**************************** ******\n"
# ...and the fine-tuned checkpoint.
lstmeval \
  --model ~/tesstutorial/ara_from_full/PLUS_checkpoint \
  --traineddata ~/tesstutorial/araeval/ara/ara.traineddata \
  --eval_listfile ~/tesstutorial/araeval/ara.training_files.txt

echo -e "\n**************************** ******\n"
# 4. Convert the checkpoint into a traineddata file. --traineddata must be the
#    same file used for training above; otherwise lstmtraining reports
#    "Code range changed from 74 to 85!".
time lstmtraining \
  --stop_training \
  --continue_from ~/tesstutorial/ara_from_full/PLUS_checkpoint \
  --traineddata ~/tesstutorial/araeval/ara/ara.traineddata \
  --model_output ~/tesstutorial/ara_from_full/ara.Andalus.PLUS.traineddata
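Shree's point that `--traineddata` must be the same in the fine-tuning and `--stop_training` commands can be enforced by defining the path once and reusing it. A minimal sketch, reusing the paths and lstmtraining flags from the script above; the variable names and the `finetune`/`finalize_model` functions are hypothetical, not part of Tesseract.

```shell
#!/bin/bash
# Sketch: define the starter traineddata once so that training and
# --stop_training cannot drift apart. Paths follow the script above;
# variable and function names are hypothetical.
WORK=~/tesstutorial/ara_from_full
NEW_TRAINEDDATA=~/tesstutorial/araeval/ara/ara.traineddata
OLD_TRAINEDDATA=~/tesstutorial/tesseract/tessdata/best/ara.traineddata

# Fine-tune the extracted best model; --old_traineddata maps its codes
# onto the character set of NEW_TRAINEDDATA.
finetune() {
  lstmtraining \
    --model_output "$WORK/PLUS" \
    --continue_from "$WORK/ara.lstm" \
    --traineddata "$NEW_TRAINEDDATA" \
    --old_traineddata "$OLD_TRAINEDDATA" \
    --train_listfile ~/tesstutorial/araeval/ara.training_files.txt \
    --debug_interval -1 \
    --max_iterations 3600 &>"$WORK/plustrain.log"
}

# Convert the checkpoint, using the same --traineddata as finetune().
# As in Shree's example, no --old_traineddata is passed here.
finalize_model() {
  lstmtraining \
    --stop_training \
    --continue_from "$WORK/PLUS_checkpoint" \
    --traineddata "$NEW_TRAINEDDATA" \
    --model_output "$WORK/ara.Andalus.PLUS.traineddata"
}
```

Running `finetune && finalize_model` performs both steps; changing `NEW_TRAINEDDATA` later updates both commands together.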

