Thanks @shreeshrii. What are the recommendations for preparing the training text?
Is there any tutorial that shows how to prepare it? For example: what is the recommended text size, how many instances of each character should appear in the training set, and how should ligatures be handled and added to the unicharset?

On Sunday, March 29, 2020 at 7:50:54 AM UTC+2, shree wrote:
>
> The unicharset is based on the training text you use. Please make sure you
> have all required characters in the text.
>
> Fine-tune for impact works with the unicharset of the best traineddata
> file, but then you can't add any characters to it.
>
> On Sun, Mar 29, 2020, 11:08 Essam Zaky <[email protected]> wrote:
>
>> Hi @shreeshrii,
>> Attached is the bash script described on the following page:
>>
>> https://github.com/tesseract-ocr/tesseract/issues/2695#issuecomment-539412948
>>
>> When I change line #51 from
>>
>> --traineddata ~/tesstutorial/tesseract/tessdata/best/ara.traineddata \
>>
>> to
>>
>> --traineddata ~/tesstutorial/araeval/ara/ara.traineddata
>>
>> it works without error.
>> But I have another question:
>> the unicharset of the best traineddata has 85 entries, while the newly
>> generated unicharset contains only 74.
>> How can I keep the unicharset at 85 entries, as in best?
>>
>> On Sunday, March 29, 2020 at 5:06:16 AM UTC+2, shree wrote:
>>>
>>> See
>>> https://github.com/Shreeshrii/tess4training/blob/master/6-plusminus.sh
>>>
>>> lstmtraining --model_output ../tesstutorial/trainplusminus/plusminus \
>>> --continue_from ../tesstutorial/trainplusminus/eng.lstm \
>>> --traineddata ../tesstutorial/trainplusminus/eng/eng.traineddata \
>>> --old_traineddata tessdata/best/eng.traineddata \
>>> --train_listfile ../tesstutorial/trainplusminus/eng.training_files.txt \
>>> --max_iterations 3600
>>>
>>> ...
>>>
>>> lstmtraining \
>>> --stop_training \
>>> --continue_from ../tesstutorial/trainplusminus/plusminus_checkpoint \
>>> --traineddata ../tesstutorial/trainplusminus/eng/eng.traineddata \
>>> --model_output ../tesstutorial/trainplusminus/eng_plusminus.traineddata
>>>
>>> --traineddata needs to be the same in both commands.
>>>
>>> On Sun, Mar 29, 2020 at 6:45 AM Shree Devi Kumar <[email protected]> wrote:
>>>
>>>> Please check that you have used the correct path for the traineddata
>>>> file.
>>>>
>>>> Please share the lstmtraining command that you used before this for
>>>> training.
>>>>
>>>> On Sat, Mar 28, 2020, 22:56 Essam Zaky <[email protected]> wrote:
>>>>
>>>>> Dear @Shreeshrii,
>>>>> I followed your bash script to add the Andalus font to the Arabic
>>>>> language. Here is the script URL:
>>>>>
>>>>> https://github.com/tesseract-ocr/tesseract/issues/2695#issuecomment-539412948
>>>>>
>>>>> All steps work except the last one, which generates the
>>>>> traineddata. Here is the error:
>>>>>
>>>>> osboxes@osboxes:~/tesstutorial/tesseract$ time lstmtraining \
>>>>> > --stop_training \
>>>>> > --continue_from ~/tesstutorial/ara_from_full/PLUS_checkpoint \
>>>>> > --traineddata ~/tesstutorial/tesseract/tessdata/best/ara.traineddata \
>>>>> > --model_output ~/tesstutorial/ara_from_full/ara.Andalus.PLUS.traineddata
>>>>> Loaded file /home/osboxes/tesstutorial/ara_from_full/PLUS_checkpoint, unpacking...
>>>>> Code range changed from 74 to 85!
>>>>> Must supply the old traineddata for code conversion!
>>>>> Failed to read continue from: /home/osboxes/tesstutorial/ara_from_full/PLUS_checkpoint
>>>>>
>>>>> Best Regards
>>>>> Essam
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "tesseract-ocr" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/tesseract-ocr/0c9123f5-8e80-447c-9bf1-2c6ec9831238%40googlegroups.com
>>>>> .
>>>>>
>>>
>>> --
>>>
>>> ____________________________________________________________
>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>
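
On the question of how many instances of each character the training text contains, and the point that the unicharset is built from that text: a minimal Python sketch that lists which unicharset entries are missing or rare in a training text. It is only an illustration, not part of the Tesseract tooling. The function names and the `min_count` threshold are hypothetical, and the parsing assumes the unicharset is a text file whose first line is the entry count and whose later lines each start with the unichar followed by space-separated properties.

```python
def read_unichars(unicharset_lines):
    """Return the unichar strings from unicharset-style lines.

    Assumption: line 1 is the entry count; every later line starts with
    the unichar, followed by space-separated properties.
    """
    lines = [ln.rstrip("\n") for ln in unicharset_lines]
    entries = [ln.split(" ", 1)[0] for ln in lines[1:] if ln.strip()]
    # "NULL" is assumed to be the placeholder entry for the space character.
    return [u for u in entries if u and u != "NULL"]


def coverage_report(training_text, unichars, min_count=1):
    """Map each unichar seen fewer than min_count times to its count.

    str.count is used so that multi-codepoint entries (for example
    ligatures stored as one unichar) are matched as whole substrings.
    """
    counts = {u: training_text.count(u) for u in unichars}
    return {u: n for u, n in counts.items() if n < min_count}
```

A possible use is to read the unicharset unpacked from the best traineddata with `open(path, encoding="utf-8")`, pass the lines to `read_unichars`, and run `coverage_report` on the training text. Any entry it reports would need more sample lines in the text before it can appear in the generated unicharset. Special entries in the file that are not real characters will also be reported and can be ignored.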

