Training from scratch needs far more iterations, on the order of hundreds of thousands.
The tutorial uses 5000 only to give an idea of the training process. You need to keep training until the error rate is close to 0.01.

On Fri, Aug 7, 2020, 14:24 [email protected] <[email protected]> wrote:

> Could you also please advise on training experience?
>
> I am training Vietnamese for only Times New Roman at this time.
>
> The best traineddata is good, but it is big (for all fonts) and takes quite
> a long time to process.
>
> I plan to train from scratch,
> *...*
>
> *--eval_listfile ~/tesstutorial/engeval/eng.training_files.txt
> \--max_iterations 5000 &>~/tesstutorial/engoutput/basetrain.log*
>
> After 5000 iterations, *Error rate = 76.676*, which is so high.
>
> What should I do next?
> Is there any improvement if I rerun the above training a second/third time
> (with the same data in *--train_listfile ~*)? As I understand it, the
> traineddata is updated each time.
> Is there a way to extract traineddata from best_traineddata for some selected
> fonts?
>
> Thanks,
>
> TuPM
>
> On Friday, August 7, 2020 at 9:30:33 AM UTC+7 [email protected] wrote:
>
>> Many thanks Shree,
>>
>> As you suggested, I removed the path, and it works now.
>>
>> By the way, my tesseract and lstmtraining versions:
>>
>> tesseract 5.0.0-alpha-773-gd33ed
>>  leptonica-1.78.0
>>
>> ~ % lstmtraining -v
>> 5.0.0-alpha-773-gd33ed
>>
>> On Friday, August 7, 2020 at 8:43:02 AM UTC+7 shree wrote:
>>
>>> If you have tesseract and all training tools installed, you should be
>>> able to use
>>> tesseract
>>> lstmtraining
>>> etc without giving the path.
>>>
>>> What's the output of
>>>
>>> which tesseract
>>> tesseract -v
>>> which lstmtraining
>>> lstmtraining -v
>>>
>>> On Fri, Aug 7, 2020, 01:13 [email protected] <[email protected]> wrote:
>>>
>>>> Sorry that I forgot to note:
>>>>
>>>> I use Mac OS 10.15.6 Catalina
>>>>
>>>> The tesseract version: tesseract 5.0.0-alpha-773-gd33ed
>>>>
>>>> Also, tesseract is installed via MacPorts, since installation via brew
>>>> has a lot of errors.
>>>>
>>>> Thanks,
>>>>
>>>> On Friday, August 7, 2020 at 2:40:06 AM UTC+7 [email protected] wrote:
>>>>
>>>>> Dear friends,
>>>>>
>>>>> I have tried to run tesseract following the guide in:
>>>>> https://github.com/tesseract-ocr/tesseract/issues/1453
>>>>>
>>>>> Until step 10:
>>>>>
>>>>> SCROLLVIEW_PATH=~/tesseract/java \
>>>>> ~/tesseract/src/training/lstmtraining \
>>>>> --debug_interval 100 \
>>>>> --traineddata ~/tesstutorial/engtrain/eng/eng.traineddata \
>>>>> --net_spec '[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c111]' \
>>>>> --model_output ~/tesstutorial/engoutput/base \
>>>>> --learning_rate 20e-4 \
>>>>> --debug_interval -1 \
>>>>> --train_listfile ~/tesstutorial/engtrain/eng.training_files.txt \
>>>>> --eval_listfile ~/tesstutorial/engeval/eng.training_files.txt \
>>>>> --max_iterations 5000 &>~/tesstutorial/engoutput/basetrain.log
>>>>>
>>>>> Then nothing happens; in basetrain.log:
>>>>> *zsh: no such file or directory:
>>>>> /Users/minhtupham/tesseract/src/training/lstmtraining*
>>>>>
>>>>> Is the lstmtraining file missing?
>>>>> I checked the training folder; there is a file named "lstmtraining.cpp".
>>>>>
>>>>> Please help me: what do I have to do?
>>>>>
>>>>> Many thanks,
>>>>>
>>>>> TuPM
>>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/tesseract-ocr/b45b1f8d-4e84-482b-b0f1-03670a14055en%40googlegroups.com
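To follow the advice at the top of this message (keep training until the error rate is close to 0.01), it can help to read the error rate out of basetrain.log instead of checking it by eye. Below is a minimal Python sketch, not part of Tesseract. It assumes the log reports progress in lines containing `char train=<number>%`, and that the 0.01 target is in the same percent units; neither is confirmed in this thread. The function names and the sample log lines are hypothetical; only the 76.676 figure comes from the question above.

```python
import re

# Assumed log line shape (an assumption, not confirmed by this thread):
#   At iteration 5000/5000/5000, Mean rms=..., char train=76.676%, word train=...
CHAR_TRAIN = re.compile(r"char train=([0-9]+(?:\.[0-9]+)?)%")


def char_error_rates(log_text):
    """Return every 'char train' error rate (in percent) found in the log, in order."""
    return [float(m.group(1)) for m in CHAR_TRAIN.finditer(log_text)]


def needs_more_training(log_text, target=0.01):
    """Return True if the latest reported error rate is still above the target."""
    rates = char_error_rates(log_text)
    # With no readings at all, treat the model as not yet converged.
    return not rates or rates[-1] > target


# Hypothetical sample log; only 76.676 is taken from the thread.
sample = (
    "At iteration 100/100/100, Mean rms=5.1%, delta=40.2%, "
    "char train=98.5%, word train=100%, skip ratio=0%\n"
    "At iteration 5000/5000/5000, Mean rms=3.2%, delta=20.1%, "
    "char train=76.676%, word train=95.1%, skip ratio=0%\n"
)

print(char_error_rates(sample))
print(needs_more_training(sample))
```

In practice you would pass the contents of ~/tesstutorial/engoutput/basetrain.log in place of `sample`. If the log format in your build differs, only the regular expression needs to change.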

