As per Ray's comments, when fine-tuning, including fine-tuning to add or remove a few characters, the number of iterations should be limited to about 3000.
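A minimal sketch of what that cap could look like, assuming the same flags and paths as the command quoted below and changing only `--max_iterations`. The variable names `MAX_ITERATIONS`, `BASE`, and `CMD` are illustrative, `$HOME` stands in for `~`, and 3000 is the rough figure mentioned above rather than a tuned value. The snippet only prints the command (a dry run) so it can be checked before anything is executed.

```shell
# Dry run: assemble the fine-tuning command and print it instead of running it.
# Flags and paths are copied from the quoted command; only the iteration cap
# changes. 3000 is an approximate figure, not a tuned value.
MAX_ITERATIONS=3000
BASE="$HOME/tesstutorial/trainspecial"

CMD="training/lstmtraining \
--model_output $BASE/special \
--continue_from $BASE/chi_sim.lstm \
--traineddata $BASE/chi_sim/chi_sim.traineddata \
--old_traineddata tessdata/best/chi_sim.traineddata \
--train_listfile $BASE/chi_sim.training_files.txt \
--max_iterations $MAX_ITERATIONS"

# Print the assembled command for inspection.
echo "$CMD"
```

Once the printed line looks right, it can be pasted into the shell from the Tesseract build directory. Keeping the cap in one variable makes it easy to try a few nearby values and compare the reported error rates.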
It probably won't reach a 0.2% character error rate, but you might get better results.

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Tue, Sep 19, 2017 at 2:00 PM, <[email protected]> wrote:

> Hello,
>
> I am training my own traineddata model for the chi_sim language using
> fine-tuning. My training data contains some mathematical symbols, such as
> "∞", "β", "△" and so on, which the official chi_sim.traineddata model
> cannot recognize.
>
> So we changed the content of the chi_sim.training_text file and filled the
> file with our training data.
>
> Then we executed the training command:
>
> training/lstmtraining --model_output ~/tesstutorial/trainspecial/special \
>   --continue_from ~/tesstutorial/trainspecial/chi_sim.lstm \
>   --traineddata ~/tesstutorial/trainspecial/chi_sim/chi_sim.traineddata \
>   --old_traineddata tessdata/best/chi_sim.traineddata \
>   --train_listfile ~/tesstutorial/trainspecial/chi_sim.training_files.txt \
>   --max_iterations 400000
>
> With this command, after 400000 iterations the char error is about 0.2%
> and the word error is about 4.2%.
> The error rate has started to oscillate and no longer goes down, so we
> stopped training and exported the traineddata model.
>
> After testing the exported traineddata model, the accuracy is not
> satisfactory: it is lower than that of the model provided on the official
> website (the tesseract GitHub site).
>
> We hope that the recognition accuracy of our trained model can match that
> of the official model. How can we further improve the accuracy of the
> model?
>
> Does anyone know the details of how the official language model was
> trained, such as the number of iterations, the lowest char error and word
> error, the value of the learning_rate, and so on?
>
> If you know this information, please give some tips.
>
> Thank you.
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/a9a25aeb-2182-41d5-9a69-aef34a92eb27%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

