Ray is the only one who would know those details. Please see https://github.com/tesseract-ocr/tesseract/issues/590#issuecomment-322020794 for his comment regarding finetuning.
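
As a rough sketch of the earlier suggestion in this thread (keep fine-tuning to about 3000 iterations), the command from the original post quoted below could be rerun with a lower cap. The paths and flags are copied from that post. The MAX_ITER value and the TRAIN_DIR variable are illustrative assumptions, not settings confirmed by Ray. The script only assembles and prints the command so it can be checked first.

```shell
#!/bin/sh
# Dry-run sketch: assemble the fine-tuning command from the original post
# with a lower iteration cap, then print it for review.

TRAIN_DIR="$HOME/tesstutorial/trainspecial"  # path taken from the original post
MAX_ITER=3000                                # assumption: "3000 or so" from the thread

# Store the command and its arguments as the positional parameters.
set -- training/lstmtraining \
  --model_output "$TRAIN_DIR/special" \
  --continue_from "$TRAIN_DIR/chi_sim.lstm" \
  --traineddata "$TRAIN_DIR/chi_sim/chi_sim.traineddata" \
  --old_traineddata tessdata/best/chi_sim.traineddata \
  --train_listfile "$TRAIN_DIR/chi_sim.training_files.txt" \
  --max_iterations "$MAX_ITER"

# Print the assembled command; replace this line with "$@" to actually run it.
printf '%s\n' "$*"
```

Whether 3000 iterations is enough for a large set of new mathematical symbols is something only a test on held-out data can show, so treat the cap as a starting point.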
ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Tue, Sep 19, 2017 at 2:28 PM, <[email protected]> wrote:

> Does fine-tuning update all the parameters in all of the layers?
>
> We need to add lots of mathematical symbols and some other special
> symbols. Maybe we should train from scratch?
>
> What are the char error and the number of iterations for training from
> scratch, if we then train chi_sim (Simplified Chinese)?
>
> On Tuesday, September 19, 2017 at 4:49:30 PM UTC+8, shree wrote:
>>
>> As per comments by Ray, for fine-tuning, or for adding or removing a few
>> characters, the number of iterations should be limited to 3000 or so.
>>
>> It probably won't get to a 0.2% char error, but you might get better
>> results.
>>
>> ShreeDevi
>> ____________________________________________________________
>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>
>> On Tue, Sep 19, 2017 at 2:00 PM, <[email protected]> wrote:
>>
>>> Hello,
>>>
>>> I am training my own traineddata model for the chi_sim language using
>>> fine-tuning. My training data contains some mathematical symbols, such
>>> as "∞", "β", "△" and so on, which the official chi_sim.traineddata
>>> model cannot recognize.
>>>
>>> So we changed the content of the chi_sim.training_text file and filled
>>> it with our training data.
>>>
>>> Then we executed the training command:
>>>
>>> training/lstmtraining --model_output ~/tesstutorial/trainspecial/special \
>>>   --continue_from ~/tesstutorial/trainspecial/chi_sim.lstm \
>>>   --traineddata ~/tesstutorial/trainspecial/chi_sim/chi_sim.traineddata \
>>>   --old_traineddata tessdata/best/chi_sim.traineddata \
>>>   --train_listfile ~/tesstutorial/trainspecial/chi_sim.training_files.txt \
>>>   --max_iterations 400000
>>>
>>> With this command, after 400000 iterations the char error is about
>>> 0.2% and the word error is about 4.2%. By then the error rate had
>>> started to oscillate and would not go down any further, so we stopped
>>> training and exported the traineddata model.
>>>
>>> When we tested the exported traineddata model, the accuracy was not
>>> satisfactory: it is lower than that of the official model (from the
>>> tesseract GitHub site).
>>>
>>> We hope to match the recognition accuracy of the official model. How
>>> can we further improve the accuracy of our model?
>>>
>>> Does anyone know the details of how the official language models were
>>> trained, such as the number of iterations, the lowest char error and
>>> word error, the value of the learning_rate, and so on?
>>>
>>> If you know this information, please give us some tips.
>>>
>>> Thank you.

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAG2NduXHXAdwbyN%3Dhz1D8OOcNcbwvAAeFY3ovGF7-A8zRYtfBg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

