Thanks for your reply. Do you know where I can find the new langdata files?
On Tuesday, August 22, 2017 at 3:22:36 PM UTC+8, shree wrote:
>
> The langdata files have not been updated for 4.00alpha
>
> ShreeDevi
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Tue, Aug 22, 2017 at 12:17 PM, <[email protected]> wrote:
>
>> Hello,
>>
>> I'm trying to re-train the chi_sim.traineddata model from scratch as a
>> learning exercise.
>>
>> I use chi_sim.training_text from
>> https://github.com/tesseract-ocr/langdata/tree/master/chi_sim as the
>> source data and train the model with this command:
>>
>> training/lstmtraining --debug_interval 100 \
>>   --traineddata ~/tesstutorial/trainspecial/chi_sim/chi_sim.traineddata \
>>   --net_spec '[1,48,0,1 Ct3,3,16 Mp3,3 Lfys64 Lfx96 Lrx96 Lfx512 O1c1]' \
>>   --model_output ~/tesstutorial/specialoutput/base --learning_rate 20e-4 \
>>   --train_listfile ~/tesstutorial/trainspecial/chi_sim.training_files.txt \
>>   --eval_listfile ~/tesstutorial/evalspecial/chi_sim.training_files.txt \
>>   --max_iterations 3600 &>~/tesstutorial/specialoutput/basetrain.log
>>
>> The net_spec is the same as the one in the official model package
>> (chi_sim.traineddata from the tessdata GitHub repository).
>>
>> After converting the training checkpoint with lstmtraining
>> --stop_training, I get a new model, which I named
>> chi_sim_new.traineddata. To distinguish the two, I keep calling the
>> official model chi_sim.traineddata.
>>
>> Then I extracted all the characters from the two traineddata models.
>>
>> chi_sim.traineddata contains 4384 characters, but
>> chi_sim_new.traineddata, the one I generated, contains only 2538.
>>
>> Why do the two models have different character sets? Has the source
>> data in chi_sim.training_text not been updated to match?
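For anyone who wants to reproduce the character comparison described in the quoted message, here is a minimal Python sketch. It assumes each model's unicharset has already been unpacked to a text file (for example with `combine_tessdata -u`, which I believe writes an `lstm-unicharset` component for LSTM models), and that the file follows the usual layout: the entry count on the first line, then one entry per line beginning with the character. The function names and file names below are hypothetical and are not part of Tesseract.

```python
from pathlib import Path


def read_unicharset(path):
    """Return the set of entries listed in a unicharset text file.

    Assumed layout: line 1 is the entry count; every later line starts
    with the character itself, followed by a space and its properties.
    """
    text = Path(path).read_text(encoding="utf-8")
    chars = set()
    for line in text.split("\n")[1:]:  # skip the count on the first line
        if line.strip():
            chars.add(line.split(" ", 1)[0])
    return chars


def compare_charsets(official, retrained):
    """Return (only in official, only in retrained, shared) as sets."""
    return official - retrained, retrained - official, official & retrained


# Example use with hypothetical file names:
#   official = read_unicharset("official.lstm-unicharset")
#   retrained = read_unicharset("new.lstm-unicharset")
#   missing, extra, shared = compare_charsets(official, retrained)
#   print(len(official), len(retrained), len(missing), len(extra))
```

The sets include special entries such as NULL, so the totals may differ slightly from the counts quoted above. Listing the characters that appear only in the official model shows which ones the training text does not cover.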
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/b96558c2-1555-41c8-bcb0-0282efeb3556%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

