The files are probably at Google. You will have to wait until Ray Smith updates the repository.
ShreeDevi
____________________________________________________________
Bhajan - Kirtan - Aarti @ http://bhajans.ramparivar.com

On Tue, Aug 22, 2017 at 12:58 PM, <robertyoung0...@gmail.com> wrote:
> Thanks for your reply.
>
> Do you know where I can find the new langdata files?
>
> On Tuesday, August 22, 2017 at 3:22:36 PM UTC+8, shree wrote:
>>
>> The langdata files have not been updated for 4.00alpha.
>>
>> On Tue, Aug 22, 2017 at 12:17 PM, <roberty...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> I'm trying to retrain the chi_sim.traineddata model from scratch as a
>>> learning exercise.
>>>
>>> I use chi_sim.training_text from
>>> https://github.com/tesseract-ocr/langdata/tree/master/chi_sim as the
>>> source data and train the model with this command:
>>>
>>> training/lstmtraining --debug_interval 100 \
>>>   --traineddata ~/tesstutorial/trainspecial/chi_sim/chi_sim.traineddata \
>>>   --net_spec '[1,48,0,1 Ct3,3,16 Mp3,3 Lfys64 Lfx96 Lrx96 Lfx512 O1c1]' \
>>>   --model_output ~/tesstutorial/specialoutput/base --learning_rate 20e-4 \
>>>   --train_listfile ~/tesstutorial/trainspecial/chi_sim.training_files.txt \
>>>   --eval_listfile ~/tesstutorial/evalspecial/chi_sim.training_files.txt \
>>>   --max_iterations 3600 &>~/tesstutorial/specialoutput/basetrain.log
>>>
>>> The net_spec is the same as that of the official model
>>> (chi_sim.traineddata from the tessdata GitHub repository).
>>>
>>> After converting the training checkpoint with lstmtraining
>>> --stop_training, a new traineddata model was generated, which I named
>>> chi_sim_new.traineddata. I refer to the official model as
>>> chi_sim.traineddata to tell the two apart.
>>>
>>> Then I extracted all the characters from the two traineddata models.
>>>
>>> There are 4384 characters in chi_sim.traineddata, but only 2538 in
>>> chi_sim_new.traineddata, which I generated.
>>>
>>> Why do the two models contain different characters? Has the source
>>> data in chi_sim.training_text not been updated yet?
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to tesseract-oc...@googlegroups.com.
>>> To post to this group, send email to tesser...@googlegroups.com.
>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/1111e3f0-588b-456f-90bf-a878f20b1f26%40googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
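For anyone repeating the comparison described in the original question, here is a minimal Python sketch that compares the character sets of two models. It assumes each model's LSTM unicharset has already been unpacked to a text file (for example with `combine_tessdata -u`, which writes the components under a given prefix, such as `chi_sim.lstm-unicharset`), and that the file uses the usual plain-text layout: an entry-count line followed by one line per entry that begins with the character. The helper names and the script name are hypothetical, not part of Tesseract.

```python
"""Compare the character sets of two Tesseract unicharset files.

A minimal sketch. Assumes the plain-text unicharset layout: the first
line holds the number of entries, and every following line starts with
the character itself, followed by whitespace-separated properties.
Counts may include a few special, non-character entries.
"""
import sys
from typing import Iterable, Set, Tuple


def read_unichars(lines: Iterable[str]) -> Set[str]:
    """Return the set of characters listed in a unicharset."""
    it = iter(lines)
    next(it, None)  # skip the entry-count header line
    chars = set()
    for line in it:
        fields = line.split()
        if fields:  # ignore blank lines
            chars.add(fields[0])  # the first field is the character
    return chars


def compare(a: Set[str], b: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (entries only in a, entries only in b)."""
    return a - b, b - a


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: two unpacked unicharset files on the command line.
    with open(sys.argv[1], encoding="utf-8") as f1, \
            open(sys.argv[2], encoding="utf-8") as f2:
        first, second = read_unichars(f1), read_unichars(f2)
    only_first, only_second = compare(first, second)
    print(f"{sys.argv[1]}: {len(first)} entries")
    print(f"{sys.argv[2]}: {len(second)} entries")
    print(f"only in {sys.argv[1]}: {len(only_first)}")
    print(f"only in {sys.argv[2]}: {len(only_second)}")
```

Run it as, for example, `python3 compare_unicharsets.py chi_sim.lstm-unicharset chi_sim_new.lstm-unicharset`. Listing the entries found in only one model shows exactly which characters the retrained model lacks, which says more than the raw counts alone.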