I want to train with Tesseract 4.0.
I used

    unicharset_extractor 0_gray.box

to create the unicharset file, and then

    combine_lang_model \
      --input_unicharset unicharset \
      --script_dir /Users/th/source/langdata \
      --output_dir . \
      --lang chi_sim

to create the chi_sim.traineddata file. I then trained with this command:

    lstmtraining \
      --traineddata chi_sim/chi_sim.traineddata \
      --net_spec "[1,40,0,1 Ct5,5,64 Mp3,3 Lfys128 Lbx256 Lbx256 O1c$num_classes]" \
      --model_output train_out \
      --train_listfile list.train \
      --eval_listfile list.eval

But it outputs this error:

    Can't encode transcription: '你 好' in language ''
    Encoding of string failed!

My unicharset is: https://gist.github.com/huhuang03/8cd9c739134892cd40495dded3eec81d
My box file is: https://gist.github.com/huhuang03/1dc090aecfba6efa4f93abca9ed8e30d

What puzzles me is why it looks in language ''. I think it should look in language chi_sim. Where did I go wrong? Thank you!

-- 
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/ab886999-ac26-41d0-a417-ded7666a1f6e%40googlegroups.com.
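"Encoding of string failed!" means the transcription could not be encoded with the unicharset, so one thing worth ruling out is a box-file character (including the space in '你 好') that is missing from the unicharset. The script below is a minimal, hypothetical sketch for that check. It assumes the plain-text unicharset layout (first line is the entry count, each entry line starts with the character, and "NULL" stands for the space) and a per-character box file (`<char> <left> <bottom> <right> <top> <page>`). It does not handle `WordStr` lines, and the script name `check_box.py` is made up:

```python
"""Hypothetical check: list box-file characters that are missing from a unicharset."""
from __future__ import annotations

import sys


def unicharset_chars(text: str) -> set[str]:
    """Collect the characters listed in a plain-text unicharset file."""
    lines = text.splitlines()
    count = int(lines[0].strip())  # first line: number of entries
    chars = set()
    for line in lines[1 : count + 1]:
        if not line:
            continue
        unichar = line.split(" ", 1)[0]  # the character is the first field
        # Assumption: the space character is written as "NULL" in the file.
        chars.add(" " if unichar == "NULL" else unichar)
    return chars


def box_chars(text: str) -> set[str]:
    """Collect the characters from a per-character box file."""
    chars = set()
    for line in text.splitlines():
        # Split off the 5 trailing numeric fields; whatever is left is the
        # character, which may itself be a space.
        parts = line.rsplit(" ", 5)
        if len(parts) != 6:
            continue
        ch = parts[0]
        if ch == "\t":  # assumed to be the end-of-line marker, not a real glyph
            continue
        chars.add(ch)
    return chars


def missing_chars(box_text: str, unicharset_text: str) -> list[str]:
    """Characters used in the box file but absent from the unicharset."""
    return sorted(box_chars(box_text) - unicharset_chars(unicharset_text))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_box.py 0_gray.box unicharset
    with open(sys.argv[1], encoding="utf-8") as f:
        box_text = f.read()
    with open(sys.argv[2], encoding="utf-8") as f:
        uni_text = f.read()
    for ch in missing_chars(box_text, uni_text):
        print(repr(ch))
```

If it prints nothing, the characters themselves are all present and the cause is more likely elsewhere (for example, which traineddata/unicharset lstmtraining actually loads).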

