I want to train a model with Tesseract version 4.0.

I use 

unicharset_extractor 0_gray.box

to create the unicharset file.

Then I use
combine_lang_model \
--input_unicharset unicharset \
--script_dir /Users/th/source/langdata \
--output_dir . \
--lang chi_sim

to create the chi_sim.traineddata file.
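To confirm that the generated chi_sim.traineddata actually contains the LSTM character set and recoder, I could dump its component list with `combine_tessdata -d`. Below is a minimal sketch of a hypothetical helper, `has_lstm_components`, that reads that dump on stdin. It assumes the dump names the components `lstm-unicharset` and `lstm-recoder`; the exact output format is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Tesseract): read the output of
# `combine_tessdata -d <file>` on stdin and succeed only if both the
# lstm-unicharset and lstm-recoder components are listed.
# Assumption: the dump prints component names as plain text.
has_lstm_components() {
  local dump
  # Capture stdin once so both checks see the same text.
  dump=$(cat)
  grep -q 'lstm-unicharset' <<<"$dump" && grep -q 'lstm-recoder' <<<"$dump"
}
```

Usage: `combine_tessdata -d chi_sim/chi_sim.traineddata 2>&1 | has_lstm_components && echo "components present"`. The `2>&1` is there in case the tool writes its listing to stderr.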

and I train with this command:
lstmtraining \
  --traineddata chi_sim/chi_sim.traineddata \
  --net_spec "[1,40,0,1 Ct5,5,64 Mp3,3 Lfys128 Lbx256 Lbx256 O1c$num_classes]" \
  --model_output train_out \
  --train_listfile list.train \
  --eval_listfile list.eval
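In the command above, `$num_classes` is a shell variable inside double quotes. If it was never set in the current shell, bash expands it to an empty string and the spec ends in `O1c]`. A minimal sketch of a hypothetical helper, `make_net_spec`, that fills in the output size from the unicharset instead, assuming the first line of the unicharset file is its entry count:

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the --net_spec string used above, taking the
# output size from a unicharset file rather than from $num_classes.
# Assumption: line 1 of the unicharset file holds the number of entries.
make_net_spec() {
  local n
  n=$(head -n1 "$1")
  printf '[1,40,0,1 Ct5,5,64 Mp3,3 Lfys128 Lbx256 Lbx256 O1c%s]\n' "$n"
}
```

Usage: `--net_spec "$(make_net_spec unicharset)"` in place of the literal spec string.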

But it outputs this error:
Can't encode transcription: '你 好' in language '' Encoding of string failed!

My unicharset is: https://gist.github.com/huhuang03/8cd9c739134892cd40495dded3eec81d
My box file is: https://gist.github.com/huhuang03/1dc090aecfba6efa4f93abca9ed8e30d
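Since the error says a transcription can't be encoded, one quick check is whether every character in the box file also appears in the unicharset. A minimal sketch of a hypothetical helper, `check_box_chars`, assuming a per-character box file (`<char> <left> <bottom> <right> <top> <page>`, space-separated) and a unicharset whose first line is the count and whose later lines start with the unichar. Space and tab marker lines in the box file are skipped.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Tesseract): print every character that
# appears in a per-character box file but not in a unicharset file.
check_box_chars() {
  local box="$1" uni="$2"
  # Byte-wise collation so sort and comm agree on multi-byte UTF-8 text.
  LC_ALL=C comm -13 \
    <(tail -n +2 "$uni" | cut -d' ' -f1 | LC_ALL=C sort -u) \
    <(cut -d' ' -f1 "$box" | grep -v '^[[:space:]]*$' | LC_ALL=C sort -u)
}
```

Usage: `check_box_chars 0_gray.box unicharset`. An empty result means every non-space box character is listed in the unicharset.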

My question is: why does it look in language ''? I think it should look in
language chi_sim.

What am I doing wrong? Thank you!

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/ab886999-ac26-41d0-a417-ded7666a1f6e%40googlegroups.com.