Thanks for your reply.

Do you know where I can find the new langdata files?

On Tuesday, August 22, 2017 at 3:22:36 PM UTC+8, shree wrote:
>
> The langdata files have not been updated for 4.00alpha
>
> ShreeDevi
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Tue, Aug 22, 2017 at 12:17 PM, <[email protected]> wrote:
>
>> Hello,
>>
>> I'm trying to retrain the chi_sim.traineddata model from scratch as a 
>> learning exercise.
>>
>> I use chi_sim.training_text from 
>> https://github.com/tesseract-ocr/langdata/tree/master/chi_sim as the 
>> source data and train the model with this command:
>>
>> training/lstmtraining --debug_interval 100 \
>> --traineddata ~/tesstutorial/trainspecial/chi_sim/chi_sim.traineddata \
>> --net_spec '[1,48,0,1 Ct3,3,16 Mp3,3 Lfys64 Lfx96 Lrx96 Lfx512 O1c1]' \
>> --model_output ~/tesstutorial/specialoutput/base --learning_rate 20e-4 \
>> --train_listfile ~/tesstutorial/trainspecial/chi_sim.training_files.txt \
>> --eval_listfile ~/tesstutorial/evalspecial/chi_sim.training_files.txt \
>> --max_iterations 3600 &>~/tesstutorial/specialoutput/basetrain.log
>>
>>
>>
>> The net_spec is the same as the one used by the official model 
>> (chi_sim.traineddata from the tessdata GitHub repository).
>>
>>
>>
>> After converting the training checkpoint with lstmtraining 
>> --stop_training, a new traineddata model is generated, which I named 
>> chi_sim_new.traineddata. 
>> I refer to the official model as chi_sim.traineddata to distinguish 
>> the two.
>>
>>
>> Then I extracted all the characters from the two traineddata models.
>>
>> There are 4384 characters in chi_sim.traineddata, but only 2538 
>> characters in chi_sim_new.traineddata, the model I generated.
>>
>> Why do the two models contain different characters? Has the source 
>> data in chi_sim.training_text not been updated?
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/tesseract-ocr/1111e3f0-588b-456f-90bf-a878f20b1f26%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
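
For anyone who wants to reproduce the character-count comparison described in the quoted message, here is a minimal sketch. It assumes each model's unicharset has already been unpacked to a text file (for example with combine_tessdata -u, which should write a .lstm-unicharset component for LSTM models), and that the file follows the usual layout: an entry count on the first line, then one entry per line beginning with the character. The function names and file paths are hypothetical, and this is not necessarily how the counts above were obtained.

```python
import sys
from pathlib import Path


def parse_unicharset(text):
    """Return the set of unichars listed in unicharset-format text."""
    lines = text.splitlines()
    # Assumed layout: line 1 is the entry count; every later line starts
    # with the unichar, followed by space-separated properties.
    return {line.split(" ", 1)[0] for line in lines[1:] if line.strip()}


def compare(official, retrained):
    """Summarise how two character sets differ."""
    return {
        "official": len(official),
        "retrained": len(retrained),
        "only_official": len(official - retrained),
        "only_retrained": len(retrained - official),
    }


# Usage (paths are placeholders):
#   python compare_unicharsets.py official.lstm-unicharset new.lstm-unicharset
if __name__ == "__main__" and len(sys.argv) == 3:
    sets = [
        parse_unicharset(Path(p).read_text(encoding="utf-8"))
        for p in sys.argv[1:3]
    ]
    print(compare(sets[0], sets[1]))
```

Printing the set difference itself (official - retrained) instead of its size would show exactly which characters are missing from the retrained model.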
