OK. Thanks for your reply.

On Tuesday, September 19, 2017 at 5:06:57 PM UTC+8, shree wrote:
>
> Ray is the only one who would know those details.
>
> Please see 
> https://github.com/tesseract-ocr/tesseract/issues/590#issuecomment-322020794 
> for his comment regarding finetuning.
>
> ShreeDevi
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Tue, Sep 19, 2017 at 2:28 PM, <[email protected]> wrote:
>
>> Does fine-tuning update all the parameters in all of the layers?
>>
>> We need to add many mathematical symbols and some other special 
>> symbols. Should we train from scratch instead?
>>
>> What char error and iteration count should we expect if we train 
>> chi_sim (Simplified Chinese) from scratch?
>>
>>
>>
>> On Tuesday, September 19, 2017 at 4:49:30 PM UTC+8, shree wrote:
>>>
>>> As per Ray's comments, when fine-tuning, or when adding or removing a 
>>> few characters, the number of iterations should be limited to 3000 or so.
>>>
>>> It probably won't get to a 0.2% char error, but you might get better results.
>>>
>>> ShreeDevi
>>> ____________________________________________________________
>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>
>>> On Tue, Sep 19, 2017 at 2:00 PM, <[email protected]> wrote:
>>>
>>>> Hello,
>>>>
>>>> I am fine-tuning my own traineddata model for the chi_sim language. 
>>>> My training data contains some mathematical symbols, such as "∞", 
>>>> "β", "△" and so on, which the official chi_sim.traineddata model 
>>>> cannot recognize.
>>>>
>>>> So we replaced the content of the chi_sim.training_text file with 
>>>> our own training data.
>>>>
>>>>
>>>> Then we executed the training command:
>>>> training/lstmtraining \
>>>>   --model_output ~/tesstutorial/trainspecial/special \
>>>>   --continue_from ~/tesstutorial/trainspecial/chi_sim.lstm \
>>>>   --traineddata ~/tesstutorial/trainspecial/chi_sim/chi_sim.traineddata \
>>>>   --old_traineddata tessdata/best/chi_sim.traineddata \
>>>>   --train_listfile ~/tesstutorial/trainspecial/chi_sim.training_files.txt \
>>>>   --max_iterations 400000
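
For comparison, here is a rough dry-run sketch of the same command with the iteration cap lowered to about 3000, following the advice attributed to Ray earlier in this thread. It only prints the command; it does not start training. The paths are the ones from the command above, while the function name `finetune_cmd` and the variables `BASE` and `MAX_ITER` are made up for illustration, and 3000 is a starting point to experiment with rather than a verified setting.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints the fine-tuning command instead of running it.
# BASE, MAX_ITER and finetune_cmd are illustrative names (assumptions).
BASE="$HOME/tesstutorial/trainspecial"
MAX_ITER=3000   # assumption: "3000 or so", per the advice quoted in this thread

finetune_cmd() {
  # Same flags as the original command; only --max_iterations differs.
  printf '%s ' training/lstmtraining \
    --model_output "$BASE/special" \
    --continue_from "$BASE/chi_sim.lstm" \
    --traineddata "$BASE/chi_sim/chi_sim.traineddata" \
    --old_traineddata tessdata/best/chi_sim.traineddata \
    --train_listfile "$BASE/chi_sim.training_files.txt" \
    --max_iterations "$MAX_ITER"
  printf '\n'
}

finetune_cmd   # inspect the printed command before running it by hand
```

Printing first makes it easy to check the paths before committing to a long run.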
>>>>
>>>> With this command, after 400000 iterations the char error is about 
>>>> 0.2% and the word error is about 4.2%. 
>>>> The error rate has started to oscillate and no longer decreases, so 
>>>> we stopped training and exported the traineddata model.
>>>>
>>>> When we tested the exported traineddata model, its accuracy was not 
>>>> satisfactory: it is lower than that of the official model (from the 
>>>> tesseract GitHub site).
>>>>
>>>> We hope to match the recognition accuracy of the official model. How 
>>>> can we further improve the accuracy of our model?
>>>>
>>>> Does anyone know the details of how the official language model was 
>>>> trained, such as the number of iterations, the lowest char error and 
>>>> word error, the value of the learning_rate, and so on?
>>>>
>>>> If you know this information, please give us some tips.
>>>>
>>>>
>>>> Thank you.
>>>>
>>>> -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to [email protected].
>>>> To post to this group, send email to [email protected].
>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>> To view this discussion on the web visit 
>>>> https://groups.google.com/d/msgid/tesseract-ocr/a9a25aeb-2182-41d5-9a69-aef34a92eb27%40googlegroups.com
>>>>  
>>>> <https://groups.google.com/d/msgid/tesseract-ocr/a9a25aeb-2182-41d5-9a69-aef34a92eb27%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>>
>
>

