Re: [tesseract-ocr] The Accuracy improvement of training the chi_sim.traineddata model

ShreeDevi Kumar Tue, 19 Sep 2017 01:50:11 -0700

As per comments by Ray, for fine-tuning, or for adding or removing a few
characters (plus-minus), the number of iterations should be limited to 3000
or so.


It probably won't get to a 0.2% char error rate, but you might get better results.
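
For example, here is a minimal sketch of your command with only the iteration
cap lowered. It assumes the paths from your message are unchanged; 3000 is just
the rough figure mentioned above, not a tuned value, and the TRAIN_DIR,
MAX_ITERATIONS and finetune names are only illustrative.

```shell
# Sketch only: same flags as the original command, with a lower iteration cap.
# TRAIN_DIR, MAX_ITERATIONS and finetune are illustrative names (assumptions).
TRAIN_DIR="$HOME/tesstutorial/trainspecial"   # $HOME used so the path expands inside quotes
MAX_ITERATIONS=3000                           # rough cap suggested for fine-tuning

finetune() {
  training/lstmtraining \
    --model_output "$TRAIN_DIR/special" \
    --continue_from "$TRAIN_DIR/chi_sim.lstm" \
    --traineddata "$TRAIN_DIR/chi_sim/chi_sim.traineddata" \
    --old_traineddata tessdata/best/chi_sim.traineddata \
    --train_listfile "$TRAIN_DIR/chi_sim.training_files.txt" \
    --max_iterations "$MAX_ITERATIONS"
}

# Run only if the binary is present at the relative path used in the thread.
if command -v training/lstmtraining >/dev/null 2>&1; then
  finetune
else
  echo "training/lstmtraining not found; run this from the tesseract build directory"
fi
```

You could then compare the result against the official model on the same test
images before deciding whether more iterations are worth it.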

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Tue, Sep 19, 2017 at 2:00 PM, <[email protected]> wrote:

> Hello,
>
> I am training my own traineddata model for the chi_sim language using
> fine-tuning. My training data contains some mathematical symbols, such as
> "∞", "β", "△" and so on, which the official chi_sim.traineddata model
> cannot recognize.
>
> So we changed the content of the chi_sim.training_text file, filling it
> with our training data.
>
> Then we ran the training command:
> training/lstmtraining --model_output ~/tesstutorial/trainspecial/special \
>   --continue_from ~/tesstutorial/trainspecial/chi_sim.lstm \
>   --traineddata ~/tesstutorial/trainspecial/chi_sim/chi_sim.traineddata \
>   --old_traineddata tessdata/best/chi_sim.traineddata \
>   --train_listfile ~/tesstutorial/trainspecial/chi_sim.training_files.txt \
>   --max_iterations 400000
>
> With this command, after 400000 iterations the char error is about 0.2%
> and the word error is about 4.2%.
> The error rate had started to oscillate and would not go down any further,
> so we stopped training and exported the traineddata model.
>
> When we tested the exported traineddata model, its accuracy was not
> satisfactory: it is lower than that of the model provided on the official
> website (the tesseract GitHub site).
>
> We would like our trained model to match the recognition accuracy of the
> official model. How can we further improve the accuracy of the model?
>
> Does anyone know the details of how the official language model was
> trained, such as the number of iterations, the lowest char error and word
> error, the value of the learning_rate, and so on?
>
> If you know any of this information, please share some tips.
>
> Thank you.

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/CAG2NduUin6cHZK3QZ%3DPXt4EB6cpP%2B99GqxTWtAB_-7wx_JOOOw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
