Hi Shree, I have also tried the new traineddata for recognizing Simplified Chinese on Linux (Ubuntu), and it works. However, the new traineddata does not seem to work on Windows.
With the new traineddata on Ubuntu, some special symbols still cannot be recognized, such as '∠', '△', '≌', '≥' and so on. I would like to improve recognition of these symbols, but I have not found a good way to do it yet. Can you give me some advice? Thanks.

On Tuesday, August 1, 2017 at 4:45:07 PM UTC+8, shree wrote:
>
> Ray has uploaded new traineddata files in
> https://github.com/tesseract-ocr/tessdata/tree/master/best
>
> Why don't you first try recognition with that
>
> ShreeDevi
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Tue, Aug 1, 2017 at 1:45 PM, <[email protected]> wrote:
>
>> Hello, Shree:
>>
>> I'm sorry, but whether can I use more than one unicharset, such as
>> chi_sim and eng and so on, to finetune the training?
>> Maybe some special characters can be in other unicharsets. If I find
>> it/them, maybe I will train my traineddata with more unicharsets, and the
>> special characters will be encoded at that time.
>>
>> Thanks, and hope for your reply.
>>
>> On Tuesday, July 25, 2017 at 3:23:08 PM UTC+8, shree wrote:
>>>
>>> That error is because some characters in your training text are not part
>>> of the unicharset of chi_sim.
>>>
>>> You are trying finetune training which will give error. Replace top
>>> layer will work.
>>>
>>> I suggest that you wait 2-3 weeks for Ray to upload new traineddata for
>>> all languages.
>>>
>>> You can tell us if there are any specific characters missing from
>>> existing traineddata .
>>>
>>> ShreeDevi
>>> ____________________________________________________________
>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>
>>> On Tue, Jul 25, 2017 at 12:46 PM, <[email protected]> wrote:
>>>
>>>> Hello,
>>>>
>>>> I apply the command to train my own traineddata:
>>>>
>>>> lstmtraining --model_output ~/tesstutorial/chituned_from_chisim/chituned \
>>>>   --continue_from ~/tesstutorial/chituned_from_chisim/chi_sim.lstm \
>>>>   --train_listfile ~/tesstutorial/chitest/chi.training_files.txt \
>>>>   --eval_listfile ~/tesstutorial/chitest/chi.training_files.txt \
>>>>   --target_error_rate 0.01
>>>>
>>>> An error appears by Tess4.0 that shown in the following img. The system
>>>> (Tess4.0) says "Can't encode transcript" for text content such as
>>>> "化简(-x2)3的结果是...".
>>>> Why? Can you help me?
>>>>
>>>> <https://lh3.googleusercontent.com/-f5tjdv3_nvk/WXbvefZQYrI/AAAAAAAAAAM/COSWa-ewxy46XNkFxUCUl5V2r4K2ZfiQACLcBGAs/s1600/_%2524_WUP8_FXB%2560DR9_I5A8Y%2560L.png>
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> To post to this group, send email to [email protected].
>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/tesseract-ocr/e2e1d749-a55d-4355-b128-5d0fe2181e19%40googlegroups.com
>>>> <https://groups.google.com/d/msgid/tesseract-ocr/e2e1d749-a55d-4355-b128-5d0fe2181e19%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>> For more options, visit https://groups.google.com/d/optout.
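For anyone who wants to check which characters are behind a "Can't encode transcript" error, or whether symbols like '∠', '△', '≌', '≥' are in a model's unicharset at all, here is a minimal Python sketch. It rests on some assumptions: that you have already unpacked the unicharset from the traineddata (for example with `combine_tessdata -u`), that the file's first line is the entry count, and that each following line starts with the character followed by a space. The function names and the sample data are hypothetical, made up only for illustration. It compares one code point at a time, so it is only approximate for scripts where a unicharset entry spans several code points.

```python
def load_unichars(lines):
    """Collect the characters listed in a unicharset, given its lines.

    Assumes the first line is the entry count and every later line
    begins with the character, followed by a space and its properties.
    """
    unichars = set()
    for line in lines[1:]:
        line = line.rstrip("\r\n")
        if line:
            unichars.add(line.split(" ", 1)[0])
    return unichars


def missing_chars(text, unichars):
    """Return the non-whitespace characters of text absent from unichars."""
    return sorted({ch for ch in text if not ch.isspace() and ch not in unichars})


# Tiny made-up unicharset, for illustration only (not real chi_sim data).
sample_unicharset = [
    "4",
    "NULL 0 NULL 0",
    "化 5 0,255,0,255,0,0,0,0,0,0 Han 1 0 1 化",
    "简 5 0,255,0,255,0,0,0,0,0,0 Han 2 0 2 简",
    "( 10 0,255,0,255,0,0,0,0,0,0 Common 3 10 3 (",
]

known = load_unichars(sample_unicharset)
not_encodable = missing_chars("化简(∠△", known)
print(not_encodable)
```

To run it on real files, read the unpacked unicharset with `open(path, encoding="utf-8").readlines()` and pass in the contents of your training text. Any character it reports is a candidate for why the transcript cannot be encoded when fine-tuning from the existing model.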

