Okay, I will do as you suggested. Thank you for answering my question.
On Thursday, October 10, 2019 at 8:23:09 PM UTC+9, shree wrote:
>
> I suggest that you open an issue in the tesstrain repo.
>
> The Makefile does training from scratch. Is that what you wanted? Do you
> have a large enough training text? How many lines does it have? How many
> iterations did you train for?
>
> Eval Char error rate=133.33333, Word error rate=96.875
>
> That is a very high error rate. You need to get it down to 0%.
>
> On Thu, Oct 10, 2019 at 11:26 AM J L <[email protected]> wrote:
>
>> My system info:
>> - OS: Ubuntu Desktop 18.04 LTS (4.15.0-55-generic)
>>
>> Hi.
>>
>> I am a beginner and am trying to train on some Korean character images for
>> Korean recognition.
>>
>> To understand how to train with Tesseract 4.0 LSTM, I followed Tesstrain.
>>
>> I followed the lines of the Makefile in Tesstrain step by step, and most of
>> the steps seemed to work fine until creating the traineddata.
>>
>> *In detail:*
>>
>> 1. I made box files and a unicharset by following these lines
>> <https://github.com/tesseract-ocr/tesstrain/blob/master/Makefile#L128-L138>.
>>
>> 2. I made lstmf files by following these lines
>> <https://github.com/tesseract-ocr/tesstrain/blob/master/Makefile#L140-L145>.
>>
>> 3. I made two split file lists for training and evaluation by following
>> these lines
>> <https://github.com/tesseract-ocr/tesstrain/blob/master/Makefile#L115-L116>.
>>
>> 4. Before combining the lang model, I downloaded radical-stroke.txt by
>> following this line
>> <https://github.com/tesseract-ocr/tesstrain/blob/master/Makefile#L191>,
>> and 3 langdata files (kor.punc, kor.numbers, and kor.wordlist) from this
>> link <https://github.com/tesseract-ocr/langdata_lstm/tree/master/kor>.
>>
>> I didn't download the kor.config file because it causes an error saying
>> that chi_tra.traineddata is needed.
>>
>> 5. I combined the lang model by following these lines
>> <https://github.com/tesseract-ocr/tesstrain/blob/cf7854cbf2a07013fc3df2bbaddebf719534b27b/Makefile#L255-L264>.
>>
>> 6. Then I started LSTM training by following these lines
>> <https://github.com/tesseract-ocr/tesstrain/blob/master/Makefile#L173-L180>.
>>
>> 7. I tested the checkpoint. The results look like this:
>> lim@ubuntu:~/tools/tesstrain$ usr/bin/lstmeval \
>> --traineddata data/kor/kor.traineddata \
>> --model data/kor/checkpoints/kor_checkpoint \
>> --eval_listfile data/kor/list.eval
>> data/kor/checkpoints/kor_checkpoint is not a recognition model, trying training checkpoint...
>> Loaded 1/1 lines (1-1) of document data/ground-truth/kor.malgun.exp249.lstmf
>> Loaded 1/1 lines (1-1) of document data/ground-truth/kor.malgun.exp228.lstmf
>> Truth:먹
>> OCR :이
>> Truth:독
>> OCR :이
>> Loaded 1/1 lines (1-1) of document data/ground-truth/kor.malgun.exp197.lstmf
>> Loaded 1/1 lines (1-1) of document data/ground-truth/kor.malgun.exp41.lstmf
>> Truth:파
>> OCR :이
>> Truth:신
>> OCR :열
>> ... (skip)
>> At iteration 0, stage 0, Eval Char error rate=133.33333, Word error rate=96.875
>>
>> There seems to be no problem with the results.
>>
>> 8. I made the traineddata output file.
>> lim@ubuntu:~/tools/tesstrain$ usr/bin/lstmtraining --stop_training \
>> --continue_from data/kor/checkpoints/kor_checkpoint \
>> --traineddata data/kor/kor.traineddata \
>> --model_output usr/share/tessdata/kor.traineddata
>>
>> 9. Then I ran tesseract on kor.malgun.exp197.tif. In step 7 (testing with
>> lstmeval), this TIF file was recognized as *'이'*, so I expected the same
>> result.
>> lim@ubuntu:~/tools/tesstrain$ usr/bin/tesseract \
>> data/ground-truth/kor.malgun.exp197.tif stdout -l kor --psm 6 > result
>>
>> But the actual result was a total mess. Here is the result:
>>
>> [image: res.JPG]
>>
>> Why are the results of `lstmeval` and `tesseract` different?
>>
>> Thank you...
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/074b17ee-cb7c-49a2-a653-1180f6190254%40googlegroups.com.
>
> --
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
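A note on the "Char error rate=133.33333" figure quoted in the thread, since a rate above 100% can look like a bug. The sketch below assumes the common definition of character error rate (total edit distance divided by total truth characters). That definition is an assumption made here for illustration; lstmeval may compute its figure differently. The first list holds the four Truth/OCR pairs visible in the log. The second list is hypothetical and only shows how an output longer than the truth pushes the rate past 100%.

```python
def edit_distance(truth: str, ocr: str) -> int:
    """Levenshtein distance: minimum inserts, deletes, and substitutions."""
    prev = list(range(len(ocr) + 1))
    for i, t in enumerate(truth, start=1):
        curr = [i]
        for j, o in enumerate(ocr, start=1):
            cost = 0 if t == o else 1
            curr.append(min(prev[j] + 1,          # delete from truth
                            curr[j - 1] + 1,      # insert into truth
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def char_error_rate(pairs):
    """Percent CER over (truth, ocr) pairs: total edits / total truth chars.

    Assumed definition for illustration, not taken from Tesseract's source.
    """
    edits = sum(edit_distance(t, o) for t, o in pairs)
    truth_chars = sum(len(t) for t, _ in pairs)
    if truth_chars == 0:
        raise ValueError("no truth characters to compare against")
    return 100.0 * edits / truth_chars


# The four Truth/OCR pairs shown in the lstmeval log in this thread.
log_pairs = [("먹", "이"), ("독", "이"), ("파", "이"), ("신", "열")]

# Hypothetical pairs (not from the log): one output is longer than its truth.
long_output_pairs = [("먹", "이이"), ("독", "이"), ("파", "이")]

print(char_error_rate(log_pairs))
print(char_error_rate(long_output_pairs))
```

Under this definition every wrong single character costs one edit, so the four logged pairs alone cannot exceed 100%. A higher figure means the number of errors is larger than the number of truth characters, which fits a model that has barely started learning (the log reports "iteration 0").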

